-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DWC2 DMA support #2576
DWC2 DMA support #2576
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
9cafcdd
to
97251e4
Compare
@@ -67,18 +67,6 @@ | |||
// Debug level for DWC2 | |||
#define DWC2_DEBUG 2 | |||
|
|||
#ifndef dcache_clean | |||
#define dcache_clean(_addr, _size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dcache code should be removed and add MPU suggestion to make usb ram non cacheable:
- Not all internal buffers are aligned to cache line size and has size of multiple cache line size, speculative read or cache eviction of adjacent variables could lead to racing with DMA.
- Frequent cache clean invalidate hurt performance
- In most cases Cortex-M7 recommend use DTCM where cache is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely sure with this case, as H743 and probably most of stm32 m7 h7 USB DMA controller don't have access to DTCM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least for stm32f7xx, those all seem to have the AHB-S port of the M7 connected directly to the bus matrix, so it is accessible by the USB DMA in this case. Though, that is only for DTCM, ITCM is still inaccessible by anything besides the CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I haven't check f7, maybe this is specific to h7 only
static inline bool dma_enabled(uint8_t rhport) | ||
{ | ||
// DMA doesn't support fifo transfer | ||
#ifdef TUD_AUDIO_PREFER_RING_BUFFER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want to depend on class driver but maybe it's the simplest way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFG_TUD_DWC2_DMA must be enabled at compile time, audio driver/user should adapt accordingly.
@@ -351,6 +382,12 @@ static void bus_reset(uint8_t rhport) { | |||
// Setup the control endpoint 0 | |||
_allocated_fifo_words_tx = 16; | |||
|
|||
// DMA needs extra space for processing | |||
if(dma_enabled(rhport)) { | |||
uint16_t reserved = _dwc2_controller[rhport].ep_fifo_size / 4- dwc2->ghwcfg3_bm.total_fifo_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel use dwc2->ghwcfg3_bm.total_fifo_size
it should be reliable.
We can even remove _dwc2_controller[rhport].ep_fifo_size
and use this field instead, except GD32VF103 looks strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is the max epinfo where scatt/gather dma is used. We can use gdfifocfg to dynamic change the epinfo base to reduce the DMA region for simple DMA mode.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Hi @HiFiPhile , |
thank you very much @HiFiPhile again for another great PR. sorry for super late response, I am doing more test with this, seems like it have some issue with my h743 eval which uses external HS PHY. That is ok, I am doing more testing and reading hopefully we could fix and merge this soon. |
Thanks for testing, it's the combination I'm missing.
@roma-jam In short I'm using GPIO toggle in ISR and in event queue processing, then take the averaged sum of 10 cycles. |
…d explanation for EPInfo address and GDFIFO. some function rename update h743 linker to use SRAM1 since USB DMA cannot access DTCM ram update xmc4500 to use uuid for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really long review with incorrect poking. It turns out H743 USB DMA cannot access DTCM ram. Though that force me to do lots of reading and have a better understand of the dwc2 register for EPInfo for DMA. Which store 1 words for normal DMA, and 4 words for scatter/gather. We can dynamically configure gdififocfg with epinfo base to save footprint. Since currently we does not support scatter/gather, we only need 2xep_count.
While troubleshooting, I also rewrite to simplify the fifo allocation scheme and update documentation as well since we don't need to close/re-open iso anymore.
Since DMA cannot dynamically switched on/off per endpoint, I have made ths an compiled timed option, which is default to off CFG_TUD_DWC2_DMA
. The reason is, in the future with non-dma mode, we can actually get rid of local ep_buffer in the driver e.g cd (tu_fifo -->dwc2 dfifo). Which I have been thinkging but haven't done yet. So yeah DMA will require more memory for ep_buf (to live long enough for transfer to complete as well as word aligned like other ports). Other driver such audio should adapt with the new cfg option.
from Linux kernel driver
* Unfortunately the choice to use DMA or not is global to the controller
* and seems to be only settable when the controller is being put through
* a core reset. This means we either need to fix the gadgets to take
* account of DMA alignment, or add bounce buffers (yuerk).
PS: I have tested this with esp32s3, xmc4500, stm32 h743, f412, u5a5 and they works flawlessly. Thank you @HiFiPhile again for another great PR.
@HiFiPhile let me know if you spot any issue with changes that I made, otherwise we could merge this now :) (finally) |
Oh it's a little pity for performance. I think we need to update the BSP with MPU config to make the region non-cacheable. For I'll test with my boards if everything works. |
Maybe just leave it as it is if it isn't much of an issue. I prefer to have generic/simpler code to maintain :)
From my understanding, it is the max if we use scatter/gather DMA mode. I think the parameter is pre-calculated by synopsys ip configuration tool based on the number of endpoints. However, we can dynamic assign gdfifocfg if we only use the buffer/normal DMA mode. |
Just tested on STM32F407, STM32F723 and STM32U5A5, didn't redo the full matrix but all major combinations work. |
thank you @HiFiPhile for retesting, I think it is perfect to merge now. I will do more follow-up clean up to opt out code when DMA is used later on e.g remove the xfer fifo and tu_fifo const etc.. |
Describe the PR
Add Internal DMA support to DWC2.
Status
cdc_ual_ports
audio_4_channel_mic
video_capture_2ch
FS w/wo DMA
FS w/wo DMA
FS + HS w/wo DMA
HS w/wo DMA
cdc_msc_freertos
audio_4_channel_mic_freertos
hid_composite_freertos
FS w/wo DMA
Benchmark
STM32F723E-DISCO
IAR 9.50 High-Balanced
I-Cache enabled
Method used in #920, all buffers set to 2048b.