Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DWC2 DMA support #2576

Merged
merged 14 commits into from
Sep 25, 2024
Merged

DWC2 DMA support #2576

merged 14 commits into from
Sep 25, 2024

Conversation

HiFiPhile
Copy link
Collaborator

@HiFiPhile HiFiPhile commented Apr 5, 2024

Describe the PR
Add Internal DMA support to DWC2.

Status

Model Core Rev Enumeration cdc_ual_ports audio_4_channel_mic video_capture_2ch
STM32F407
FS w/wo DMA
2.81a ✔️ ✔️ ✔️ ✔️
STM32F429
FS w/wo DMA
2.81a ✔️ ✔️ ✔️ ✔️
STM32F723
FS + HS w/wo DMA
3.30a ✔️ ✔️ ✔️ ✔️
STM32U5
HS w/wo DMA
4.11a ✔️ ✔️ ✔️ ✔️
Model Core Rev Enumeration cdc_msc_freertos audio_4_channel_mic_freertos hid_composite_freertos
ESP32-S3
FS w/wo DMA
4.00a ✔️ ✔️ ✔️ ✔️

Benchmark

STM32F723E-DISCO
IAR 9.50 High-Balanced
I-Cache enabled

Method used in #920, all buffers set to 2048b.

Throughput Total CPU Usage
DMA OFF 26.7 MB/s 55%
DMA ON 29.5 MB/s 26%

@HiFiPhile

This comment was marked as resolved.

@@ -67,18 +67,6 @@
// Debug level for DWC2
#define DWC2_DEBUG 2

#ifndef dcache_clean
#define dcache_clean(_addr, _size)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dcache code should be removed and add MPU suggestion to make usb ram non cacheable:

  • Not all internal buffers are aligned to cache line size and has size of multiple cache line size, speculative read or cache eviction of adjacent variables could lead to racing with DMA.
  • Frequent cache clean invalidate hurt performance
  • In most cases Cortex-M7 recommend use DTCM where cache is not used

Copy link
Owner

@hathach hathach Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely sure with this case, as H743 and probably most of stm32 m7 h7 USB DMA controller don't have access to DTCM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for stm32f7xx, those all seem to have the AHB-S port of the M7 connected directly to the bus matrix, so it is accessible by the USB DMA in this case. Though, that is only for DTCM, ITCM is still inaccessible by anything besides the CPU.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I haven't check f7, maybe this is specific to h7 only

static inline bool dma_enabled(uint8_t rhport)
{
// DMA doesn't support fifo transfer
#ifdef TUD_AUDIO_PREFER_RING_BUFFER
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to depend on class driver but maybe it's the simplest way.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFG_TUD_DWC2_DMA must be enabled at compile time, audio driver/user should adapt accordingly.

@@ -351,6 +382,12 @@ static void bus_reset(uint8_t rhport) {
// Setup the control endpoint 0
_allocated_fifo_words_tx = 16;

// DMA needs extra space for processing
if(dma_enabled(rhport)) {
uint16_t reserved = _dwc2_controller[rhport].ep_fifo_size / 4- dwc2->ghwcfg3_bm.total_fifo_size;
Copy link
Collaborator Author

@HiFiPhile HiFiPhile Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel use dwc2->ghwcfg3_bm.total_fifo_size it should be reliable.
We can even remove _dwc2_controller[rhport].ep_fifo_size and use this field instead, except GD32VF103 looks strange.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is the max epinfo where scatt/gather dma is used. We can use gdfifocfg to dynamic change the epinfo base to reduce the DMA region for simple DMA mode.

@HiFiPhile HiFiPhile changed the title WIP: DWC2 DMA support DWC2 DMA support Apr 25, 2024
@HiFiPhile HiFiPhile marked this pull request as ready for review April 25, 2024 22:15
@leptun

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@roma-jam
Copy link
Contributor

Hi @HiFiPhile ,
May I ask how did you measure the total CPU load?

@hathach hathach self-requested a review September 18, 2024 10:59
@hathach
Copy link
Owner

hathach commented Sep 18, 2024

thank you very much @HiFiPhile again for another great PR. sorry for super late response, I am doing more test with this, seems like it have some issue with my h743 eval which uses external HS PHY. That is ok, I am doing more testing and reading hopefully we could fix and merge this soon.

@HiFiPhile
Copy link
Collaborator Author

HiFiPhile commented Sep 18, 2024

seems like it have some issue with my h743 eval which uses external HS PHY

Thanks for testing, it's the combination I'm missing.

May I ask how did you measure the total CPU load?

@roma-jam In short I'm using GPIO toggle in ISR and in event queue processing, then take the averaged sum of 10 cycles.

@hathach
Copy link
Owner

hathach commented Sep 24, 2024

found the issue, h743 usb dma cannot access DTCM, change linker to use SRAM1 and it is all good. for this reason, it is also required to have CFG_TUD_MEM_SECTION for dma as well. I will do more clean up and we can merge this real soon.

image

…d explanation for EPInfo address and GDFIFO.

some function rename
update h743 linker to use SRAM1 since USB DMA cannot access DTCM ram
update xmc4500 to use uuid for testing
}

return true;
}

static void dfifo_init(uint8_t rhport) {

Check notice

Code scanning / CodeQL

Unused static function Note

Static function dfifo_init is unreachable
Copy link
Owner

@hathach hathach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really long review with incorrect poking. It turns out H743 USB DMA cannot access DTCM ram. Though that force me to do lots of reading and have a better understand of the dwc2 register for EPInfo for DMA. Which store 1 words for normal DMA, and 4 words for scatter/gather. We can dynamically configure gdififocfg with epinfo base to save footprint. Since currently we does not support scatter/gather, we only need 2xep_count.

While troubleshooting, I also rewrite to simplify the fifo allocation scheme and update documentation as well since we don't need to close/re-open iso anymore.

Since DMA cannot dynamically switched on/off per endpoint, I have made ths an compiled timed option, which is default to off CFG_TUD_DWC2_DMA. The reason is, in the future with non-dma mode, we can actually get rid of local ep_buffer in the driver e.g cd (tu_fifo -->dwc2 dfifo). Which I have been thinkging but haven't done yet. So yeah DMA will require more memory for ep_buf (to live long enough for transfer to complete as well as word aligned like other ports). Other driver such audio should adapt with the new cfg option.

from Linux kernel driver
 * Unfortunately the choice to use DMA or not is global to the controller
 * and seems to be only settable when the controller is being put through
 * a core reset. This means we either need to fix the gadgets to take
 * account of DMA alignment, or add bounce buffers (yuerk).

PS: I have tested this with esp32s3, xmc4500, stm32 h743, f412, u5a5 and they works flawlessly. Thank you @HiFiPhile again for another great PR.

@hathach
Copy link
Owner

hathach commented Sep 24, 2024

@HiFiPhile let me know if you spot any issue with changes that I made, otherwise we could merge this now :) (finally)

@HiFiPhile
Copy link
Collaborator Author

h743 usb dma cannot access DTCM

Oh it's a little pity for performance. I think we need to update the BSP with MPU config to make the region non-cacheable.

For gdfifocfg.EPINFOBASE (max is ghwcfg3.dfifo_depth) I'm the max is more like OTG_DFIFO_DEPTH ? Since ghwcfg3.dfifo_depth is already the value subtracted EP_LOC_CNT.

I'll test with my boards if everything works.

@hathach
Copy link
Owner

hathach commented Sep 24, 2024

h743 usb dma cannot access DTCM

Oh it's a little pity for performance. I think we need to update the BSP with MPU config to make the region non-cacheable.

Maybe just leave it as it is if it isn't much of an issue. I prefer to have generic/simpler code to maintain :)

For gdfifocfg.EPINFOBASE (max is ghwcfg3.dfifo_depth) I'm the max is more like OTG_DFIFO_DEPTH ? Since ghwcfg3.dfifo_depth is already the value subtracted EP_LOC_CNT.

I'll test with my boards if everything works.

From my understanding, it is the max if we use scatter/gather DMA mode. I think the parameter is pre-calculated by synopsys ip configuration tool based on the number of endpoints. However, we can dynamic assign gdfifocfg if we only use the buffer/normal DMA mode.

@HiFiPhile
Copy link
Collaborator Author

Just tested on STM32F407, STM32F723 and STM32U5A5, didn't redo the full matrix but all major combinations work.

@hathach
Copy link
Owner

hathach commented Sep 25, 2024

thank you @HiFiPhile for retesting, I think it is perfect to merge now. I will do more follow-up clean up to opt out code when DMA is used later on e.g remove the xfer fifo and tu_fifo const etc..

@hathach hathach merged commit c8ab65f into hathach:master Sep 25, 2024
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants