-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RP2040: track DPRAM buffer for each endpoint #1802
base: master
Are you sure you want to change the base?
Conversation
78d6f75
to
2a9b493
Compare
this is a good idea; but i'd use a single uint64_t to do the tracking and some basic bit ops (you are using 8x the memory and don't seem to make use of the difference between 0xff and the item start) other than for an assertion. |
d1434dc
to
2db9ecd
Compare
Update the code change to use uint64_t to track the allocation status. Calculate allocation size when free. Put the size calculation into a new static function to make sure we have the same logic when allocating and free. |
// size must be multiple of 64 | ||
uint size = _hw_endpoint_data_size(ep); | ||
|
||
uint block_num = size / 64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason i suggested using a uint64_t was to use bit operations rather than a bunch of loops; e.g. something like:
uint block_count = size / 64;
uint64_t block_mask = (1ull << block_count) - 1;
for (size_t start = 0; start < DPRAM_BLOCKS - block_num; start++) {
if (!(dpram_state & block_mask)) {
start_block = start;
dpram_state |= block_mask;
}
block_mask <<= 1;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI Cortex-M0 is very inefficient with bitwise operations, in addition uint64_t
is not a native data type which needs runtime support like ABI I64Shl to do math operations.
So it's better to compare the assembly output before decide which method is better :)
(Example revert bitfield to uint8_t of usbd_device_t
reduce the code size of usbd.o to 2496 from 2512 on IAR Cortex-M33)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCC deals with it inline, so it is a bit better (smaller in debug mode, but slightly larger in release mode)
The intention here was to save RAM (8 bytes vs 64), however I do agree that perhaps it is undesirable in general to have much of any increase in code size since this feature is something most be clearly don't need (their endpoint use is largely static)
I therefore suggest we #ifdef
this new code
a5af253
to
3db5ca0
Compare
@kilograham can you approve the workflow? so that I can verify build |
please merge. |
see comment (I think we should #ifdef the tracking support since most people probably don't need it)
i pushed a change to change optimization to -Os for |
Did you measure the code size difference? I believe RAM difference is only 8bytes. On RPI, it is so small to ignore. |
hihi, thank you @howard0su for the PR, and sorry for late response. Recently, we have added a new API for ISO transfer, that would reserve the largest packet size before SET_INTERFACE (activation). This will help to solve the changing endpoint size of ISO transfer. https://github.com/hathach/tinyusb/blob/master/src/device/dcd.h#L172-L175 The API is currently only implemented with stm32 fsdev, but will be added for rp2040 later as well when I have time. I appreciate your effort but I think reserving largest size (with endpoint with multiple size) work best as generic for most ports (some mcu has difficulty managing usb ram). |
Apply changes from PR1802. hathach#1802
Describe the PR
Running audio example on pico, I am still hitting the following assert:
hard_assert(hw_data_offset(next_buffer_ptr) <= USB_DPRAM_MAX);
Additional context
The issue is that pico implementation only reclaim the memory when all the endpoints are closed. In audio example, speaker and mic may not share the same lifecycle. This end up the crash.
I fixed this to implement a poor man allocation algorithm to track 64 of the blocks with 64 bytes. Since there are only 64 blocks, the simple algorithm works well.
Tested on Pico with UAC2 example