Use stricter host buffer alignment (64B) required by modern CPUs. #19
Conversation
Hi, I would like to propose a change to improve reported PCIe D2H numbers in OpenCL (on some CPUs).
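For illustration of the proposal, here is a minimal sketch (not the project's actual code; it assumes a C++17 host compiler, and the buffer size and names are placeholders): a default allocation only guarantees alignof(std::max_align_t), while std::aligned_alloc returns an explicitly 64-byte-aligned block, i.e. one starting on a cache-line boundary.

```cpp
// Sketch only: compare the alignment of a default allocation with an explicitly
// 64-byte-aligned one obtained via std::aligned_alloc.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const std::size_t N = 64ull * 1024ull * 1024ull; // example size, a multiple of 64

    // A plain std::vector<char> only guarantees alignof(std::max_align_t) (typically 16B).
    std::vector<char> default_buf(N);

    // std::aligned_alloc (C++17) returns a 64-byte-aligned block; the size must be a
    // multiple of the requested alignment.
    void* aligned_buf = std::aligned_alloc(64, N);

    std::printf("vector.data() %% 64 = %zu\n",
                (std::size_t)((std::uintptr_t)default_buf.data() % 64));
    std::printf("aligned_alloc %% 64 = %zu\n",
                (std::size_t)((std::uintptr_t)aligned_buf % 64));

    std::free(aligned_buf);
    return 0;
}
```

The 64-byte-aligned pointer is then what gets handed to the OpenCL runtime as the host side of the transfer, e.g. as the destination of clEnqueueReadBuffer.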
Hi @pioto1225, thanks a lot for this finding! It took some time to reproduce; I see a moderate improvement in D2H bandwidth on Raptor Lake, though not nearly as much as you observe on Arrow Lake. I've made the changes in this commit, with some additions.
Thank you, and kind regards,
Moritz
Hi Moritz, what GPU are you using with Raptor Lake, and what improvement do you observe after adding 64B alignment? Many thanks.
Hi @pioto1225, I'm pairing an i7-13700K with an A770 in PCIe 4.0 x8 (bifurcated with a 2nd PCIe x8 slot). With an unaligned pointer it was ~10GB/s H2D and ~7GB/s D2H; with a 64B-aligned pointer it is ~10GB/s for both. Strangely, the B580 in the 2nd PCIe 4.0 x8 slot didn't show the slowdown in the first place; it gets ~12GB/s for both D2H and H2D, with or without 64B alignment. Have you seen the D2H slowdown on other GPUs? Could it be specific to Arc Alchemist? Kind regards,
Hi there, thanks for the info! Very interesting point with the B580; it is a shame that it only has an x8 interface. I am hoping for a B770 to show up, which I would like to get. Unfortunately I do not have any other dGPU to test. I had a feeling that this was host-related, since Arrow Lake was way more sensitive to the lack of host buffer alignment (1.6GB/s D2H) than Raptor Lake(/Refresh) at 7GB/s D2H. But your data might prove me wrong. Nevertheless, it looks like the B580 has a better PCIe implementation.
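For context, roughly how such D2H numbers can be reproduced: the sketch below (not the benchmark's actual measurement code) times a blocking clEnqueueReadBuffer into a 64-byte-aligned host pointer. It assumes a C++17/POSIX-style host with OpenCL headers and an ICD installed; error handling is omitted, and the 1 GiB size and device selection (first GPU of the first platform) are arbitrary choices.

```cpp
// Sketch only: time one device-to-host copy into a 64-byte-aligned host buffer.
// Build e.g. with: g++ -O2 d2h.cpp -lOpenCL
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t N = 1024ull * 1024ull * 1024ull; // 1 GiB transfer (example size)

    cl_platform_id platform = nullptr;
    cl_device_id device = nullptr;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_int err = 0;
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueueWithProperties(context, device, nullptr, &err);

    // 64-byte-aligned host destination; std::aligned_alloc requires N % 64 == 0.
    void* host_ptr = std::aligned_alloc(64, N);
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE, N, nullptr, &err);

    // Warm-up copy, then a timed blocking device-to-host read.
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, N, host_ptr, 0, nullptr, nullptr);
    const auto t0 = std::chrono::high_resolution_clock::now();
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, N, host_ptr, 0, nullptr, nullptr);
    const auto t1 = std::chrono::high_resolution_clock::now();

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    std::printf("D2H: %.2f GB/s (host pointer %% 64 = %zu)\n",
                (double)N / seconds * 1e-9,
                (size_t)((std::uintptr_t)host_ptr % 64));

    std::free(host_ptr);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}
```

Running it once with the aligned pointer and once with a deliberately misaligned one (e.g. the aligned pointer plus a small offset, with a correspondingly smaller copy size) makes the alignment sensitivity of a given host platform directly visible.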
Hi @pioto1225, here's the benchmark output from an Nvidia card as well.
Thanks!
Thanks for posting the Nvidia card too. That one correctly advertises its PCIe 3.0 x16 capability; the issue is just with the reporting on the Intel cards, which do operate at the correct speed, as shown in your PCIe benchmarks. One of the reasons Battlemage is faster than Alchemist in D2H and H2D transfers is its higher PCIe maximum payload size (MPS): 256 for Battlemage vs. 128 for Alchemist. Regards,
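On Linux, the actually negotiated link can be cross-checked against whatever the runtime reports by reading the PCIe attributes from sysfs; `sudo lspci -vv` additionally shows the configured MaxPayload. A small sketch, not part of the benchmark (the bus address 0000:03:00.0 is only an example placeholder):

```cpp
// Sketch only (Linux): read the negotiated PCIe link of a device from sysfs.
// Find the GPU's bus address with `lspci`. The configured MaxPayload (MPS) is not a
// plain sysfs attribute; `sudo lspci -vv -s <address>` prints it on the "DevCtl" line.
#include <fstream>
#include <initializer_list>
#include <iostream>
#include <string>

static std::string read_attr(const std::string& bdf, const std::string& attr) {
    std::ifstream f("/sys/bus/pci/devices/" + bdf + "/" + attr);
    std::string value;
    if (std::getline(f, value)) return value;
    return "<unavailable>";
}

int main(int argc, char** argv) {
    const std::string bdf = (argc > 1) ? argv[1] : "0000:03:00.0"; // example address
    for (const char* attr : {"current_link_speed", "current_link_width",
                             "max_link_speed", "max_link_width"}) {
        std::cout << attr << ": " << read_attr(bdf, attr) << "\n";
    }
    return 0;
}
```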
Hi @pioto1225, thanks for pointing out the PCIe interface reporting issue! I'll forward this internally at Intel. Cheers, |