
Determine appropriate device architecture during compile stage #2493

Closed
alexbaden opened this issue Oct 15, 2024 · 5 comments · Fixed by #2995

Comments

@alexbaden
Contributor

We have two new features that rely on the ocloc utility to query GPU architecture info (#1900) and to compile native GPU code (#1792) during the compile stage. ocloc needs a device parameter:

 -device <device_type>                     Target device.
                                            <device_type> can be: tgl, tgllp, rkl, adl-s, rpl-s, adl-p, rpl-p, adl-n, dg1, dg2-g10-a0, dg2-g10-a1, dg2-g10-b0, acm-g10, ats-m150, dg2-g10, dg2, dg2-g10-c0, dg2-g11-a0, dg2-g11-b0, acm-g11, ats-m75, dg2-g11, dg2-g11-b1, acm-g12, dg2-g12, dg2-g12-a0, pvc-xl-a0, pvc-sdv, pvc-xl-a0p, pvc-xt-a0, pvc-xt-b0, pvc-xt-b1, pvc, pvc-xt-c0, pvc-vg, pvc-xt-c0-vg, mtl-u-a0, arl-s, arl-u, mtl-m, mtl-s, mtl-u, mtl, mtl-u-b0, mtl-h-a0, mtl-h, mtl-p, mtl-h-b0, arl-h-a0, arl-h, arl-h-b0, bmg-g21-a0, bmg-g21-a1, bmg-g21, bmg-g21-b0, lnl-a0, lnl-a1, lnl-m, lnl-b0, xe, xe2, gen12lp, xe-hpc, xe-hpc-vg, xe-hpg, xe-lp, xe-lpg, xe-lpgplus, xe2-hpg, xe2-lpg, an IP version, or a hexadecimal value with a 0x prefix
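Invoking ocloc from the compile stage might look like the following sketch. This is a minimal illustration, not the actual implementation: the `ocloc_compile` wrapper name, the SPIR-V input path, and the default `pvc` target are all assumptions for the example.

```python
import shutil
import subprocess

def ocloc_compile(spirv_path, device="pvc"):
    """Hypothetical wrapper: ask ocloc to compile a SPIR-V module for a
    specific GPU via its -device parameter.

    Returns the CompletedProcess, or None when ocloc is not on PATH.
    """
    if shutil.which("ocloc") is None:
        return None
    # `ocloc compile -file <input> -device <name>` selects the target device.
    return subprocess.run(
        ["ocloc", "compile", "-file", spirv_path, "-device", device],
        capture_output=True,
        text=True,
    )
```

The open question in this issue is where the `device` argument comes from, since the compile stage has to pick it without a fixed target baked in.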

We currently get a name string from PyTorch:

'name': 'Intel(R) Data Center GPU Max 1100',
'name': 'Intel(R) Graphics [0xe20c]',
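A best-effort mapping from the name string to an ocloc `-device` argument could look like the sketch below. The bracketed hex value is a PCI device ID, and ocloc's help above says it accepts a hexadecimal value with a 0x prefix directly; the fallback name table here is an illustrative assumption that would need to be hand-curated.

```python
import re

# Illustrative fallback table mapping marketing names to ocloc device names;
# a real table would need to be curated and kept up to date.
_NAME_TO_OCLOC = {
    "Intel(R) Data Center GPU Max 1100": "pvc",
}

def ocloc_device_from_name(name):
    """Best-effort mapping of a PyTorch device name string to an ocloc
    -device argument; returns None when the name is not recognized."""
    # Names like 'Intel(R) Graphics [0xe20c]' embed a PCI device ID, which
    # ocloc accepts directly as a 0x-prefixed hexadecimal value.
    m = re.search(r"\[(0x[0-9a-fA-F]+)\]", name)
    if m:
        return m.group(1)
    return _NAME_TO_OCLOC.get(name)
```

The fragility of this approach (every new device name needs a table entry) is why the comment below suggests asking PyTorch to expose the oneAPI device architecture API instead.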

We can try to parse the name string, or we can ask PyTorch to implement the device architecture API from oneAPI: https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/oneapi/experimental/device_architecture.hpp
However, this API is changing between the 2024 and 2025 oneAPI releases, so we would need to be careful to use the right enum in that case (particularly if we compile against one version but run against a different one).

@whitneywhtsang
Contributor

@guangyey is going to check with the DPC++ team on whether there is a plan to move the device architecture API out of the experimental namespace.

@EikanWang
Contributor

@guangyey, please keep the issue updated.

@guangyey

guangyey commented Nov 6, 2024

It depends on oneAPI 2025.0.

@EikanWang
Contributor

@alexbaden, @whitneywhtsang, we are upgrading PyTorch to support 2025.0. We will submit a PR to add the feature as soon as 2025.0 support in PyTorch is ready.

@guangyey

The PR is pytorch/pytorch#138186.
The compiler team replied that they plan to move the device architecture API out of the experimental namespace, but they have no ETA.
