Parse architecture from PyTorch instead of hard coding #2995

Merged: whitneywhtsang merged 9 commits into main from whitneywhtsang/device_arch on Dec 12, 2024

whitneywhtsang
Contributor

With pytorch/pytorch#138186, architecture is added to XPU device property.
Instead of hard coding pvc when invoking ocloc, this PR dynamically passes the device architecture parsed from PyTorch.
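
For readers who want a concrete picture of the change described above, here is a minimal sketch, not the code from this PR. It assumes the `architecture` property is exposed through `torch.xpu.get_device_properties` as a number; the ID-to-name table, the helper names, and the exact ocloc argument list are hypothetical placeholders chosen for illustration.

```python
from typing import List, Mapping

# Placeholder IDs for illustration only; real architecture values come from
# the driver and are not listed in this conversation.
HYPOTHETICAL_ARCH_NAMES: Mapping[int, str] = {1: "pvc", 2: "bmg"}


def parse_device_arch(arch_id: int,
                      names: Mapping[int, str] = HYPOTHETICAL_ARCH_NAMES) -> str:
    """Translate a numeric architecture ID into a device name ('' if unknown)."""
    return names.get(arch_id, "")


def build_ocloc_cmd(device_arch: str, spirv_path: str, out_path: str) -> List[str]:
    """Build an ocloc command line with the device passed in, not hard coded."""
    if not device_arch:
        raise ValueError("unknown device architecture")
    # Assumed argument layout; check it against the ocloc version in use.
    return ["ocloc", "compile", "-file", spirv_path, "-spirv_input",
            "-device", device_arch, "-output", out_path]


def current_device_arch_id(index: int = 0) -> int:
    """Read the architecture from the XPU device properties (assumed attribute)."""
    import torch  # imported lazily so the helpers above work without PyTorch
    return torch.xpu.get_device_properties(index).architecture
```

On a machine with an XPU, the three pieces would chain as `build_ocloc_cmd(parse_device_arch(current_device_arch_id()), "kernel.spv", "kernel.bin")`, where the file names are again illustrative.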

Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/device_arch branch from aa8c13c to a853d16 Compare December 11, 2024 18:37
Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/device_arch branch from 4e1e895 to d378a2b Compare December 11, 2024 19:32
Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/device_arch branch from 37b6217 to 47a3216 Compare December 11, 2024 20:01
Signed-off-by: Whitney Tsang <[email protected]>
Contributor

@alexbaden alexbaden left a comment

LGTM, thanks!

@pbchekin
Contributor

Looks like this change makes CI slower (~2h vs ~1.5h).

@whitneywhtsang
Contributor Author

With 5e6d2c9, we verified that the supported extensions are unchanged, so the increase in CI time is most likely due to the ocloc invocations. 89aa35d attempts to halve the number of invocations.

@chengjunlu
Contributor

Maybe we can cache the hardware capabilities to further reduce the number of ocloc calls.

@whitneywhtsang
Contributor Author

whitneywhtsang commented Dec 12, 2024

New CI time is 1h 36m 48s; let's see how much more caching can reduce it.

Signed-off-by: Whitney Tsang <[email protected]>
@whitneywhtsang
Contributor Author

New CI time is 1h 36m 48s, let's see if cache can reduce more in a different PR.

With caching, further reduced to 1h 25m 26s.
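
The caching idea discussed above can be sketched as follows. This is an illustration of in-process memoization under stated assumptions, not the implementation that produced the CI times quoted here: `make_cached_probe`, `run_subprocess`, and the command line inside `probe` are hypothetical names and arguments.

```python
import functools
import subprocess
from typing import Callable, Sequence


def make_cached_probe(run: Callable[[Sequence[str]], str]) -> Callable[[str], str]:
    """Wrap an expensive per-architecture probe so each arch is queried once."""

    @functools.lru_cache(maxsize=None)
    def probe(device_arch: str) -> str:
        # Hypothetical command line; the real ocloc arguments used by the
        # backend are not shown in this conversation.
        return run(("ocloc", "compile", "-device", device_arch))

    return probe


def run_subprocess(cmd: Sequence[str]) -> str:
    """Default runner: execute the command and return its standard output."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return result.stdout


# Demonstration with a fake runner, so no ocloc binary is needed.
calls = []


def fake_run(cmd: Sequence[str]) -> str:
    calls.append(tuple(cmd))
    return "capabilities for " + cmd[-1]


probe = make_cached_probe(fake_run)
probe("pvc")
probe("pvc")  # served from the cache, fake_run is not called again
probe("bmg")
print(len(calls))  # 2
```

Note that `functools.lru_cache` only lives for the duration of one process. If the tests in CI run in many separate processes, an on-disk cache keyed by architecture would be needed to get the same effect across them.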

@whitneywhtsang whitneywhtsang merged commit b70c7f7 into main Dec 12, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/device_arch branch December 12, 2024 03:54
@whitneywhtsang whitneywhtsang linked an issue Dec 16, 2024 that may be closed by this pull request