Splitting attention kernel file #10091
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
8a08b8f to ef85d7d
Signed-off-by: maleksan85 <[email protected]> finishing split clean up Signed-off-by: maleksan85 <[email protected]>
ef85d7d to 429b7d9
Seems reasonable. Do you have measurements for compile time for these files before and with this change?
I didn't measure, but on my non-server machine I stopped seeing timeouts during the build. Also, an upcoming change will increase the number of compile-time constants. They differ between AMD MI and AMD Navi, which increases the number of template instantiations and therefore the compile time.
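To make the reasoning above concrete, here is a minimal sketch of how the instantiation count grows when one more compile-time constant is added. The function name and all the counts are hypothetical and are not taken from the vLLM sources; the only point is that the total is the product of the number of values each constant can take.

```cpp
#include <cstddef>

// Hypothetical helper: if every combination of compile-time constants is
// instantiated, the number of kernel instantiations is the product of the
// number of values each constant can take.
constexpr std::size_t instantiation_count(std::size_t head_sizes,
                                          std::size_t block_sizes,
                                          std::size_t arch_variants) {
  return head_sizes * block_sizes * arch_variants;
}

// Illustrative numbers only (assumptions, not the real vLLM parameter lists).
constexpr std::size_t kHeadSizes = 6;
constexpr std::size_t kBlockSizes = 3;

// One architecture-specific value: 6 * 3 * 1 combinations.
static_assert(instantiation_count(kHeadSizes, kBlockSizes, 1) == 18,
              "single-architecture count");
// Two architecture-specific values double the total.
static_assert(instantiation_count(kHeadSizes, kBlockSizes, 2) == 36,
              "two-architecture count");
```

Every extra constant multiplies the total, so a single translation unit that holds all variants gets slower to compile with each addition.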
@tlrmchlsmth are you OK with the change, or are there some concerns? If OK, please approve.
@mergify-io rebase
oops, sorry, should be @Mergifyio rebase
@Mergifyio rebase
❌ Unable to rebase: user
https://github.com/Mergifyio rebase
❌ Pull request can't be updated with latest base branch changes
Mergify needs the author permission to update the base branch of the pull request.
Split paged attention kernel into two files for v1 and v2 to speed up compilation when template instantiation explodes.
Side effect: adds support for Navi32.
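The general pattern behind such a split can be sketched as follows. This is a simplified single-file illustration, not the actual vLLM code: the stub functions, the `run_v1`/`run_v2` dispatchers, and the parameter values (head sizes 64 and 128, block size 16, partition size 512) are all hypothetical. In a real build, each commented section would sit in its own translation unit, so the build system can compile the v1 and v2 instantiations in parallel.

```cpp
#include <cassert>

// --- Would live in a shared header (hypothetical) -------------------------
// Stand-ins for heavy templated kernels; the arithmetic is illustrative only.
template <int HEAD_SIZE, int BLOCK_SIZE>
int attention_v1_stub(int num_tokens) {
  return num_tokens * HEAD_SIZE / BLOCK_SIZE;
}

template <int HEAD_SIZE, int BLOCK_SIZE, int PARTITION_SIZE>
int attention_v2_stub(int num_tokens) {
  // Number of partitions needed to cover num_tokens (ceiling division).
  return (num_tokens + PARTITION_SIZE - 1) / PARTITION_SIZE;
}

// --- Would live in the v1 translation unit --------------------------------
// Explicit instantiations: only v1 variants are compiled here.
template int attention_v1_stub<64, 16>(int);
template int attention_v1_stub<128, 16>(int);

int run_v1(int head_size, int num_tokens) {
  assert(num_tokens >= 0);
  switch (head_size) {
    case 64:  return attention_v1_stub<64, 16>(num_tokens);
    case 128: return attention_v1_stub<128, 16>(num_tokens);
    default:  return -1;  // unsupported head size
  }
}

// --- Would live in the v2 translation unit --------------------------------
// Explicit instantiations: only v2 variants are compiled here.
template int attention_v2_stub<64, 16, 512>(int);
template int attention_v2_stub<128, 16, 512>(int);

int run_v2(int head_size, int num_tokens) {
  assert(num_tokens >= 0);
  switch (head_size) {
    case 64:  return attention_v2_stub<64, 16, 512>(num_tokens);
    case 128: return attention_v2_stub<128, 16, 512>(num_tokens);
    default:  return -1;  // unsupported head size
  }
}
```

With this layout, each file instantiates only its own variants, so adding a new compile-time constant increases the work per file rather than the work of one monolithic file.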