Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made critical changes to small_gemm #568

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Meghana-vankadari
Copy link
Contributor

Made some critical changes to small_gemm.

AMD BLIS Upstream:

This PR includes following commits for AMD BLIS version 3.0.1

ac2a50f Fixed blastest failure for haswell configuration
c597fa6 Disabled calling of bli_dgemm_small from gemm_front
1c6d455 Implemented 16x3 based gemm kernel for the case where A has transpose
ed7780d Made some critical changes to small_gemm kernels

Meghana-vankadari and others added 4 commits October 28, 2021 13:42
Details:
- In case of GEMM, whenever beta is zero, we need to perform C = alpha
*(A * B) instead of C = beta * C + alpha * (A * B)
 Added conditions to check the value of beta at different levels inside
 small_gemm kernels and decide whether to perform scaling C with beta or
 not.
-Modified small_gemm kernels to use BLIS specific functions to retrieve
 different fields of objects.
-Calling bli_gemm_check before entering bli_gemm_small to facilitate
 early return in case of invalid inputs.
-For corner cases inside small_gemm kernels, a buffer called f_temp
 is used to load and store data to and from registers.
 populating the buffer with zeroes before use.
-In bli_gemm_front, datatypes of status and return value from
 bli_gemm_small are not matching.
 Corrected the datatype of the variable 'status' inside bli_gemm_front
 to err_t.

Change-Id: I8b52ad55008f028d6c8b7e0d20f746a869d9daea
Signed-off-by: Meghana Vankadari <[email protected]>
AMD-Internal: [CPUPL-689,SWLCSG-104]
Details:
- This implementation does a transpose operation while packing 16xk of A
  buffer and passes it to 16x3-nn kernel.
- The same implementation works for the case where B has transpose.

AMD-Internal: [CPUPL-1376]
Change-Id: I81f74deb609926598f62c30f5bd6fc80fb1b9a17
Details:
- Decision logic to choose small_gemm has been moved to blas interface.
- Redirecting all the calls to small_gemm from gemm_front to native
  implementation.

AMD-Internal: [CPUPL-1376]
Change-Id: I6490f67113e9f7c272269f441c86f2a0b3c89a53
Details:
- Placed optimized version of BLAS DGEMM, ZGEMM definitions under
  BLIS_CONFIG_EPYC as they use gemm small which are defined only
  for zen family configurations.
- Added code to query and set cntx in gemv and trsv framework before
  cntx is referred for any function pointers to avoid querying
   from NULL pointer.

AMD-Internal: [CPUPL-1562]
Change-Id: I977d028ec4ddb57dcdc70e443e7708f36c01cca9
ctype* A11; \
ctype* A21; \
ctype* a01; \
ctype* alpha11; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid whitespace changes.



// gemm square matrix size friendly implementation
err_t bli_gemm_sqp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants