Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86: Optimize arraycopy with constant copy size #7459

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

a7ehuo
Copy link
Contributor

@a7ehuo a7ehuo commented Sep 11, 2024

Optimize arraycopy sequence if copy size is constant and less than or equal to 64 bytes. This feature can be disabled by env variable TR_DisableArrayCopyInlineSmallSizeConstantCopySize.

Create generateMemoryCopyInstructions that loads and stores memory based on specified register size. This method is used by multiple arraycopy inline methods to reduce the repetition of the same code.

Update generateArrayElement{Load|Store} to include vector registers and instructions.

@a7ehuo a7ehuo changed the title Optimize arraycopy with constant copy size x86: Optimize arraycopy with constant copy size Sep 11, 2024
@a7ehuo a7ehuo force-pushed the system-arraycopy-perf-20-optimize-constant-copy-size branch from 2432b6f to 25ae04f Compare September 11, 2024 12:48
@a7ehuo a7ehuo removed the request for review from mstoodle September 11, 2024 12:48
@a7ehuo
Copy link
Contributor Author

a7ehuo commented Sep 11, 2024

@0xdaryl May I ask you to review this change? Thank you!

@vijaysun-omr @hzongaro fyi

@a7ehuo a7ehuo force-pushed the system-arraycopy-perf-20-optimize-constant-copy-size branch from 25ae04f to d2c22d0 Compare September 18, 2024 01:13
compiler/x/codegen/OMRTreeEvaluator.cpp Outdated Show resolved Hide resolved
compiler/x/codegen/OMRTreeEvaluator.cpp Outdated Show resolved Hide resolved
{
regSize = (copySize < 4) ? 2 : 4;
}
else if (copySize < 16) // 8-15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that copySize is always the full length of the destination array? That is, copySize is not a subsection of the array. If the answer is no, then I don't think this copy works in all cases because you may overwrite more bytes in the destination than you were allowed to copy. For example, let's say copySize is 13. This will generate two 8-byte loads and stores from source to destination. But I think that only works if you know that 13 is also the full length of the destination array (where you can copy some extra bytes into the slush between objects).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that copySize is always the full length of the destination array?

No, the copy size isn't always the full length of the detonation array. It doesn't need to be

If copySize is 13, the generated load/store instructions from generateMemoryCopyInstructions as below:

 [0x7f485307c070] mov   rdx, qword ptr [rax+1*rdi-0x8]      # L8RegMem, SymRef [#4758 -8]
 [0x7f485307c190] mov   rax, qword ptr [rax]    # L8RegMem
 [0x7f485307c2b0] mov   qword ptr [rcx+1*rdi-0x8], rdx      # S8MemReg, SymRef [#4760 -8]
 [0x7f485307c3d0] mov   qword ptr [rcx], rax    # S8MemReg
src:  |------- copysize=13 ------|
..... 1 2 3 4 5 6 7 8 9 10 11 12 13 .....
              |------------------|       
                first load 8 bytes at src + (13 - 8) = src + 5
      |--------------|
         second load 8 bytes at src

dst:  |------- copysize=13 ------|
..... 1 2 3 4 5 6 7 8 9 10 11 12 13 .....
              |------------------|       
                first store 8 bytes at dst + (13 - 8) = dst + 5
      |--------------|
         second store 8 bytes at dst 

Optimize arraycopy sequence if copy size is constant
and less than or equal to 64 bytes. This feature can
be disabled by env variable
TR_DisableArrayCopyInlineSmallSizeConstantCopySize.

Create generateMemoryCopyInstructions that loads and stores
memory based on specified register size. This method is used
by multiple arraycopy inline methods to reduce the repetition
of the same code.

Update generateArrayElement{Load|Store} to include
vector registers and instructions.

Signed-off-by: Annabelle Huo <[email protected]>
@a7ehuo a7ehuo force-pushed the system-arraycopy-perf-20-optimize-constant-copy-size branch from d2c22d0 to 24c7d3f Compare September 19, 2024 17:27
@a7ehuo
Copy link
Contributor Author

a7ehuo commented Sep 19, 2024

@0xdaryl All comments are addressed. Ready for another review. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants