-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x86: Optimize arraycopy with constant copy size #7459
base: master
Are you sure you want to change the base?
x86: Optimize arraycopy with constant copy size #7459
Conversation
2432b6f
to
25ae04f
Compare
@0xdaryl May I ask you to review this change? Thank you! |
25ae04f
to
d2c22d0
Compare
{ | ||
regSize = (copySize < 4) ? 2 : 4; | ||
} | ||
else if (copySize < 16) // 8-15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that copySize
is always the full length of the destination array? That is, copySize
is not a subsection of the array. If the answer is no, then I don't think this copy works in all cases because you may overwrite more bytes in the destination than you were allowed to copy. For example, let's say copySize
is 13. This will generate two 8-byte loads and stores from source to destination. But I think that only works if you know that 13
is also the full length of the destination array (where you can copy some extra bytes into the slush between objects).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that copySize is always the full length of the destination array?
No, the copy size isn't always the full length of the detonation array. It doesn't need to be
If copySize
is 13, the generated load/store instructions from generateMemoryCopyInstructions
as below:
[0x7f485307c070] mov rdx, qword ptr [rax+1*rdi-0x8] # L8RegMem, SymRef [#4758 -8]
[0x7f485307c190] mov rax, qword ptr [rax] # L8RegMem
[0x7f485307c2b0] mov qword ptr [rcx+1*rdi-0x8], rdx # S8MemReg, SymRef [#4760 -8]
[0x7f485307c3d0] mov qword ptr [rcx], rax # S8MemReg
src: |------- copysize=13 ------|
..... 1 2 3 4 5 6 7 8 9 10 11 12 13 .....
|------------------|
first load 8 bytes at src + (13 - 8) = src + 5
|--------------|
second load 8 bytes at src
dst: |------- copysize=13 ------|
..... 1 2 3 4 5 6 7 8 9 10 11 12 13 .....
|------------------|
first store 8 bytes at dst + (13 - 8) = dst + 5
|--------------|
second store 8 bytes at dst
Optimize arraycopy sequence if copy size is constant and less than or equal to 64 bytes. This feature can be disabled by env variable TR_DisableArrayCopyInlineSmallSizeConstantCopySize. Create generateMemoryCopyInstructions that loads and stores memory based on specified register size. This method is used by multiple arraycopy inline methods to reduce the repetition of the same code. Update generateArrayElement{Load|Store} to include vector registers and instructions. Signed-off-by: Annabelle Huo <[email protected]>
d2c22d0
to
24c7d3f
Compare
@0xdaryl All comments are addressed. Ready for another review. Thank you! |
Optimize arraycopy sequence if copy size is constant and less than or equal to 64 bytes. This feature can be disabled by env variable
TR_DisableArrayCopyInlineSmallSizeConstantCopySize
.Create
generateMemoryCopyInstructions
that loads and stores memory based on specified register size. This method is used by multiple arraycopy inline methods to reduce the repetition of the same code.Update
generateArrayElement{Load|Store}
to include vector registers and instructions.