Zero-sized diffs in smoke_tests.nativeaot #112756

Open
jakobbotsch opened this issue Feb 20, 2025 · 11 comments · May be fixed by #112764
Labels
area-NativeAOT-coreclr in-pr There is an active PR which will close this issue when it is merged untriaged New issue has not been triaged by the area owner

Comments

@jakobbotsch
Member

We're currently getting zero-sized diffs in the smoke_tests.nativeaot collection, even on changes without any expected JIT diffs.
Example: https://dev.azure.com/dnceng-public/public/_build/results?buildId=957900&view=ms.vss-build-web.run-extensions-tab

cc @dotnet/jit-contrib

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 20, 2025
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@filipnavara
Member

That PR has a change in the RISC-V JIT for emitting unwind codes. I'm not surprised that it triggered the pipelines that generate those diffs.

@jakobbotsch
Member Author

It was just an example. The point is that we see spurious diffs in smoke_tests.nativeaot. Here is another example: https://dev.azure.com/dnceng-public/public/_build/results?buildId=957764&view=ms.vss-build-web.run-extensions-tab

Image

@filipnavara
Member

> Here is another example:

That PR also made a change in the jit directory (src/coreclr/jit/CMakeLists.txt), so that's also not too surprising to me. 🤷‍♂️

@jakobbotsch
Member Author

> > Here is another example:
>
> That PR also made a change in the jit directory (src/coreclr/jit/CMakeLists.txt), so that's also not too surprising to me. 🤷‍♂️

The surprise is not that superpmi-diffs is running; the surprise is that smoke_tests.nativeaot shows up as having diffs when the PRs do not change anything about the code being emitted.

@jakobbotsch
Member Author

jakobbotsch commented Feb 20, 2025

Output
; Assembly listing for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX10.2/512 - Windows
; FullOpts code
; NativeAOT compilation
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00,T18] (  4,  9.50)  double  ->  mm7        
;  V01 loc1         [V01,T19] (  4,  9.50)  double  ->  mm6        
;  V02 loc2         [V02,T04] (  3, 64.25)    long  ->  rbx        
;* V03 loc3         [V03,T12] (  0,  0   )     int  ->  zero-ref   
;  V04 loc4         [V04,T00] ( 10,264   )    long  ->  rbp        
;  V05 loc5         [V05,T01] ( 14,224   )     int  ->  rdi        
;* V06 loc6         [V06,T05] (  0,  0   )     int  ->  zero-ref   
;* V07 loc7         [V07,T06] (  0,  0   )     int  ->  zero-ref   
;  V08 OutArgs      [V08    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;  V09 tmp1         [V09,T13] (  3, 24   )  simd16  ->  mm0         "Cloning op2 for Math.Max/Min"
;  V10 tmp2         [V10,T14] (  3, 24   )  simd16  ->  mm7         "Cloning op1 for Math.Max/Min"
;  V11 tmp3         [V11,T15] (  3, 24   )  simd16  ->  mm0         "Cloning op2 for Math.Max/Min"
;  V12 tmp4         [V12,T16] (  3, 24   )  simd16  ->  mm6         "Cloning op1 for Math.Max/Min"
;  V13 tmp5         [V13,T11] (  2,  1   )     int  ->  rax         "Inline return value spill temp"
;  V14 tmp6         [V14,T08] (  3,  3   )     int  ->  rax         "Inlining Arg"
;  V15 cse0         [V15,T17] (  5, 16.25)  simd16  ->  mm8         hoist "CSE #02: aggressive"
;  V16 cse1         [V16,T20] (  3,  3   )  double  ->  mm6         "CSE #01: aggressive"
;  V17 rat0         [V17,T07] (  4, 12.25)     int  ->  rsi         "Trip count IV"
;  V18 rat1         [V18,T02] (  4,196   )     int  ->  r14         "Trip count IV"
;  V19 rat2         [V19,T03] (  4,196   )     int  ->  r14         "Trip count IV"
;  V20 rat3         [V20,T09] (  3,  1.50)    long  ->  rbx         "fgMakeTemp is creating a new local variable"
;  V21 rat4         [V21,T10] (  3,  1.50)    long  ->  rdx         "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 80

G_M1452_IG01:  ;; offset=0x0000
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC50             sub      rsp, 80
       C5F829742440         vmovaps  xmmword ptr [rsp+0x40], xmm6
       C5F8297C2430         vmovaps  xmmword ptr [rsp+0x30], xmm7
       C57829442420         vmovaps  xmmword ptr [rsp+0x20], xmm8
						;; size=28 bbWeight=1 PerfScore 11.25
G_M1452_IG02:  ;; offset=0x001C
       C5FB103500000000     vmovsd   xmm6, qword ptr [reloc @RWD00]
       C5F828FE             vmovaps  xmm7, xmm6
       E800000000           call     <unknown method>
       E800000000           call     System.Threading.Thread:GetCurrentProcessorNumber():int
       85C0                 test     eax, eax
       0F8C5C010000         jl       G_M1452_IG18
						;; size=30 bbWeight=1 PerfScore 6.50
G_M1452_IG03:  ;; offset=0x003A
       488D1D00000000       lea      rbx, [(reloc 0x4000000000427e88)]
       48837BF800           cmp      qword ptr [rbx-0x08], 0
       0F8573010000         jne      G_M1452_IG20
						;; size=18 bbWeight=0.25 PerfScore 1.12
G_M1452_IG04:  ;; offset=0x004C
       48BADB34B6D782DE1B43 mov      rdx, 0x431BDE82D7B634DB
       488BC2               mov      rax, rdx
       48F72B               imul     rdx:rax, qword ptr [rbx]
       488BC2               mov      rax, rdx
       48C1E83F             shr      rax, 63
       48C1FA12             sar      rdx, 18
       488D5C0201           lea      rbx, [rdx+rax+0x01]
       C578100500000000     vmovups  xmm8, xmmword ptr [reloc @RWD16]
       BE0A000000           mov      esi, 10
						;; size=45 bbWeight=0.25 PerfScore 3.00
G_M1452_IG05:  ;; offset=0x0079
       BF08000000           mov      edi, 8
						;; size=5 bbWeight=4 PerfScore 1.00
G_M1452_IG06:  ;; offset=0x007E
       03FF                 add      edi, edi
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       488BE8               mov      rbp, rax
       85FF                 test     edi, edi
       7E0D                 jle      SHORT G_M1452_IG09
						;; size=14 bbWeight=32 PerfScore 88.00
G_M1452_IG07:  ;; offset=0x008C
       448BF7               mov      r14d, edi
						;; size=3 bbWeight=4 PerfScore 1.00
G_M1452_IG08:  ;; offset=0x008F
       E800000000           call     System.Threading.Thread:GetCurrentProcessorNumber():int
       41FFCE               dec      r14d
       75F6                 jne      SHORT G_M1452_IG08
						;; size=10 bbWeight=64 PerfScore 144.00
G_M1452_IG09:  ;; offset=0x0099
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       482BC5               sub      rax, rbp
       488BE8               mov      rbp, rax
       483BEB               cmp      rbp, rbx
       7CD5                 jl       SHORT G_M1452_IG06
						;; size=16 bbWeight=32 PerfScore 88.00
G_M1452_IG10:  ;; offset=0x00A9
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C4E1FB2AC5           vcvtsi2sd xmm0, xmm0, rbp
       C5F057C9             vxorps   xmm1, xmm1, xmm1
       C5F32ACF             vcvtsi2sd xmm1, xmm1, edi
       C5FB5EC1             vdivsd   xmm0, xmm0, xmm1
       62F3C50851C804       vrangesd xmm1, xmm7, xmm0, 4
       62D3FD0855F800       vfixupimmsd xmm7, xmm0, xmm8, 0
       62D3C50855C800       vfixupimmsd xmm1, xmm7, xmm8, 0
       C5F828F9             vmovaps  xmm7, xmm1
       8BC7                 mov      eax, edi
       C1F81F               sar      eax, 31
       83E003               and      eax, 3
       03C7                 add      eax, edi
       8BF8                 mov      edi, eax
       C1FF02               sar      edi, 2
						;; size=61 bbWeight=4 PerfScore 143.67
G_M1452_IG11:  ;; offset=0x00E6
       03FF                 add      edi, edi
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       488BE8               mov      rbp, rax
       85FF                 test     edi, edi
       7E0D                 jle      SHORT G_M1452_IG14
						;; size=14 bbWeight=32 PerfScore 88.00
G_M1452_IG12:  ;; offset=0x00F4
       448BF7               mov      r14d, edi
						;; size=3 bbWeight=4 PerfScore 1.00
G_M1452_IG13:  ;; offset=0x00F7
       E800000000           call     <unknown method>
       41FFCE               dec      r14d
       75F6                 jne      SHORT G_M1452_IG13
						;; size=10 bbWeight=64 PerfScore 144.00
G_M1452_IG14:  ;; offset=0x0101
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       482BC5               sub      rax, rbp
       488BE8               mov      rbp, rax
       483BEB               cmp      rbp, rbx
       7CD5                 jl       SHORT G_M1452_IG11
						;; size=16 bbWeight=32 PerfScore 88.00
G_M1452_IG15:  ;; offset=0x0111
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C4E1FB2AC5           vcvtsi2sd xmm0, xmm0, rbp
       C5F057C9             vxorps   xmm1, xmm1, xmm1
       C5F32ACF             vcvtsi2sd xmm1, xmm1, edi
       C5FB5EC1             vdivsd   xmm0, xmm0, xmm1
       62F3CD0851C804       vrangesd xmm1, xmm6, xmm0, 4
       62D3FD0855F000       vfixupimmsd xmm6, xmm0, xmm8, 0
       62D3CD0855C800       vfixupimmsd xmm1, xmm6, xmm8, 0
       C5F828F1             vmovaps  xmm6, xmm1
       FFCE                 dec      esi
       0F8532FFFFFF         jne      G_M1452_IG05
						;; size=54 bbWeight=4 PerfScore 140.67
G_M1452_IG16:  ;; offset=0x0147
       C5C3590500000000     vmulsd   xmm0, xmm7, qword ptr [reloc @RWD32]
       C5FB5EC6             vdivsd   xmm0, xmm0, xmm6
       62F57F086DC0         vcvttsd2sis eax, xmm0
       B988130000           mov      ecx, 0x1388
       3D88130000           cmp      eax, 0x1388
       0F4FC1               cmovg    eax, ecx
       890500000000         mov      dword ptr [(reloc 0x4000000000427e50)], eax
       833D0000000005       cmp      dword ptr [(reloc 0x4000000000427e50)], 5      ; static handle
       0F9EC0               setle    al
       0FB6C0               movzx    rax, al
						;; size=50 bbWeight=0.50 PerfScore 14.50
G_M1452_IG17:  ;; offset=0x0179
       C5F828742440         vmovaps  xmm6, xmmword ptr [rsp+0x40]
       C5F8287C2430         vmovaps  xmm7, xmmword ptr [rsp+0x30]
       C57828442420         vmovaps  xmm8, xmmword ptr [rsp+0x20]
       4883C450             add      rsp, 80
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M1452_IG18:  ;; offset=0x0196
       C70500000000FFFF0000 mov      dword ptr [(reloc 0x4000000000427e50)], 0xFFFF      ; static handle
       33C0                 xor      eax, eax
						;; size=12 bbWeight=0.50 PerfScore 0.62
G_M1452_IG19:  ;; offset=0x01A2
       C5F828742440         vmovaps  xmm6, xmmword ptr [rsp+0x40]
       C5F8287C2430         vmovaps  xmm7, xmmword ptr [rsp+0x30]
       C57828442420         vmovaps  xmm8, xmmword ptr [rsp+0x20]
       4883C450             add      rsp, 80
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M1452_IG20:  ;; offset=0x01BF
       E800000000           call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       E983FEFFFF           jmp      G_M1452_IG04
						;; size=10 bbWeight=0 PerfScore 0.00
RWD00  	dq	7FEFFFFFFFFFFFFFh	; 1.79769313e+308
RWD08  	dd	00000000h, 00000000h
RWD16  	dq	0000000000000001h, 0000000000000000h
RWD32  	dq	4014000000000000h	;            5


; Total bytes of code 457, prolog size 28, PerfScore 980.08, instruction count 116, allocated bytes for code 457 (MethodHash=45e1fa53) for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
; ============================================================

; Assembly listing for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX10.2/512 - Windows
; FullOpts code
; NativeAOT compilation
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00,T18] (  4,  9.50)  double  ->  mm7        
;  V01 loc1         [V01,T19] (  4,  9.50)  double  ->  mm6        
;  V02 loc2         [V02,T04] (  3, 64.25)    long  ->  rbx        
;* V03 loc3         [V03,T12] (  0,  0   )     int  ->  zero-ref   
;  V04 loc4         [V04,T00] ( 10,264   )    long  ->  rbp        
;  V05 loc5         [V05,T01] ( 14,224   )     int  ->  rdi        
;* V06 loc6         [V06,T05] (  0,  0   )     int  ->  zero-ref   
;* V07 loc7         [V07,T06] (  0,  0   )     int  ->  zero-ref   
;  V08 OutArgs      [V08    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <UNNAMED>
;  V09 tmp1         [V09,T13] (  3, 24   )  simd16  ->  mm0         "Cloning op2 for Math.Max/Min"
;  V10 tmp2         [V10,T14] (  3, 24   )  simd16  ->  mm7         "Cloning op1 for Math.Max/Min"
;  V11 tmp3         [V11,T15] (  3, 24   )  simd16  ->  mm0         "Cloning op2 for Math.Max/Min"
;  V12 tmp4         [V12,T16] (  3, 24   )  simd16  ->  mm6         "Cloning op1 for Math.Max/Min"
;  V13 tmp5         [V13,T11] (  2,  1   )     int  ->  rax         "Inline return value spill temp"
;  V14 tmp6         [V14,T08] (  3,  3   )     int  ->  rax         "Inlining Arg"
;  V15 cse0         [V15,T17] (  5, 16.25)  simd16  ->  mm8         hoist "CSE #02: aggressive"
;  V16 cse1         [V16,T20] (  3,  3   )  double  ->  mm6         "CSE #01: aggressive"
;  V17 rat0         [V17,T07] (  4, 12.25)     int  ->  rsi         "Trip count IV"
;  V18 rat1         [V18,T02] (  4,196   )     int  ->  r14         "Trip count IV"
;  V19 rat2         [V19,T03] (  4,196   )     int  ->  r14         "Trip count IV"
;  V20 rat3         [V20,T09] (  3,  1.50)    long  ->  rbx         "fgMakeTemp is creating a new local variable"
;  V21 rat4         [V21,T10] (  3,  1.50)    long  ->  rdx         "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 80

G_M1452_IG01:  ;; offset=0x0000
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       55                   push     rbp
       53                   push     rbx
       4883EC50             sub      rsp, 80
       C5F829742440         vmovaps  xmmword ptr [rsp+0x40], xmm6
       C5F8297C2430         vmovaps  xmmword ptr [rsp+0x30], xmm7
       C57829442420         vmovaps  xmmword ptr [rsp+0x20], xmm8
						;; size=28 bbWeight=1 PerfScore 11.25
G_M1452_IG02:  ;; offset=0x001C
       C5FB103500000000     vmovsd   xmm6, qword ptr [reloc @RWD00]
       C5F828FE             vmovaps  xmm7, xmm6
       E800000000           call     <unknown method>
       E800000000           call     System.Threading.Thread:GetCurrentProcessorNumber():int
       85C0                 test     eax, eax
       0F8C5C010000         jl       G_M1452_IG18
						;; size=30 bbWeight=1 PerfScore 6.50
G_M1452_IG03:  ;; offset=0x003A
       488D1D00000000       lea      rbx, [(reloc 0x4000000000427e88)]
       48837BF800           cmp      qword ptr [rbx-0x08], 0
       0F8573010000         jne      G_M1452_IG20
						;; size=18 bbWeight=0.25 PerfScore 1.12
G_M1452_IG04:  ;; offset=0x004C
       48BADB34B6D782DE1B43 mov      rdx, 0x431BDE82D7B634DB
       488BC2               mov      rax, rdx
       48F72B               imul     rdx:rax, qword ptr [rbx]
       488BC2               mov      rax, rdx
       48C1E83F             shr      rax, 63
       48C1FA12             sar      rdx, 18
       488D5C0201           lea      rbx, [rdx+rax+0x01]
       C578100500000000     vmovups  xmm8, xmmword ptr [reloc @RWD16]
       BE0A000000           mov      esi, 10
						;; size=45 bbWeight=0.25 PerfScore 3.00
G_M1452_IG05:  ;; offset=0x0079
       BF08000000           mov      edi, 8
						;; size=5 bbWeight=4 PerfScore 1.00
G_M1452_IG06:  ;; offset=0x007E
       03FF                 add      edi, edi
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       488BE8               mov      rbp, rax
       85FF                 test     edi, edi
       7E0D                 jle      SHORT G_M1452_IG09
						;; size=14 bbWeight=32 PerfScore 88.00
G_M1452_IG07:  ;; offset=0x008C
       448BF7               mov      r14d, edi
						;; size=3 bbWeight=4 PerfScore 1.00
G_M1452_IG08:  ;; offset=0x008F
       E800000000           call     System.Threading.Thread:GetCurrentProcessorNumber():int
       41FFCE               dec      r14d
       75F6                 jne      SHORT G_M1452_IG08
						;; size=10 bbWeight=64 PerfScore 144.00
G_M1452_IG09:  ;; offset=0x0099
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       482BC5               sub      rax, rbp
       488BE8               mov      rbp, rax
       483BEB               cmp      rbp, rbx
       7CD5                 jl       SHORT G_M1452_IG06
						;; size=16 bbWeight=32 PerfScore 88.00
G_M1452_IG10:  ;; offset=0x00A9
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C4E1FB2AC5           vcvtsi2sd xmm0, xmm0, rbp
       C5F057C9             vxorps   xmm1, xmm1, xmm1
       C5F32ACF             vcvtsi2sd xmm1, xmm1, edi
       C5FB5EC1             vdivsd   xmm0, xmm0, xmm1
       62F3C50851C804       vrangesd xmm1, xmm7, xmm0, 4
       62D3FD0855F800       vfixupimmsd xmm7, xmm0, xmm8, 0
       62D3C50855C800       vfixupimmsd xmm1, xmm7, xmm8, 0
       C5F828F9             vmovaps  xmm7, xmm1
       8BC7                 mov      eax, edi
       C1F81F               sar      eax, 31
       83E003               and      eax, 3
       03C7                 add      eax, edi
       8BF8                 mov      edi, eax
       C1FF02               sar      edi, 2
						;; size=61 bbWeight=4 PerfScore 143.67
G_M1452_IG11:  ;; offset=0x00E6
       03FF                 add      edi, edi
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       488BE8               mov      rbp, rax
       85FF                 test     edi, edi
       7E0D                 jle      SHORT G_M1452_IG14
						;; size=14 bbWeight=32 PerfScore 88.00
G_M1452_IG12:  ;; offset=0x00F4
       448BF7               mov      r14d, edi
						;; size=3 bbWeight=4 PerfScore 1.00
G_M1452_IG13:  ;; offset=0x00F7
       E800000000           call     <unknown method>
       41FFCE               dec      r14d
       75F6                 jne      SHORT G_M1452_IG13
						;; size=10 bbWeight=64 PerfScore 144.00
G_M1452_IG14:  ;; offset=0x0101
       E800000000           call     System.Diagnostics.Stopwatch:QueryPerformanceCounter():long
       482BC5               sub      rax, rbp
       488BE8               mov      rbp, rax
       483BEB               cmp      rbp, rbx
       7CD5                 jl       SHORT G_M1452_IG11
						;; size=16 bbWeight=32 PerfScore 88.00
G_M1452_IG15:  ;; offset=0x0111
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C4E1FB2AC5           vcvtsi2sd xmm0, xmm0, rbp
       C5F057C9             vxorps   xmm1, xmm1, xmm1
       C5F32ACF             vcvtsi2sd xmm1, xmm1, edi
       C5FB5EC1             vdivsd   xmm0, xmm0, xmm1
       62F3CD0851C804       vrangesd xmm1, xmm6, xmm0, 4
       62D3FD0855F000       vfixupimmsd xmm6, xmm0, xmm8, 0
       62D3CD0855C800       vfixupimmsd xmm1, xmm6, xmm8, 0
       C5F828F1             vmovaps  xmm6, xmm1
       FFCE                 dec      esi
       0F8532FFFFFF         jne      G_M1452_IG05
						;; size=54 bbWeight=4 PerfScore 140.67
G_M1452_IG16:  ;; offset=0x0147
       C5C3590500000000     vmulsd   xmm0, xmm7, qword ptr [reloc @RWD32]
       C5FB5EC6             vdivsd   xmm0, xmm0, xmm6
       62F57F086DC0         vcvttsd2sis eax, xmm0
       B988130000           mov      ecx, 0x1388
       3D88130000           cmp      eax, 0x1388
       0F4FC1               cmovg    eax, ecx
       890500000000         mov      dword ptr [(reloc 0x4000000000427e50)], eax
       833D0000000005       cmp      dword ptr [(reloc 0x4000000000427e50)], 5      ; static handle
       0F9EC0               setle    al
       0FB6C0               movzx    rax, al
						;; size=50 bbWeight=0.50 PerfScore 14.50
G_M1452_IG17:  ;; offset=0x0179
       C5F828742440         vmovaps  xmm6, xmmword ptr [rsp+0x40]
       C5F8287C2430         vmovaps  xmm7, xmmword ptr [rsp+0x30]
       C57828442420         vmovaps  xmm8, xmmword ptr [rsp+0x20]
       4883C450             add      rsp, 80
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M1452_IG18:  ;; offset=0x0196
       C70500000000FFFF0000 mov      dword ptr [(reloc 0x4000000000427e50)], 0xFFFF      ; static handle
       33C0                 xor      eax, eax
						;; size=12 bbWeight=0.50 PerfScore 0.62
G_M1452_IG19:  ;; offset=0x01A2
       C5F828742440         vmovaps  xmm6, xmmword ptr [rsp+0x40]
       C5F8287C2430         vmovaps  xmm7, xmmword ptr [rsp+0x30]
       C57828442420         vmovaps  xmm8, xmmword ptr [rsp+0x20]
       4883C450             add      rsp, 80
       5B                   pop      rbx
       5D                   pop      rbp
       5E                   pop      rsi
       5F                   pop      rdi
       415E                 pop      r14
       C3                   ret      
						;; size=29 bbWeight=0.50 PerfScore 7.88
G_M1452_IG20:  ;; offset=0x01BF
       E800000000           call     CORINFO_HELP_READYTORUN_NONGCSTATIC_BASE
       E983FEFFFF           jmp      G_M1452_IG04
						;; size=10 bbWeight=0 PerfScore 0.00
RWD00  	dq	7FEFFFFFFFFFFFFFh	; 1.79769313e+308
RWD08  	dd	00000000h, 00000000h
RWD16  	dq	0000000000000001h, 0000000000000000h
RWD32  	dq	4014000000000000h	;            5


; Total bytes of code 457, prolog size 28, PerfScore 980.08, instruction count 116, allocated bytes for code 457 (MethodHash=45e1fa53) for method System.Threading.ProcessorIdCache:ProcessorNumberSpeedCheck():ubyte (FullOpts)
; ============================================================

ERROR: Decode Failure Left@ offset 199e4ec6cf3
ERROR: Decode Failure Right@ offset 199e4eb7193
Using jit(C:\dev\dotnet\spmi\basejit\482e5b1d6fd547e9211eea90af03c98036e0e0e6.windows.x64.Checked\clrjit.dll) with input (C:\dev\dotnet\spmi\mch\a116647a-3f80-4fd6-9c80-95156c7e9923.windows.x64\smoke_tests.nativeaot.windows.x64.checked.mch)
 indexCount=1 (9439)
Jit startup took 3.886600ms
Jit startup took 1.883100ms
-----------------------------------------------
Block:   Left
Size:    457
Address: 199e4ec6ba0
CodePtr: 199e4eb6b64
-----------------------------------------------
199e4ec6ba0: 41 56               	push	r14
199e4ec6ba2: 57                  	push	rdi
199e4ec6ba3: 56                  	push	rsi
199e4ec6ba4: 55                  	push	rbp
199e4ec6ba5: 53                  	push	rbx
199e4ec6ba6: 48 83 ec 50         	sub	rsp, 80
199e4ec6baa: c5 f8 29 74 24 40   	vmovaps	xmmword ptr [rsp + 64], xmm6
199e4ec6bb0: c5 f8 29 7c 24 30   	vmovaps	xmmword ptr [rsp + 48], xmm7
199e4ec6bb6: c5 78 29 44 24 20   	vmovaps	xmmword ptr [rsp + 32], xmm8
199e4ec6bbc: c5 fb 10 35 a5 01 00 00	vmovsd	xmm6, qword ptr [rip + 421]
199e4ec6bc4: c5 f8 28 fe         	vmovaps	xmm7, xmm6
199e4ec6bc8: e8 9c 01 00 00      	call	412
199e4ec6bcd: e8 97 01 00 00      	call	407
199e4ec6bd2: 85 c0               	test	eax, eax
199e4ec6bd4: 0f 8c 5c 01 00 00   	jl	348
199e4ec6bda: 48 8d 1d 88 7e 42 00	lea	rbx, [rip + 4357768]
199e4ec6be1: 48 83 7b f8 00      	cmp	qword ptr [rbx - 8], 0
199e4ec6be6: 0f 85 73 01 00 00   	jne	371
199e4ec6bec: 48 ba db 34 b6 d7 82 de 1b 43	movabs	rdx, 4835703278458516699
199e4ec6bf6: 48 8b c2            	mov	rax, rdx
199e4ec6bf9: 48 f7 2b            	imul	qword ptr [rbx]
199e4ec6bfc: 48 8b c2            	mov	rax, rdx
199e4ec6bff: 48 c1 e8 3f         	shr	rax, 63
199e4ec6c03: 48 c1 fa 12         	sar	rdx, 18
199e4ec6c07: 48 8d 5c 02 01      	lea	rbx, [rdx + rax + 1]
199e4ec6c0c: c5 78 10 05 65 01 00 00	vmovups	xmm8, xmmword ptr [rip + 357]
199e4ec6c14: be 0a 00 00 00      	mov	esi, 10
199e4ec6c19: bf 08 00 00 00      	mov	edi, 8
199e4ec6c1e: 03 ff               	add	edi, edi
199e4ec6c20: e8 44 01 00 00      	call	324
199e4ec6c25: 48 8b e8            	mov	rbp, rax
199e4ec6c28: 85 ff               	test	edi, edi
199e4ec6c2a: 7e 0d               	jle	13
199e4ec6c2c: 44 8b f7            	mov	r14d, edi
199e4ec6c2f: e8 35 01 00 00      	call	309
199e4ec6c34: 41 ff ce            	dec	r14d
199e4ec6c37: 75 f6               	jne	-10
199e4ec6c39: e8 2b 01 00 00      	call	299
199e4ec6c3e: 48 2b c5            	sub	rax, rbp
199e4ec6c41: 48 8b e8            	mov	rbp, rax
199e4ec6c44: 48 3b eb            	cmp	rbp, rbx
199e4ec6c47: 7c d5               	jl	-43
199e4ec6c49: c5 f8 57 c0         	vxorps	xmm0, xmm0, xmm0
199e4ec6c4d: c4 e1 fb 2a c5      	vcvtsi2sd	xmm0, xmm0, rbp
199e4ec6c52: c5 f0 57 c9         	vxorps	xmm1, xmm1, xmm1
199e4ec6c56: c5 f3 2a cf         	vcvtsi2sd	xmm1, xmm1, edi
199e4ec6c5a: c5 fb 5e c1         	vdivsd	xmm0, xmm0, xmm1
199e4ec6c5e: 62 f3 c5 08 51 c8 04	vrangesd	xmm1, xmm7, xmm0, 4
199e4ec6c65: 62 d3 fd 08 55 f8 00	vfixupimmsd	xmm7, xmm0, xmm8, 0
199e4ec6c6c: 62 d3 c5 08 55 c8 00	vfixupimmsd	xmm1, xmm7, xmm8, 0
199e4ec6c73: c5 f8 28 f9         	vmovaps	xmm7, xmm1
199e4ec6c77: 8b c7               	mov	eax, edi
199e4ec6c79: c1 f8 1f            	sar	eax, 31
199e4ec6c7c: 83 e0 03            	and	eax, 3
199e4ec6c7f: 03 c7               	add	eax, edi
199e4ec6c81: 8b f8               	mov	edi, eax
199e4ec6c83: c1 ff 02            	sar	edi, 2
199e4ec6c86: 03 ff               	add	edi, edi
199e4ec6c88: e8 dc 00 00 00      	call	220
199e4ec6c8d: 48 8b e8            	mov	rbp, rax
199e4ec6c90: 85 ff               	test	edi, edi
199e4ec6c92: 7e 0d               	jle	13
199e4ec6c94: 44 8b f7            	mov	r14d, edi
199e4ec6c97: e8 cd 00 00 00      	call	205
199e4ec6c9c: 41 ff ce            	dec	r14d
199e4ec6c9f: 75 f6               	jne	-10
199e4ec6ca1: e8 c3 00 00 00      	call	195
199e4ec6ca6: 48 2b c5            	sub	rax, rbp
199e4ec6ca9: 48 8b e8            	mov	rbp, rax
199e4ec6cac: 48 3b eb            	cmp	rbp, rbx
199e4ec6caf: 7c d5               	jl	-43
199e4ec6cb1: c5 f8 57 c0         	vxorps	xmm0, xmm0, xmm0
ERROR: Decode Failure Left@ offset 199e4ec6cf3
199e4ec6cb5: c4 e1 fb 2a c5      	vcvtsi2sd	xmm0, xmm0, rbp
199e4ec6cba: c5 f0 57 c9         	vxorps	xmm1, xmm1, xmm1
199e4ec6cbe: c5 f3 2a cf         	vcvtsi2sd	xmm1, xmm1, edi
199e4ec6cc2: c5 fb 5e c1         	vdivsd	xmm0, xmm0, xmm1
199e4ec6cc6: 62 f3 cd 08 51 c8 04	vrangesd	xmm1, xmm6, xmm0, 4
199e4ec6ccd: 62 d3 fd 08 55 f0 00	vfixupimmsd	xmm6, xmm0, xmm8, 0
199e4ec6cd4: 62 d3 cd 08 55 c8 00	vfixupimmsd	xmm1, xmm6, xmm8, 0
199e4ec6cdb: c5 f8 28 f1         	vmovaps	xmm6, xmm1
199e4ec6cdf: ff ce               	dec	esi
199e4ec6ce1: 0f 85 32 ff ff ff   	jne	-206
199e4ec6ce7: c5 c3 59 05 9a 00 00 00	vmulsd	xmm0, xmm7, qword ptr [rip + 154]
199e4ec6cef: c5 fb 5e c6         	vdivsd	xmm0, xmm0, xmm6
-----------------------------------------------
-----------------------------------------------
Block:   Right
Size:    457
Address: 199e4eb7040
CodePtr: 199e6a61904
-----------------------------------------------
199e4eb7040: 41 56               	push	r14
199e4eb7042: 57                  	push	rdi
199e4eb7043: 56                  	push	rsi
199e4eb7044: 55                  	push	rbp
199e4eb7045: 53                  	push	rbx
199e4eb7046: 48 83 ec 50         	sub	rsp, 80
199e4eb704a: c5 f8 29 74 24 40   	vmovaps	xmmword ptr [rsp + 64], xmm6
199e4eb7050: c5 f8 29 7c 24 30   	vmovaps	xmmword ptr [rsp + 48], xmm7
199e4eb7056: c5 78 29 44 24 20   	vmovaps	xmmword ptr [rsp + 32], xmm8
199e4eb705c: c5 fb 10 35 a5 01 00 00	vmovsd	xmm6, qword ptr [rip + 421]
199e4eb7064: c5 f8 28 fe         	vmovaps	xmm7, xmm6
199e4eb7068: e8 9c 01 00 00      	call	412
199e4eb706d: e8 97 01 00 00      	call	407
199e4eb7072: 85 c0               	test	eax, eax
199e4eb7074: 0f 8c 5c 01 00 00   	jl	348
199e4eb707a: 48 8d 1d 88 7e 42 00	lea	rbx, [rip + 4357768]
199e4eb7081: 48 83 7b f8 00      	cmp	qword ptr [rbx - 8], 0
199e4eb7086: 0f 85 73 01 00 00   	jne	371
199e4eb708c: 48 ba db 34 b6 d7 82 de 1b 43	movabs	rdx, 4835703278458516699
199e4eb7096: 48 8b c2            	mov	rax, rdx
199e4eb7099: 48 f7 2b            	imul	qword ptr [rbx]
199e4eb709c: 48 8b c2            	mov	rax, rdx
199e4eb709f: 48 c1 e8 3f         	shr	rax, 63
199e4eb70a3: 48 c1 fa 12         	sar	rdx, 18
199e4eb70a7: 48 8d 5c 02 01      	lea	rbx, [rdx + rax + 1]
199e4eb70ac: c5 78 10 05 65 01 00 00	vmovups	xmm8, xmmword ptr [rip + 357]
199e4eb70b4: be 0a 00 00 00      	mov	esi, 10
199e4eb70b9: bf 08 00 00 00      	mov	edi, 8
199e4eb70be: 03 ff               	add	edi, edi
199e4eb70c0: e8 44 01 00 00      	call	324
199e4eb70c5: 48 8b e8            	mov	rbp, rax
199e4eb70c8: 85 ff               	test	edi, edi
199e4eb70ca: 7e 0d               	jle	13
199e4eb70cc: 44 8b f7            	mov	r14d, edi
199e4eb70cf: e8 35 01 00 00      	call	309
199e4eb70d4: 41 ff ce            	dec	r14d
199e4eb70d7: 75 f6               	jne	-10
199e4eb70d9: e8 2b 01 00 00      	call	299
199e4eb70de: 48 2b c5            	sub	rax, rbp
199e4eb70e1: 48 8b e8            	mov	rbp, rax
199e4eb70e4: 48 3b eb            	cmp	rbp, rbx
199e4eb70e7: 7c d5               	jl	-43
199e4eb70e9: c5 f8 57 c0         	vxorps	xmm0, xmm0, xmm0
199e4eb70ed: c4 e1 fb 2a c5      	vcvtsi2sd	xmm0, xmm0, rbp
199e4eb70f2: c5 f0 57 c9         	vxorps	xmm1, xmm1, xmm1
199e4eb70f6: c5 f3 2a cf         	vcvtsi2sd	xmm1, xmm1, edi
199e4eb70fa: c5 fb 5e c1         	vdivsd	xmm0, xmm0, xmm1
199e4eb70fe: 62 f3 c5 08 51 c8 04	vrangesd	xmm1, xmm7, xmm0, 4
199e4eb7105: 62 d3 fd 08 55 f8 00	vfixupimmsd	xmm7, xmm0, xmm8, 0
199e4eb710c: 62 d3 c5 08 55 c8 00	vfixupimmsd	xmm1, xmm7, xmm8, 0
199e4eb7113: c5 f8 28 f9         	vmovaps	xmm7, xmm1
199e4eb7117: 8b c7               	mov	eax, edi
199e4eb7119: c1 f8 1f            	sar	eax, 31
199e4eb711c: 83 e0 03            	and	eax, 3
199e4eb711f: 03 c7               	add	eax, edi
199e4eb7121: 8b f8               	mov	edi, eax
199e4eb7123: c1 ff 02            	sar	edi, 2
199e4eb7126: 03 ff               	add	edi, edi
199e4eb7128: e8 dc 00 00 00      	call	220
199e4eb712d: 48 8b e8            	mov	rbp, rax
199e4eb7130: 85 ff               	test	edi, edi
199e4eb7132: 7e 0d               	jle	13
ERROR: Decode Failure Right@ offset 199e4eb7193
199e4eb7134: 44 8b f7            	mov	r14d, edi
199e4eb7137: e8 cd 00 00 00      	call	205
199e4eb713c: 41 ff ce            	dec	r14d
199e4eb713f: 75 f6               	jne	-10
199e4eb7141: e8 c3 00 00 00      	call	195
199e4eb7146: 48 2b c5            	sub	rax, rbp
199e4eb7149: 48 8b e8            	mov	rbp, rax
199e4eb714c: 48 3b eb            	cmp	rbp, rbx
199e4eb714f: 7c d5               	jl	-43
199e4eb7151: c5 f8 57 c0         	vxorps	xmm0, xmm0, xmm0
199e4eb7155: c4 e1 fb 2a c5      	vcvtsi2sd	xmm0, xmm0, rbp
199e4eb715a: c5 f0 57 c9         	vxorps	xmm1, xmm1, xmm1
199e4eb715e: c5 f3 2a cf         	vcvtsi2sd	xmm1, xmm1, edi
199e4eb7162: c5 fb 5e c1         	vdivsd	xmm0, xmm0, xmm1
199e4eb7166: 62 f3 cd 08 51 c8 04	vrangesd	xmm1, xmm6, xmm0, 4
199e4eb716d: 62 d3 fd 08 55 f0 00	vfixupimmsd	xmm6, xmm0, xmm8, 0
199e4eb7174: 62 d3 cd 08 55 c8 00	vfixupimmsd	xmm1, xmm6, xmm8, 0
199e4eb717b: c5 f8 28 f1         	vmovaps	xmm6, xmm1
199e4eb717f: ff ce               	dec	esi
199e4eb7181: 0f 85 32 ff ff ff   	jne	-206
199e4eb7187: c5 c3 59 05 9a 00 00 00	vmulsd	xmm0, xmm7, qword ptr [rip + 154]
199e4eb718f: c5 fb 5e c6         	vdivsd	xmm0, xmm0, xmm6
-----------------------------------------------
ISSUE: <ASM_DIFF> main method 9439 of size 261 differs
Loaded 1  Jitted 1  FailedCompile 0 Excluded 0 Missing 0 Diffs 1
Total time: 20.784500ms

The problem looks to be that coredistools fails to disassemble

       62F57F086DC0         vcvttsd2sis eax, xmm0

@tannergooding is this a newly emitted instruction, and are we emitting the right bytes here? If so, we probably need to update coredistools.
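For readers checking the bytes by hand, here is a minimal sketch that splits `62 F5 7F 08 6D C0` into its EVEX prefix fields. The field positions follow the EVEX layout documented in the Intel SDM for 64-bit mode. The `EvexFields` struct and `decodeEvex` function are hypothetical names made up for this illustration; they are not part of coredistools or the JIT.

```cpp
#include <array>
#include <cstdint>

// Fields of a 4-byte EVEX prefix (0x62 P0 P1 P2) plus the two bytes after it.
// Hypothetical helper type for illustration only.
struct EvexFields
{
    bool     valid;  // first byte is the 0x62 EVEX escape
    unsigned map;    // P0[2:0]: opcode map (mmm)
    unsigned w;      // P1[7]:   W bit
    unsigned pp;     // P1[1:0]: implied prefix (0 = none, 1 = 66, 2 = F3, 3 = F2)
    unsigned ll;     // P2[6:5]: vector length bits (L'L)
    unsigned opcode; // opcode byte following the prefix
    unsigned modrm;  // ModRM byte following the opcode
};

// Extracts the fields from a 6-byte instruction: prefix (4) + opcode + ModRM.
EvexFields decodeEvex(const std::array<std::uint8_t, 6>& bytes)
{
    EvexFields f{};
    f.valid  = bytes[0] == 0x62;
    f.map    = bytes[1] & 0x7;
    f.w      = (bytes[2] >> 7) & 0x1;
    f.pp     = bytes[2] & 0x3;
    f.ll     = (bytes[3] >> 5) & 0x3;
    f.opcode = bytes[4];
    f.modrm  = bytes[5];
    return f;
}
```

For the bytes above this yields map 5, W = 0, pp = 3 (an implied F2 prefix), opcode 0x6D and ModRM 0xC0 (register form, reg = 0, rm = 0), which is at least consistent with the `eax, xmm0` operands shown in the JIT listing. A disassembler that predates this opcode in map 5 would be expected to reject it.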

@tannergooding
Member

That's an AVX10.2 instruction and the bytes look correct.

But I'm confused as to why NAOT is emitting it in the first place as AVX10.2 is off by default for the JIT and we presumably shouldn't have NAOT tests targeting "all possible ISAs"

@MichalStrehovsky
Member

> But I'm confused as to why NAOT is emitting it in the first place as AVX10.2 is off by default for the JIT and we presumably shouldn't have NAOT tests targeting "all possible ISAs"

We have a couple tests that enable extra ISAs under src\tests\nativeaot\SmokeTests\HardwareIntrinsics. This one looks to be:

; Emitting BLENDED_CODE for X64 with AVX10.2/512 - Windows

So maybe it's the test that enables --instruction-set:avx512f,avx512f_vl,avx512bw,avx512bw_vl,avx512cd,avx512cd_vl,avx512dq,avx512dq_vl?

@jakobbotsch
Member Author

The problem is that avx10v2 is optimistically enabled any time avx512f is enabled:

optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v1");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v1_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("vpclmul_v512");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avx10v2_v512");

Not sure if that's the right behavior or not.
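To make the behavior concrete, here is a small C++ model of the two sets. It is only a sketch: `IsaSets` and `buildIsaSets` are hypothetical names, the real logic is the C# in InstructionSetHelpers.cs, and the trigger condition ("avx512f is in the requested list") is taken from the description above rather than from the actual source.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the two sets the compiler driver computes.
struct IsaSets
{
    std::set<std::string> supported;  // baseline: usable unconditionally
    std::set<std::string> optimistic; // extras: usable only behind an IsSupported check
};

// Builds both sets from the --instruction-set list. In this sketch the
// optimistic set holds only the extras, not a copy of the baseline.
IsaSets buildIsaSets(const std::vector<std::string>& requested)
{
    IsaSets sets;
    sets.supported.insert(requested.begin(), requested.end());

    // Assumption from the thread: enabling avx512f optimistically enables these.
    if (sets.supported.count("avx512f") != 0)
    {
        for (const char* isa : {"avx10v1", "avx10v1_v512", "vpclmul_v512", "avx10v2", "avx10v2_v512"})
        {
            sets.optimistic.insert(isa);
        }
    }
    return sets;
}
```

With the `--instruction-set` list from the smoke test, `avx10v2` ends up in `optimistic` but not in `supported`, which is the state the rest of the discussion starts from.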

@tannergooding
Member

It could be if there's a bug. However, under the intended design that shouldn't cause AVX10.2/512 to be reported as "supported".

That is, given the --instruction-set specified there, the NAOT VM should say that AVX512F is supported and that AVX10.2 is optimisticallySupported (https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/InstructionSetHelpers.cs#L233).

The VM should then pass both the supported and optimistic set down to the JIT: https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs#L4203, which the JIT should then track as the "supported" set: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/compiler.cpp#L2282

compExactlyDependsOn and compOpportunisticallyDependsOn then do roughly the same thing: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/compiler.h#L9843-L9874. The difference is that compExactlyDependsOn always reports the usage while compOpportunisticallyDependsOn does not. These two are meant to be used by the JIT for its own internal usages. We then have a third, compHWIntrinsicDependsOn, which is used by the JIT for user hardware intrinsic usage (as in from normal C# code).

We have this split because the optimistic set is meant to be allowed under NAOT on the premise that the user has correctly emitted a containing if (Isa.IsSupported) { } guard check (hence we report usage, but then simply return whether it's in the list of ISAs). While for the JIT we can emit no such guard and so want exactlyDependsOn to report true but opportunisticallyDependsOn to report false. This difference is meant to be surfaced by the notifyInstructionSetUsage(isa, (opts.compSupportsISA.HasInstructionSet(isa))) call: https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs#L4460

For NAOT this should simply be querying IsInstructionSetSupported which should only include the actual baseline ISAs and not any optimistic ISAs: https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/Compiler/InstructionSetSupport.cs#L44-L52
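The handshake described above can be modeled in a few lines. This is a simplified sketch, not the code in compiler.h: `JitModel`, `NotifyFn` and `baselineHost` are hypothetical names, ISAs are plain strings, and the "report once, then cache the host's answer" behavior is an assumption about how the real functions are structured.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

using IsaSet = std::set<std::string>;

// Host callback: (isa, "the JIT believes it is supported") -> may the JIT rely on it?
using NotifyFn = std::function<bool(const std::string&, bool)>;

class JitModel
{
public:
    JitModel(IsaSet flags, NotifyFn notify) : m_flags(std::move(flags)), m_notify(std::move(notify))
    {
    }

    // Always reports the usage to the host (once per ISA) and returns the host's answer.
    bool exactlyDependsOn(const std::string& isa)
    {
        if (m_reported.insert(isa).second && m_notify(isa, m_flags.count(isa) != 0))
        {
            m_exact.insert(isa);
        }
        return m_exact.count(isa) != 0;
    }

    // Reports the usage only when the ISA is in the set handed to the JIT.
    bool opportunisticallyDependsOn(const std::string& isa)
    {
        return m_flags.count(isa) != 0 && exactlyDependsOn(isa);
    }

private:
    IsaSet   m_flags;    // supported + optimistic ISAs passed down by the host
    IsaSet   m_reported; // ISAs already reported through the callback
    IsaSet   m_exact;    // ISAs the host confirmed
    NotifyFn m_notify;
};

// Host that answers from its baseline set only and ignores the JIT's own view.
NotifyFn baselineHost(IsaSet baseline)
{
    return [baseline](const std::string& isa, bool) { return baseline.count(isa) != 0; };
}
```

With flags `{avx512f, avx10v2}` and a host whose baseline is `{avx512f}`, `opportunisticallyDependsOn("avx10v2")` comes back false, so the JIT would not use AVX10.2 for its own codegen. If the callback instead returns its second argument unchanged, the same query returns true.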

@jakobbotsch
Member Author

> For NAOT this should simply be querying IsInstructionSetSupported which should only include the actual baseline ISAs and not any optimistic ISAs: https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/Compiler/InstructionSetSupport.cs#L44-L52

Then this just looks like a bug in SPMI's implementation of notifyInstructionSetUsage here:

bool MyICJI::notifyInstructionSetUsage(CORINFO_InstructionSet instructionSet, bool supported)
{
    jitInstance->mc->cr->AddCall("notifyInstructionSetUsage");
    return supported;
}
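
One possible direction, sketched below, is to record the real host's answer at collection time and return that on replay instead of echoing the argument. This is not necessarily what the linked PR does. `UsageLog`, `UsageKey` and `echoStub` are hypothetical names for this illustration, and the "not supported" fallback for missing records is an assumption of the sketch.

```cpp
#include <map>
#include <string>
#include <utility>

// Key: (instruction set, the JIT's own "supported" argument). Value: the host's answer.
using UsageKey = std::pair<std::string, bool>;

class UsageLog
{
public:
    // Collection side: remember what the real host answered for this query.
    void record(const std::string& isa, bool supported, bool hostAnswer)
    {
        m_answers[UsageKey(isa, supported)] = hostAnswer;
    }

    // Replay side: return the recorded answer. If nothing was recorded,
    // this sketch falls back to "not supported".
    bool replay(const std::string& isa, bool supported) const
    {
        auto it = m_answers.find(UsageKey(isa, supported));
        return it != m_answers.end() && it->second;
    }

private:
    std::map<UsageKey, bool> m_answers;
};

// Behavior of the stub quoted above: echo the JIT's argument back.
inline bool echoStub(const std::string&, bool supported)
{
    return supported;
}
```

If the NAOT host answered false for `("avx10v2", true)` during collection, `replay` returns false for the same query while `echoStub` returns true. That mismatch would make the replayed JIT emit AVX10.2 code that the original compilation never produced.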

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Feb 20, 2025