Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
abc7057
Lux Benchmarks
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)
412833
ns412520.5
ns1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)
324917
ns323042
ns1.01
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)
322791
ns323583
ns1.00
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)
741270.5
ns752166.5
ns0.99
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA
44918
ns44168
ns1.02
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)
1358250
ns1384083
ns0.98
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s)
2444062.5
ns2451854
ns1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s)
14162791
ns14238812.5
ns0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s)
2277500
ns2239125
ns1.02
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA
212604
ns210250
ns1.01
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s)
1450562.5
ns1411875
ns1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s)
960958.5
ns897520.5
ns1.07
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)
1778125
ns1516292
ns1.17
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s)
2274000
ns2210229
ns1.03
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
1767833.5
ns1725583
ns1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1083978.5
ns1017708.5
ns1.07
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1529021
ns1538333
ns0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
2954750
ns3006583
ns0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA
209644
ns210559
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
12148854.5
ns12112667
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
8834958.5
ns8809666.5
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
9230875
ns9192709
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
18631937.5
ns18570834
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1509941
ns1504910
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
17314333
ns17273542
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
13961542
ns13992292
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
14514291
ns14538625
ns1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
21865437.5
ns21824875
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
249016958.5
ns249443729
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
148521291
ns148456250
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
116073791
ns115795563
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
447568292
ns454024458
ns0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA
5499808
ns5474002
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1227795916
ns1144391209
ns1.07
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
931180042
ns981113333
ns0.95
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
831332521
ns853440021
ns0.97
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
1629694167
ns1805007208
ns0.90
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
31376705.5
ns31357343
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1167771625
ns1034466750
ns1.13
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
1003953563
ns1009660729.5
ns0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1322017146
ns1324456604
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
1730835103.5
ns1728354792
ns1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s)
1100791
ns1093583
ns1.01
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s)
1624625
ns1583083
ns1.03
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s)
3431229
ns3678000
ns0.93
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s)
781521
ns779625
ns1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA
272287.5
ns273068.5
ns1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s)
3015146
ns2985458.5
ns1.01
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s)
4087333.5
ns4106125
ns1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s)
10933000
ns10555937
ns1.04
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s)
3238167
ns3131667
ns1.03
lenet(28, 28, 1, 32)/zygote/GPU/CUDA
1132885
ns1134574.5
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
2306750
ns2275083
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1433208.5
ns1429583
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1678625.5
ns1656125
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
4201375
ns4200438
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA
209995
ns210634
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
19417729
ns19375958
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
16114625
ns16086292
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
17220375
ns17180583
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
25992250
ns25782875
ns1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1600144
ns1606705
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
34149500
ns34182625
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
30894937.5
ns30811875
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
31140666
ns31108104
ns1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
36754250
ns36403791
ns1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
4526959
ns4540667
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2746459
ns2769500.5
ns0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2911584
ns2921250
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
8399583
ns8391917
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA
373956
ns423308
ns0.88
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
38745459
ns39022250
ns0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
32111709
ns32067021
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
32268625
ns32250916.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
52066792
ns51820375
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2635152.5
ns2657162.5
ns0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
88780729
ns88606874.5
ns1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
84997250
ns113796125
ns0.75
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
218329542
ns223648041
ns0.98
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
74358917
ns74335583.5
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
267246875
ns267029417
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
158965875
ns158942229.5
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
126688521
ns126886229
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
485596792
ns487631541
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA
7022210
ns6889435
ns1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1468898146
ns1474300812.5
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
1171204459
ns1174433750
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
1068921333.5
ns1063095500
ns1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
2001229479
ns2007751479
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
34725068.5
ns34685949
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1692415625
ns1689349708
ns1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
1500720958.5
ns1535787500
ns0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1766379833
ns1814518792
ns0.97
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
2224153125
ns2211056708.5
ns1.01
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s)
1760875
ns2089187.5
ns0.84
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s)
2595167
ns2976458
ns0.87
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s)
7433916.5
ns7304583
ns1.02
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s)
2426041.5
ns2476917
ns0.98
lenet(28, 28, 1, 128)/forward/GPU/CUDA
273792
ns272072.5
ns1.01
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s)
9254417
ns9643854
ns0.96
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s)
11474333
ns12014792
ns0.96
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s)
25126166
ns25647896
ns0.98
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s)
11780750
ns11736104
ns1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA
1194908
ns1173736.5
ns1.02
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s)
381207125
ns380778209
ns1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s)
285815709
ns282717792
ns1.01
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s)
233745708
ns238251708.5
ns0.98
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s)
453344667
ns453270208
ns1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA
4852271
ns4856475
ns1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s)
1157427583
ns1156978917
ns1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s)
931406250
ns919622250
ns1.01
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s)
929761209
ns945107000
ns0.98
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s)
1403593291
ns1428489000
ns0.98
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA
19807136
ns17978082
ns1.10
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)
1051042
ns1021959
ns1.03
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)
1930834
ns2001250
ns0.96
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)
4821271
ns6008000
ns0.80
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)
1297541
ns1374000
ns0.94
lenet(28, 28, 1, 64)/forward/GPU/CUDA
269906
ns268964
ns1.00
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)
6495729
ns6414395.5
ns1.01
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)
12306583.5
ns12403896
ns0.99
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)
18165416.5
ns20716333
ns0.88
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)
6025750
ns6079792
ns0.99
lenet(28, 28, 1, 64)/zygote/GPU/CUDA
1207681.5
ns1209955
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
70586437.5
ns70501749.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
43556333.5
ns43580771
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
39526083
ns39491375
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
132710667
ns132802458.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA
1944845
ns1859689
ns1.05
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
356816354
ns384818104
ns0.93
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
270253083
ns295632667
ns0.91
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
254146791.5
ns281694167
ns0.90
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
534914958.5
ns534727063
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
12308008
ns12284399.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
396010084
ns396068167
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
407805500
ns409321729.5
ns1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
706921292
ns678917958
ns1.04
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
711811750
ns711312959
ns1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s)
1187507791
ns1190798042
ns1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s)
764568937.5
ns688321229
ns1.11
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s)
631341166
ns630150084
ns1.00
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s)
1772828250
ns1776546083
ns1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA
12544942.5
ns12315985
ns1.02
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s)
3767262229
ns3607588771
ns1.04
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s)
2869944333
ns2756374750
ns1.04
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s)
2705287250
ns2714951667
ns1.00
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s)
5058993459
ns4951023834
ns1.02
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA
49891272
ns49373771
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
3429042
ns3429083.5
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2081583
ns2066792
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2543583
ns2527666
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
6024375
ns6016750
ns1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA
338827
ns311191
ns1.09
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
26104562.5
ns25518541
ns1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
19078958.5
ns18527417
ns1.03
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
19625020.5
ns18707833
ns1.05
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
39317959
ns38890083
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2462668
ns2479107
ns0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
54777416
ns54171458
ns1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
80697167
ns78979625
ns1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
170440292
ns171331479
ns0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
45420250
ns45540167
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s)
1787458
ns1785458
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s)
1101875
ns1046062.5
ns1.05
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s)
1569708
ns1583208.5
ns0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s)
3035500
ns3024416.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA
215425
ns213982
ns1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s)
12537208
ns12521375
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s)
9283500
ns9184167
ns1.01
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s)
9641937.5
ns9599958.5
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s)
18984166.5
ns18940458
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA
1531405
ns1538264
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s)
17668583
ns17640750
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s)
14332291.5
ns14307771
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s)
14569250
ns14507583
ns1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s)
22181083.5
ns22177500
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
70579000.5
ns70512937
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
43509167
ns43444479.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
39545292
ns39626750
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
132823604.5
ns132598874.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
1947535
ns1950639
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
361581166
ns359565417
ns1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
345861541.5
ns293550333
ns1.18
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
303584333
ns287837104.5
ns1.05
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
724116959
ns622550708.5
ns1.16
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
13351785.5
ns13384881.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
419705187.5
ns419108729
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
420514459
ns424758959
ns0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
697427687
ns717519375
ns0.97
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
717027625
ns716499833
ns1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)
1700896
ns1521229
ns1.12
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)
1344562.5
ns1235833
ns1.09
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)
1353750
ns1246625
ns1.09
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)
2400417
ns2300875
ns1.04
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA
590707
ns587061.5
ns1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)
8924250
ns8812333
ns1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)
12992208
ns12926416
ns1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)
30772062.5
ns30195584
ns1.02
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)
9884229.5
ns9787000
ns1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA
1479651
ns1419851.5
ns1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)
17441145.5
ns18056125
ns0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)
16807333
ns16803125
ns1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)
30461791.5
ns29287584
ns1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)
14317375
ns14378083
ns1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s)
789375
ns805145.5
ns0.98
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s)
595083.5
ns589041.5
ns1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s)
1038125
ns1034812.5
ns1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s)
725167
ns726750
ns1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA
48555.5
ns47938.5
ns1.01
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s)
1507084
ns1542875
ns0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s)
1043292
ns1000270.5
ns1.04
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s)
1413583
ns1504041
ns0.94
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s)
2256583
ns2294104
ns0.98
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA
241345.5
ns236494.5
ns1.02
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s)
1541063
ns1722687.5
ns0.89
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s)
1073583.5
ns1250438
ns0.86
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s)
1495667
ns1858854.5
ns0.80
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s)
2216500
ns2311917
ns0.96
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
3407458.5
ns3404416
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2060208
ns2046208
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2504792
ns2516916.5
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
6019500
ns6013625
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA
283414
ns285181.5
ns0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
24068584
ns24021312.5
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
17256458.5
ns17217833
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
17166250
ns17101666.5
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
37584937.5
ns37551396
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA
2397302
ns2407620
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
52933521
ns52545812.5
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
83805875
ns80522312.5
ns1.04
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
168151312.5
ns166982250.5
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
44568645.5
ns44529604
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
250376958
ns250184208.5
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
148122999.5
ns147977833
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
115699917
ns115557083.5
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
448012646
ns447150583.5
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA
5442645
ns5457630
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
1105356584
ns1128644583
ns0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
854303812.5
ns881731833.5
ns0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
826724000
ns805115667
ns1.03
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
1752988167
ns1757118042
ns1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA
28762466
ns28927493
ns0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
1031896104
ns1058828646
ns0.97
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
962579167
ns973248125
ns0.99
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
1179808792
ns1362518583
ns0.87
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
1752419187.5
ns1744326604
ns1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s)
1246312
ns1317667
ns0.95
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s)
981667
ns936250
ns1.05
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s)
924938
ns907396
ns1.02
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s)
1952875
ns2059708
ns0.95
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA
559173.5
ns573972.5
ns0.97
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s)
5968250
ns5872667
ns1.02
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s)
6725083
ns6537417
ns1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s)
24147709
ns24586229.5
ns0.98
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s)
7125208
ns7039792
ns1.01
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA
1363102
ns1375117
ns0.99
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s)
10592083.5
ns11464417
ns0.92
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s)
9872770.5
ns10266333
ns0.96
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s)
16891792
ns17693667
ns0.95
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s)
8542250.5
ns8866896
ns0.96
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s)
490083
ns487208
ns1.01
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s)
414250
ns474584
ns0.87
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)
1848916.5
ns2175853.5
ns0.85
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s)
89417
ns87541
ns1.02
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA
27713
ns28408
ns0.98
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s)
381875
ns383437.5
ns1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s)
447500
ns444333.5
ns1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s)
4415146
ns4385583
ns1.01
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)
259083.5
ns268292
ns0.97
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA
221456.5
ns225901
ns0.98
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s)
412875
ns706959
ns0.58
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s)
474250
ns722500
ns0.66
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s)
4220333
ns1069791
ns3.95
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s)
271166
ns447125
ns0.61
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s)
434854
ns432125
ns1.01
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s)
353250
ns418166
ns0.84
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s)
650792
ns742500
ns0.88
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s)
54375
ns53208
ns1.02
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA
27922
ns28501
ns0.98
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s)
339896.5
ns338770.5
ns1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)
340500
ns338750
ns1.01
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)
611187.5
ns737375
ns0.83
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s)
152292
ns154208
ns0.99
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA
206825
ns210566
ns0.98
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s)
356792
ns404125
ns0.88
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s)
355875
ns405916.5
ns0.88
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)
420542
ns983208
ns0.43
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)
151000
ns174750
ns0.86
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s)
603607250
ns603527917
ns1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s)
425272979
ns431057458.5
ns0.99
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s)
372455458
ns375361437.5
ns0.99
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s)
873099458
ns872552854
ns1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA
7619709
ns7040620
ns1.08
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s)
2006739833.5
ns1986550813
ns1.01
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s)
1613467771
ns1668902250
ns0.97
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s)
1601604000
ns1651138625
ns0.97
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s)
2628483083
ns2764176416
ns0.95
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA
26335134
ns25979788.5
ns1.01
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s)
520146
ns521833
ns1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s)
434479
ns437250
ns0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s)
1898520.5
ns1710708
ns1.11
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s)
866625
ns866062.5
ns1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA
47286
ns47823
ns0.99
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s)
1848208.5
ns1842562.5
ns1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s)
2786229
ns2356875
ns1.18
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s)
14679500
ns14345020.5
ns1.02
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s)
2771958
ns2764166
ns1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA
249296.5
ns252466.5
ns0.99
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s)
1937125
ns2751750
ns0.70
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s)
5035312.5
ns2316083
ns2.17
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s)
14724291.5
ns4360708
ns3.38
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s)
2768167
ns4727708
ns0.59
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)
1574791.5
ns1581500
ns1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)
1257666
ns1216229.5
ns1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)
1200500
ns1177645.5
ns1.02
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)
2226083
ns2314729
ns0.96
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA
584985.5
ns547137
ns1.07
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)
5976500
ns5877292
ns1.02
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)
4604667
ns6745916.5
ns0.68
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s)
25216125
ns24550687.5
ns1.03
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)
7317042
ns7266312
ns1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA
1363255
ns1351645
ns1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s)
12710625
ns12285333.5
ns1.03
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s)
11988958
ns12037124.5
ns1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s)
21409084
ns20466187
ns1.05
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s)
10882083
ns10853417
ns1.00
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)
2291
ns2500
ns0.92
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)
2708
ns2750
ns0.98
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s)
2959
ns3416
ns0.87
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s)
2375
ns3041
ns0.78
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA
24451.5
ns24989
ns0.98
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s)
7042
ns8333
ns0.85
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s)
7084
ns8625
ns0.82
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s)
7209
ns8667
ns0.83
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s)
7166
ns8770.5
ns0.82
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA
210193.5
ns213236.5
ns0.99
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)
8125
ns16750
ns0.49
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s)
8292
ns16375
ns0.51
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s)
8208
ns16792
ns0.49
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s)
5917
ns10917
ns0.54
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s)
11000.5
ns10792
ns1.02
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s)
16166
ns18083
ns0.89
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s)
11146
ns11666
ns0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s)
7125
ns7666.5
ns0.93
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA
24717
ns24865.5
ns0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s)
20000
ns22333
ns0.90
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s)
20000
ns22291
ns0.90
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s)
20125
ns22500
ns0.89
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s)
20250
ns22375
ns0.91
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA
230632.5
ns233562.5
ns0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s)
23375
ns52042
ns0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s)
23417
ns52125
ns0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s)
23645.5
ns52270.5
ns0.45
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s)
21375
ns44000
ns0.49
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 29458 ns | 28979.5 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28834 ns | 29208 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28625 ns | 28458 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46333 ns | 46209 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25821.5 ns | 26274 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 226542 ns | 229062.5 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 274167 ns | 263041 ns | 1.04 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4023229.5 ns | 4056646 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145708 ns | 154437.5 ns | 0.94 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 205677 ns | 215509 ns | 0.95 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 339625 ns | 329834 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 311625 ns | 292583 ns | 1.07 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 520417 ns | 817500 ns | 0.64 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 161292 ns | 161708 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1875 ns | 2041 ns | 0.92 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1833 ns | 1833 ns | 1 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2104.5 ns | 2750 ns | 0.77 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1625 ns | 1917 ns | 0.85 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 22965 ns | 23258 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5250 ns | 7208 ns | 0.73 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5250 ns | 7042 ns | 0.75 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5292 ns | 7750 ns | 0.68 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5208 ns | 7125 ns | 0.73 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 261526 ns | 267733.5 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11208 ns | 11334 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11333 ns | 11375 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11459 ns | 11708 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6708 ns | 6958 ns | 0.96 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79891416 ns | 79930209 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49038584 ns | 49066500 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 44836791 ns | 45049708 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151572917 ns | 151430167 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2695899 ns | 2719840 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 665802334 ns | 497512959 ns | 1.34 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 410890125 ns | 411297375 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 399102167 ns | 396546125 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 681784916 ns | 736651313 ns | 0.93 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14619713 ns | 14587409 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 710708249.5 ns | 709337374.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 671159083 ns | 664763792 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 978285458 ns | 1022853709 ns | 0.96 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 996959708 ns | 996468292 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
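For readers checking the numbers by hand, the last value in each row appears to be the first timing divided by the second, rounded to two decimals. The sketch below recomputes it for three rows copied from the listing above. It assumes the first timing is the current commit and the second is the previous one, which the listing itself does not state. The `ratio` and `flag` helpers and the `1.3` threshold are hypothetical names and values chosen for illustration, not part of the benchmark workflow.

```python
# Sketch: recompute the ratio column from the two timing columns.
# Assumption: ratio = first timing / second timing, rounded to two decimals.

def ratio(first_ns: float, second_ns: float) -> float:
    """Return first/second rounded to two decimals (below 1 means the first timing is faster)."""
    return round(first_ns / second_ns, 2)

# (benchmark name, first timing in ns, second timing in ns), copied from the listing
rows = [
    ("Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s)", 8125, 16750),
    ("Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)", 520417, 817500),
    ("Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)", 665802334, 497512959),
]

def flag(rows, threshold=1.3):
    """Return (name, ratio) for rows whose ratio exceeds a hypothetical threshold."""
    return [(name, ratio(a, b)) for name, a, b in rows if ratio(a, b) > threshold]

if __name__ == "__main__":
    for name, a, b in rows:
        print(f"{ratio(a, b):.2f}  {name}")
    # Only the Conv zygote 2-thread row (ratio 1.34) exceeds 1.3 here.
    print(flag(rows))
```

Under that reading, the Enzyme rows near 0.45 to 0.54 are roughly twice as fast in the first column, while the single 1.34 row is the only one of these three that would be flagged as slower.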
abc7057
@JuliaRegistrator register
abc7057
Registration pull request created: JuliaRegistries/General/114718
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via: