-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: optimize the avx2 validator (shaving one SIMD instruction). #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I also took the time to do benchmarks for fun:
| Method | FileName | Mean | Error | StdDev | Speed (GB/s) |
|------------------------------------ |------------------------------ |-----------:|------------:|-----------:|------------- |
| DotnetRuntimeUtf8ValidationRealData | data/Arabic-Lipsum.utf8.txt | 26.885 us | 13.4045 us | 0.7347 us | 3.04 |
| DotnetRuntimeUtf8ValidationRealData | data/Chinese-Lipsum.utf8.txt | 10.927 us | 0.4240 us | 0.0232 us | 6.39 |
| DotnetRuntimeUtf8ValidationRealData | data/Emoji-Lipsum.utf8.txt | 41.696 us | 9.4892 us | 0.5201 us | 1.57 |
| DotnetRuntimeUtf8ValidationRealData | data/Hebrew-Lipsum.utf8.txt | 22.147 us | 3.8862 us | 0.2130 us | 3.00 |
| DotnetRuntimeUtf8ValidationRealData | data/Hindi-Lipsum.utf8.txt | 23.644 us | 2.4712 us | 0.1355 us | 3.72 |
| DotnetRuntimeUtf8ValidationRealData | data/Japanese-Lipsum.utf8.txt | 11.440 us | 3.4394 us | 0.1885 us | 5.93 |
| DotnetRuntimeUtf8ValidationRealData | data/Korean-Lipsum.utf8.txt | 27.172 us | 0.2230 us | 0.0122 us | 2.45 |
| DotnetRuntimeUtf8ValidationRealData | data/Latin-Lipsum.utf8.txt | 1.238 us | 0.0687 us | 0.0038 us | 70.24 |
| DotnetRuntimeUtf8ValidationRealData | data/Russian-Lipsum.utf8.txt | 36.095 us | 2.5588 us | 0.1403 us | 2.90 |
| DotnetRuntimeUtf8ValidationRealData | data/arabic.utf8.txt | 202.607 us | 48.7562 us | 2.6725 us | 2.63 |
| DotnetRuntimeUtf8ValidationRealData | data/chinese.utf8.txt | 29.398 us | 0.7938 us | 0.0435 us | 6.17 |
| DotnetRuntimeUtf8ValidationRealData | data/czech.utf8.txt | 22.324 us | 0.5147 us | 0.0282 us | 6.84 |
| DotnetRuntimeUtf8ValidationRealData | data/english.utf8.txt | 15.265 us | 0.3208 us | 0.0176 us | 25.57 |
| DotnetRuntimeUtf8ValidationRealData | data/esperanto.utf8.txt | 6.422 us | 1.4525 us | 0.0796 us | 13.54 |
| DotnetRuntimeUtf8ValidationRealData | data/french.utf8.txt | 73.019 us | 13.9061 us | 0.7622 us | 6.12 |
| DotnetRuntimeUtf8ValidationRealData | data/german.utf8.txt | 14.579 us | 5.5486 us | 0.3041 us | 14.11 |
| DotnetRuntimeUtf8ValidationRealData | data/greek.utf8.txt | 29.862 us | 0.4680 us | 0.0257 us | 6.07 |
| DotnetRuntimeUtf8ValidationRealData | data/hebrew.utf8.txt | 76.436 us | 29.9407 us | 1.6412 us | 2.49 |
| DotnetRuntimeUtf8ValidationRealData | data/hindi.utf8.txt | 145.619 us | 20.9771 us | 1.1498 us | 2.72 |
| DotnetRuntimeUtf8ValidationRealData | data/japanese.utf8.txt | 26.449 us | 3.2254 us | 0.1768 us | 6.21 |
| DotnetRuntimeUtf8ValidationRealData | data/korean.utf8.txt | 22.260 us | 1.3454 us | 0.0737 us | 4.40 |
| DotnetRuntimeUtf8ValidationRealData | data/persan.utf8.txt | 28.774 us | 4.5707 us | 0.2505 us | 5.43 |
| DotnetRuntimeUtf8ValidationRealData | data/portuguese.utf8.txt | 23.078 us | 0.9856 us | 0.0540 us | 12.16 |
| DotnetRuntimeUtf8ValidationRealData | data/russian.utf8.txt | 164.630 us | 14.0242 us | 0.7687 us | 2.47 |
| DotnetRuntimeUtf8ValidationRealData | data/thai.utf8.txt | 126.441 us | 40.6475 us | 2.2280 us | 4.69 |
| DotnetRuntimeUtf8ValidationRealData | data/turkish.utf8.txt | 25.742 us | 7.7865 us | 0.4268 us | 7.58 |
| DotnetRuntimeUtf8ValidationRealData | data/vietnamese.utf8.txt | 172.919 us | 46.6865 us | 2.5590 us | 1.84 |
Our scalar:
| Utf8ValidationRealDataScalar | data/Arabic-Lipsum.utf8.txt | 40.872 us | 2.1597 us | 0.1184 us | 2.00 |
| Utf8ValidationRealDataScalar | data/Chinese-Lipsum.utf8.txt | 38.581 us | 7.9837 us | 0.4376 us | 1.81 |
| Utf8ValidationRealDataScalar | data/Emoji-Lipsum.utf8.txt | 37.589 us | 8.1082 us | 0.4444 us | 1.74 |
| Utf8ValidationRealDataScalar | data/Hebrew-Lipsum.utf8.txt | 33.380 us | 2.1797 us | 0.1195 us | 1.99 |
| Utf8ValidationRealDataScalar | data/Hindi-Lipsum.utf8.txt | 71.692 us | 4.7877 us | 0.2624 us | 1.23 |
| Utf8ValidationRealDataScalar | data/Japanese-Lipsum.utf8.txt | 41.460 us | 5.6288 us | 0.3085 us | 1.64 |
| Utf8ValidationRealDataScalar | data/Korean-Lipsum.utf8.txt | 35.024 us | 28.4632 us | 1.5602 us | 1.90 |
| Utf8ValidationRealDataScalar | data/Latin-Lipsum.utf8.txt | 19.650 us | 0.2710 us | 0.0149 us | 4.42 |
| Utf8ValidationRealDataScalar | data/Russian-Lipsum.utf8.txt | 92.122 us | 60.6248 us | 3.3230 us | 1.14 |
| Utf8ValidationRealDataScalar | data/arabic.utf8.txt | 430.422 us | 164.6237 us | 9.0236 us | 1.24 |
| Utf8ValidationRealDataScalar | data/chinese.utf8.txt | 107.631 us | 76.1260 us | 4.1727 us | 1.68 |
| Utf8ValidationRealDataScalar | data/czech.utf8.txt | 81.389 us | 64.9311 us | 3.5591 us | 1.88 |
| Utf8ValidationRealDataScalar | data/english.utf8.txt | 102.115 us | 63.3080 us | 3.4701 us | 3.82 |
| Utf8ValidationRealDataScalar | data/esperanto.utf8.txt | 29.780 us | 13.1039 us | 0.7183 us | 2.92 |
| Utf8ValidationRealDataScalar | data/french.utf8.txt | 253.983 us | 61.5314 us | 3.3727 us | 1.76 |
| Utf8ValidationRealDataScalar | data/german.utf8.txt | 66.757 us | 11.7156 us | 0.6422 us | 3.08 |
| Utf8ValidationRealDataScalar | data/greek.utf8.txt | 115.867 us | 136.8344 us | 7.5004 us | 1.57 |
| Utf8ValidationRealDataScalar | data/hebrew.utf8.txt | 145.466 us | 32.1100 us | 1.7601 us | 1.31 |
| Utf8ValidationRealDataScalar | data/hindi.utf8.txt | 255.891 us | 59.9555 us | 3.2864 us | 1.55 |
| Utf8ValidationRealDataScalar | data/japanese.utf8.txt | 90.162 us | 31.4517 us | 1.7240 us | 1.82 |
| Utf8ValidationRealDataScalar | data/korean.utf8.txt | 66.039 us | 14.7216 us | 0.8069 us | 1.48 |
| Utf8ValidationRealDataScalar | data/persan.utf8.txt | 105.572 us | 28.1650 us | 1.5438 us | 1.48 |
| Utf8ValidationRealDataScalar | data/portuguese.utf8.txt | 104.232 us | 34.2278 us | 1.8761 us | 2.69 |
| Utf8ValidationRealDataScalar | data/russian.utf8.txt | 284.178 us | 110.5611 us | 6.0602 us | 1.43 |
| Utf8ValidationRealDataScalar | data/thai.utf8.txt | 298.062 us | 122.5343 us | 6.7165 us | 1.99 |
| Utf8ValidationRealDataScalar | data/turkish.utf8.txt | 129.732 us | 73.9438 us | 4.0531 us | 1.50 |
| Utf8ValidationRealDataScalar | data/vietnamese.utf8.txt | 292.851 us | 61.1760 us | 3.3533 us | 1.09 |
This PR's benchmark
| SIMDUtf8ValidationRealDataSse | data/Arabic-Lipsum.utf8.txt | 6.901 us | 2.7518 us | 0.1508 us | 11.84 |
| SIMDUtf8ValidationRealDataSse | data/Chinese-Lipsum.utf8.txt | 6.000 us | 0.5844 us | 0.0320 us | 11.64 |
| SIMDUtf8ValidationRealDataSse | data/Emoji-Lipsum.utf8.txt | 5.625 us | 2.5828 us | 0.1416 us | 11.65 |
| SIMDUtf8ValidationRealDataSse | data/Hebrew-Lipsum.utf8.txt | 5.752 us | 1.2546 us | 0.0688 us | 11.56 |
| SIMDUtf8ValidationRealDataSse | data/Hindi-Lipsum.utf8.txt | 7.676 us | 2.0566 us | 0.1127 us | 11.46 |
| SIMDUtf8ValidationRealDataSse | data/Japanese-Lipsum.utf8.txt | 5.871 us | 2.0677 us | 0.1133 us | 11.55 |
| SIMDUtf8ValidationRealDataSse | data/Korean-Lipsum.utf8.txt | 5.632 us | 2.4982 us | 0.1369 us | 11.82 |
| SIMDUtf8ValidationRealDataSse | data/Latin-Lipsum.utf8.txt | 1.046 us | 0.0510 us | 0.0028 us | 83.13 |
| SIMDUtf8ValidationRealDataSse | data/Russian-Lipsum.utf8.txt | 8.823 us | 1.7109 us | 0.0938 us | 11.87 |
| SIMDUtf8ValidationRealDataSse | data/arabic.utf8.txt | 57.236 us | 2.4834 us | 0.1361 us | 9.33 |
| SIMDUtf8ValidationRealDataSse | data/chinese.utf8.txt | 11.841 us | 7.2579 us | 0.3978 us | 15.31 |
| SIMDUtf8ValidationRealDataSse | data/czech.utf8.txt | 11.307 us | 0.4361 us | 0.0239 us | 13.51 |
| SIMDUtf8ValidationRealDataSse | data/english.utf8.txt | 24.314 us | 7.5694 us | 0.4149 us | 16.06 |
| SIMDUtf8ValidationRealDataSse | data/esperanto.utf8.txt | 4.759 us | 1.6441 us | 0.0901 us | 18.27 |
| SIMDUtf8ValidationRealDataSse | data/french.utf8.txt | 68.982 us | 8.4139 us | 0.4612 us | 6.48 |
| SIMDUtf8ValidationRealDataSse | data/german.utf8.txt | 11.290 us | 1.4122 us | 0.0774 us | 18.23 |
| SIMDUtf8ValidationRealDataSse | data/greek.utf8.txt | 12.495 us | 2.4793 us | 0.1359 us | 14.51 |
| SIMDUtf8ValidationRealDataSse | data/hebrew.utf8.txt | 12.743 us | 1.2982 us | 0.0712 us | 14.92 |
| SIMDUtf8ValidationRealDataSse | data/hindi.utf8.txt | 38.276 us | 2.6245 us | 0.1439 us | 10.36 |
| SIMDUtf8ValidationRealDataSse | data/japanese.utf8.txt | 10.663 us | 7.5412 us | 0.4134 us | 15.41 |
| SIMDUtf8ValidationRealDataSse | data/korean.utf8.txt | 7.637 us | 1.6013 us | 0.0878 us | 12.81 |
| SIMDUtf8ValidationRealDataSse | data/persan.utf8.txt | 11.430 us | 3.5880 us | 0.1967 us | 13.67 |
| SIMDUtf8ValidationRealDataSse | data/portuguese.utf8.txt | 14.784 us | 0.2450 us | 0.0134 us | 18.98 |
| SIMDUtf8ValidationRealDataSse | data/russian.utf8.txt | 39.885 us | 12.3885 us | 0.6791 us | 10.21 |
| SIMDUtf8ValidationRealDataSse | data/thai.utf8.txt | 55.606 us | 12.6836 us | 0.6952 us | 10.67 |
| SIMDUtf8ValidationRealDataSse | data/turkish.utf8.txt | 11.815 us | 1.0225 us | 0.0560 us | 16.51 |
| SIMDUtf8ValidationRealDataSse | data/vietnamese.utf8.txt | 31.997 us | 21.4611 us | 1.1764 us | 9.97 |
Original PR AVX2:
| SIMDUtf8ValidationRealDataSse | data/Arabic-Lipsum.utf8.txt | 6.905 us | 0.8321 us | 0.0456 us | 11.83 |
| SIMDUtf8ValidationRealDataSse | data/Chinese-Lipsum.utf8.txt | 5.837 us | 0.5159 us | 0.0283 us | 11.97 |
| SIMDUtf8ValidationRealDataSse | data/Emoji-Lipsum.utf8.txt | 5.489 us | 1.2389 us | 0.0679 us | 11.94 |
| SIMDUtf8ValidationRealDataSse | data/Hebrew-Lipsum.utf8.txt | 5.599 us | 1.0649 us | 0.0584 us | 11.88 |
| SIMDUtf8ValidationRealDataSse | data/Hindi-Lipsum.utf8.txt | 7.239 us | 0.3453 us | 0.0189 us | 12.16 |
| SIMDUtf8ValidationRealDataSse | data/Japanese-Lipsum.utf8.txt | 5.970 us | 3.4785 us | 0.1907 us | 11.36 |
| SIMDUtf8ValidationRealDataSse | data/Korean-Lipsum.utf8.txt | 5.541 us | 0.2712 us | 0.0149 us | 12.02 |
| SIMDUtf8ValidationRealDataSse | data/Latin-Lipsum.utf8.txt | 1.038 us | 0.2676 us | 0.0147 us | 83.79 |
| SIMDUtf8ValidationRealDataSse | data/Russian-Lipsum.utf8.txt | 9.470 us | 8.5989 us | 0.4713 us | 11.06 |
| SIMDUtf8ValidationRealDataSse | data/arabic.utf8.txt | 57.405 us | 10.8354 us | 0.5939 us | 9.30 |
| SIMDUtf8ValidationRealDataSse | data/chinese.utf8.txt | 11.701 us | 4.1874 us | 0.2295 us | 15.50 |
| SIMDUtf8ValidationRealDataSse | data/czech.utf8.txt | 11.021 us | 0.7689 us | 0.0421 us | 13.86 |
| SIMDUtf8ValidationRealDataSse | data/english.utf8.txt | 23.418 us | 2.4081 us | 0.1320 us | 16.67 |
| SIMDUtf8ValidationRealDataSse | data/esperanto.utf8.txt | 5.130 us | 2.2320 us | 0.1223 us | 16.95 |
| SIMDUtf8ValidationRealDataSse | data/french.utf8.txt | 67.241 us | 13.6930 us | 0.7506 us | 6.65 |
| SIMDUtf8ValidationRealDataSse | data/german.utf8.txt | 11.259 us | 3.7616 us | 0.2062 us | 18.28 |
| SIMDUtf8ValidationRealDataSse | data/greek.utf8.txt | 11.828 us | 1.5274 us | 0.0837 us | 15.33 |
| SIMDUtf8ValidationRealDataSse | data/hebrew.utf8.txt | 12.547 us | 2.1781 us | 0.1194 us | 15.15 |
| SIMDUtf8ValidationRealDataSse | data/hindi.utf8.txt | 38.366 us | 3.3153 us | 0.1817 us | 10.34 |
| SIMDUtf8ValidationRealDataSse | data/japanese.utf8.txt | 10.354 us | 1.8683 us | 0.1024 us | 15.87 |
| SIMDUtf8ValidationRealDataSse | data/korean.utf8.txt | 7.648 us | 3.4761 us | 0.1905 us | 12.79 |
| SIMDUtf8ValidationRealDataSse | data/persan.utf8.txt | 11.715 us | 3.3335 us | 0.1827 us | 13.33 |
| SIMDUtf8ValidationRealDataSse | data/portuguese.utf8.txt | 15.264 us | 3.5084 us | 0.1923 us | 18.39 |
| SIMDUtf8ValidationRealDataSse | data/russian.utf8.txt | 43.032 us | 1.8628 us | 0.1021 us | 9.46 |
| SIMDUtf8ValidationRealDataSse | data/thai.utf8.txt | 56.710 us | 7.0078 us | 0.3841 us | 10.47 |
| SIMDUtf8ValidationRealDataSse | data/turkish.utf8.txt | 12.031 us | 6.2862 us | 0.3446 us | 16.21 |
| SIMDUtf8ValidationRealDataSse | data/vietnamese.utf8.txt | 30.984 us | 5.7627 us | 0.3159 us | 10.30 |
So overall pretty similar but an instruction saved is an instruction saved:
I vote for merge.
@Nick-Nuon I fully expect that my 'optimization' saves little, as you benchmarking results suggest. Your code was already pretty efficient even though you characterized it as 'slow'. It wasn't slow. I think it is not solely a matter of saving an instruction, but also simplifying the code a bit. :-) Merge away!!! |
@Nick-Nuon I have simplified it further!!! |
@Nick-Nuon I don't think you are benchmarking avx. Are you? |
I am going to merge this and then do some cleaning. |
Shoot ,you are right. I completely forgot that cleaning the AVX2 benchmarks was still a TODO.I should have noted it somewhere in the comments and been more careful |
@Nick-Nuon See #37 I think we need to do a bit more work on debugging but it is getting close. |
No description provided.