crypto/chacha20, crypto/poly1305: add MIPSLE assembly version #294

Open
wants to merge 1 commit into
base: master

@stffabi stffabi commented May 15, 2024

Add assembly-optimized versions of the ChaCha20 and Poly1305
crypto algorithms for MIPSLE.

The algorithms have been ported from other assembly implementations,
both of which are dual-licensed under “GPL-2.0 OR MIT”:
- https://github.com/torvalds/linux/blob/1b294a1f35616977caddaddf3e9d28e576a1adbc/arch/mips/crypto/chacha-core.S
- https://github.com/WireGuard/wireguard-monolithic-historical/blob/edad0d6e99e5133b1e8e865d727a25fff6399cb4/src/crypto/zinc/poly1305/poly1305-mips.S

The following benchmarks were run on an MT7688. They compare
the base Go implementation with the assembly version, built once
for the MIPS32r1 instruction set and once for MIPS32r2.

goos: linux
goarch: mipsle
pkg: golang.org/x/crypto/chacha20
                 │   old.txt    │                asm.txt                 │            asm-mips32r2.txt            │
                 │     B/s      │      B/s       vs base                 │      B/s       vs base                 │
ChaCha20/64        4.015Mi ± 1%   10.376Mi ± 1%  +158.43% (p=0.000 n=10)   13.485Mi ± 2%  +235.87% (p=0.000 n=10)
ChaCha20/256       4.473Mi ± 1%   12.846Mi ± 1%  +187.21% (p=0.000 n=10)   18.859Mi ± 3%  +321.64% (p=0.000 n=10)
ChaCha20/10x25     3.119Mi ± 1%    6.104Mi ± 2%   +95.72% (p=0.000 n=10)    7.181Mi ± 3%  +130.28% (p=0.000 n=10)
ChaCha20/4096      4.659Mi ± 4%   13.609Mi ± 4%  +192.12% (p=0.000 n=10)   20.270Mi ± 5%  +335.11% (p=0.000 n=10)
ChaCha20/100x40    4.020Mi ± 2%    9.918Mi ± 3%  +146.74% (p=0.000 n=10)   13.433Mi ± 5%  +234.16% (p=0.000 n=10)
ChaCha20/65536     4.301Mi ± 1%    9.727Mi ± 1%  +126.16% (p=0.000 n=10)   12.393Mi ± 0%  +188.14% (p=0.000 n=10)
ChaCha20/1000x65   4.187Mi ± 1%   10.076Mi ± 2%  +140.66% (p=0.000 n=10)   13.032Mi ± 2%  +211.28% (p=0.000 n=10)
geomean            4.082Mi         10.11Mi       +147.56%                   13.47Mi       +229.90%

pkg: golang.org/x/crypto/internal/poly1305
                 │   old.txt    │                 asm.txt                 │            asm-mips32r2.txt             │
                 │     B/s      │      B/s       vs base                  │      B/s       vs base                  │
64                 5.307Mi ± 0%   21.009Mi ± 0%   +295.87% (p=0.000 n=10)   20.938Mi ± 0%   +294.52% (p=0.000 n=10)
1K                 6.566Mi ± 1%   66.676Mi ± 0%   +915.47% (p=0.000 n=10)   66.042Mi ± 0%   +905.81% (p=0.000 n=10)
2M                 5.140Mi ± 1%   47.135Mi ± 0%   +816.98% (p=0.000 n=10)   47.016Mi ± 0%   +814.66% (p=0.000 n=10)
64Unaligned        5.322Mi ± 1%   21.024Mi ± 0%   +295.07% (p=0.000 n=10)   20.871Mi ± 1%   +292.20% (p=0.000 n=10)
1KUnaligned        6.561Mi ± 0%   66.614Mi ± 0%   +915.26% (p=0.000 n=10)   66.333Mi ± 0%   +910.97% (p=0.000 n=10)
2MUnaligned        5.140Mi ± 1%   47.197Mi ± 1%   +818.18% (p=0.000 n=10)   47.126Mi ± 0%   +816.79% (p=0.000 n=10)
Write64            6.599Mi ± 0%   57.268Mi ± 0%   +767.77% (p=0.000 n=10)   57.368Mi ± 0%   +769.29% (p=0.000 n=10)
Write1K            6.819Mi ± 0%   79.408Mi ± 0%  +1064.55% (p=0.000 n=10)   79.246Mi ± 0%  +1062.17% (p=0.000 n=10)
Write2M            5.140Mi ± 0%   47.169Mi ± 0%   +817.63% (p=0.000 n=10)   47.116Mi ± 0%   +816.60% (p=0.000 n=10)
Write64Unaligned   6.428Mi ± 3%   56.992Mi ± 1%   +786.65% (p=0.000 n=10)   56.424Mi ± 1%   +777.82% (p=0.000 n=10)
Write1KUnaligned   6.814Mi ± 2%   79.293Mi ± 0%  +1063.68% (p=0.000 n=10)   79.513Mi ± 0%  +1066.90% (p=0.000 n=10)
Write2MUnaligned   5.016Mi ± 2%   47.183Mi ± 1%   +840.59% (p=0.000 n=10)   47.183Mi ± 0%   +840.59% (p=0.000 n=10)
geomean            5.858Mi         49.17Mi        +739.29%                   49.02Mi        +736.70%

pkg: golang.org/x/crypto/chacha20poly1305
                              │   old.txt    │                 asm.txt                 │            asm-mips32r2.txt            │
                              │     B/s      │      B/s        vs base                 │      B/s       vs base                 │
Chacha20Poly1305/Open-64        1.230Mi ± 4%    3.042Mi ±  1%  +147.29% (p=0.000 n=10)    3.548Mi ± 2%  +188.37% (p=0.000 n=10)
Chacha20Poly1305/Seal-64        1.144Mi ± 1%    3.462Mi ±  1%  +202.50% (p=0.000 n=10)    3.810Mi ± 1%  +232.92% (p=0.000 n=10)
Chacha20Poly1305/Open-64-X      908.2Ki ± 1%   1718.8Ki ±  2%   +89.25% (p=0.000 n=10)   1840.8Ki ± 2%  +102.69% (p=0.000 n=10)
Chacha20Poly1305/Seal-64-X      839.8Ki ± 1%   1894.5Ki ±  2%  +125.58% (p=0.000 n=10)   2006.8Ki ± 2%  +138.95% (p=0.000 n=10)
Chacha20Poly1305/Open-1024      2.594Mi ± 3%    9.975Mi ±  1%  +284.56% (p=0.000 n=10)   13.208Mi ± 3%  +409.19% (p=0.000 n=10)
Chacha20Poly1305/Seal-1024      2.551Mi ± 1%   10.600Mi ±  2%  +315.51% (p=0.000 n=10)   14.353Mi ± 3%  +462.62% (p=0.000 n=10)
Chacha20Poly1305/Open-1024-X    2.470Mi ± 0%    8.569Mi ±  0%  +246.91% (p=0.000 n=10)   10.705Mi ± 2%  +333.40% (p=0.000 n=10)
Chacha20Poly1305/Seal-1024-X    2.413Mi ± 1%    9.036Mi ±  1%  +274.51% (p=0.000 n=10)   11.330Mi ± 1%  +369.57% (p=0.000 n=10)
Chacha20Poly1305/Open-1350      2.594Mi ± 3%    9.899Mi ±  2%  +281.62% (p=0.000 n=10)   13.237Mi ± 2%  +410.29% (p=0.000 n=10)
Chacha20Poly1305/Seal-1350      2.556Mi ± 1%   10.471Mi ±  1%  +309.70% (p=0.000 n=10)   13.452Mi ± 1%  +426.31% (p=0.000 n=10)
Chacha20Poly1305/Open-1350-X    2.503Mi ± 2%    8.817Mi ±  1%  +252.19% (p=0.000 n=10)   11.382Mi ± 1%  +354.67% (p=0.000 n=10)
Chacha20Poly1305/Seal-1350-X    2.460Mi ± 0%    9.093Mi ±  1%  +269.57% (p=0.000 n=10)   11.873Mi ± 2%  +382.56% (p=0.000 n=10)
Chacha20Poly1305/Open-2048      2.694Mi ± 2%   11.024Mi ±  2%  +309.20% (p=0.000 n=10)   14.963Mi ± 1%  +455.40% (p=0.000 n=10)
Chacha20Poly1305/Seal-2048      2.699Mi ± 0%   11.477Mi ±  2%  +325.27% (p=0.000 n=10)   15.240Mi ± 1%  +464.66% (p=0.000 n=10)
Chacha20Poly1305/Open-2048-X    2.637Mi ± 1%   10.056Mi ±  1%  +281.37% (p=0.000 n=10)   13.375Mi ± 1%  +407.23% (p=0.000 n=10)
Chacha20Poly1305/Seal-2048-X    2.627Mi ± 1%   10.328Mi ±  2%  +293.10% (p=0.000 n=10)   13.819Mi ± 2%  +425.95% (p=0.000 n=10)
Chacha20Poly1305/Open-4096      2.732Mi ± 5%   11.225Mi ±  4%  +310.82% (p=0.000 n=10)   16.041Mi ± 4%  +487.09% (p=0.000 n=10)
Chacha20Poly1305/Seal-4096      2.704Mi ± 2%   10.839Mi ±  7%  +300.88% (p=0.000 n=10)   15.693Mi ± 7%  +480.42% (p=0.000 n=10)
Chacha20Poly1305/Open-4096-X    2.670Mi ± 1%   10.381Mi ±  4%  +288.75% (p=0.000 n=10)   15.035Mi ± 4%  +463.04% (p=0.000 n=10)
Chacha20Poly1305/Seal-4096-X    2.680Mi ± 1%   10.867Mi ±  5%  +305.52% (p=0.000 n=10)   15.421Mi ± 7%  +475.44% (p=0.000 n=10)
Chacha20Poly1305/Open-8192      2.708Mi ± 2%   11.053Mi ±  3%  +308.10% (p=0.000 n=10)   15.926Mi ± 5%  +488.03% (p=0.000 n=10)
Chacha20Poly1305/Seal-8192      2.632Mi ± 4%   10.896Mi ±  6%  +313.95% (p=0.000 n=10)   16.031Mi ± 5%  +509.06% (p=0.000 n=10)
Chacha20Poly1305/Open-8192-X    2.666Mi ± 4%   10.948Mi ±  4%  +310.73% (p=0.000 n=10)   15.855Mi ± 3%  +494.81% (p=0.000 n=10)
Chacha20Poly1305/Seal-8192-X    2.637Mi ± 2%   10.805Mi ±  2%  +309.76% (p=0.000 n=10)   14.725Mi ± 6%  +458.41% (p=0.000 n=10)
Chacha20Poly1305/Open-16384     2.499Mi ± 4%   10.405Mi ± 13%  +316.41% (p=0.000 n=10)   13.628Mi ± 7%  +445.42% (p=0.000 n=10)
Chacha20Poly1305/Seal-16384     2.484Mi ± 4%    9.069Mi ±  4%  +265.07% (p=0.000 n=10)   12.131Mi ± 3%  +388.29% (p=0.000 n=10)
Chacha20Poly1305/Open-16384-X   2.389Mi ± 7%   10.028Mi ±  5%  +319.76% (p=0.000 n=10)   14.472Mi ± 3%  +505.79% (p=0.000 n=10)
Chacha20Poly1305/Seal-16384-X   2.475Mi ± 4%    9.084Mi ±  2%  +267.05% (p=0.000 n=10)   12.212Mi ± 6%  +393.45% (p=0.000 n=10)
geomean                         2.259Mi         8.271Mi        +266.21%                   10.90Mi       +382.79%

Fixes golang/go#39139
@gopherbot
Contributor

This PR (HEAD: 2de561a) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/crypto/+/585755.

Important tips:

  • Don't comment on this PR. All discussion takes place in Gerrit.
  • You need a Gmail or other Google account to log in to Gerrit.
  • To change your code in response to feedback:
    • Push a new commit to the branch used by your GitHub PR.
    • A new "patch set" will then appear in Gerrit.
    • Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
    • Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
    • Multiple commits in the PR will be squashed by GerritBot.
  • The title and description of the GitHub PR are used to construct the final commit message.
    • Edit these as needed via the GitHub web interface (not via Gerrit or git).
    • You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
  • See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

@gopherbot
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/585755.
After addressing review feedback, remember to publish your drafts!

@stffabi stffabi changed the title from "x/crypto/chacha20, x/crypto/poly1305: Add MIPSLE assembly version" to "crypto/chacha20, crypto/poly1305: add MIPSLE assembly version" May 15, 2024
@gopherbot
Contributor

Message from --:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/585755.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Contributor

Message from --:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/585755.
After addressing review feedback, remember to publish your drafts!

Development

Successfully merging this pull request may close these issues.

cmd/compile: intrinsify bits.RotateLeft32 on mipsle