ZSTD-22 is 30 times slower than it should be #4126

Open · aleksazr opened this issue Aug 13, 2024 · 9 comments

aleksazr commented Aug 13, 2024

I have some binary files which are essentially identical,
but zstd at level 22 is way slower on some of them:

fast0.bin : 325949144 -> 2216950 (x147.03), 16.6 MB/s, 3339.1 MB/s
fast1.bin : 277326896 -> 2180040 (x127.21), 14.9 MB/s, 3818.3 MB/s
fast2.bin : 307040492 -> 2198873 (x139.64), 16.2 MB/s, 3960.9 MB/s
slow0.bin : 216098880 -> 1765827 (x122.38), 0.54 MB/s, 4015.0 MB/s
slow1.bin : 191787756 -> 1749474 (x109.63), 0.58 MB/s, 3930.7 MB/s
slow2.bin : 162074160 -> 1709061 (x94.83), 0.65 MB/s, 3694.5 MB/s

Also, the x64 version of zstd.exe might produce:
Assertion failed: cSize < UINT_MAX, file benchzstd.c, line 628

zstd slow.zip

Cyan4973 self-assigned this Jan 30, 2025

Cyan4973 commented Jan 31, 2025

I can confirm the observed behavior on the provided files.

Analysis of File Content:

  • The files contain a significant number of \0 bytes; zeros account for the majority of their payload (a quick check is sketched after this list).
  • They exhibit extremely high compressibility, reaching compression ratios exceeding 100×, which is well beyond typical levels. This extreme compressibility may lead to unusual side effects.
  • Compression and decompression speeds are significantly higher than normal, likely due to the high redundancy in the data.
  • slow files achieve lower compression ratios compared to fast files.
  • Compression at level 22 is particularly slow on slow files. However, this is relative—other levels are unusually fast, making level 22 seem disproportionately slow. In absolute terms, level 22’s performance on slow files aligns more closely with expected compression speeds for this level.
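
A quick way to verify the first point above: a minimal C sketch that counts \0 bytes with buffered reads. The file name is one of the samples attached to this issue:

#include <stdio.h>

/* Count \0 bytes in a file to gauge its redundancy.
 * "slow0.bin" is one of the samples attached to this issue. */
int main(void)
{
    const char *path = "slow0.bin";
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); return 1; }

    unsigned char buf[1 << 16];
    size_t zeros = 0, total = 0, n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        total += n;
        for (size_t i = 0; i < n; i++)
            zeros += (buf[i] == 0);
    }
    fclose(f);

    if (total == 0) { printf("%s is empty\n", path); return 0; }
    printf("%s: %zu of %zu bytes are \\0 (%.1f%%)\n",
           path, zeros, total, 100.0 * zeros / (double)total);
    return 0;
}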

Benchmark on a Local Laptop

Compression Performance for fast and slow Files:

22#fast0.bin         : 325949144 ->   2216743 (x147.04),   31.6 MB/s, 9173.4 MB/s
22#fast1.bin         : 277326896 ->   2180960 (x127.16),   28.4 MB/s, 8644.4 MB/s
22#fast2.bin         : 307040492 ->   2200669 (x139.52),   26.6 MB/s, 8523.6 MB/s
22#slow0.bin         : 216098880 ->   1765633 (x122.39),   1.30 MB/s, 9551.5 MB/s
22#slow1.bin         : 191787756 ->   1749145 (x109.65),   1.45 MB/s, 8649.6 MB/s
22#slow2.bin         : 162074160 ->   1709983 (x94.78),    1.53 MB/s, 8152.5 MB/s

The slow files indeed compress much more slowly.

Comparison with Standard silesia Corpus:

22#dickens           :  10192446 ->   2849064 (x3.577),   2.12 MB/s, 1335.7 MB/s
22#mozilla           :  51220480 ->  14914116 (x3.434),   3.20 MB/s,  975.0 MB/s
22#mr                :   9970564 ->   3102398 (x3.214),   3.02 MB/s, 1140.5 MB/s
22#nci               :  33553445 ->   1579583 (x21.24),   1.99 MB/s, 3848.3 MB/s
22#osdb              :  10085684 ->   3100120 (x3.253),   3.47 MB/s, 1584.3 MB/s
22#reymont           :   6627202 ->   1348518 (x4.914),   2.77 MB/s, 1592.2 MB/s
22#samba             :  21606400 ->   3866382 (x5.588),   2.56 MB/s, 1928.9 MB/s
22#webster           :  41458703 ->   8431641 (x4.917),   1.97 MB/s, 1496.7 MB/s
22#x-ray             :   8474240 ->   5129824 (x1.652),   3.78 MB/s,  682.4 MB/s
22#xml               :   5345280 ->    451660 (x11.83),   2.94 MB/s, 3110.8 MB/s

Here, slow files exhibit compression speeds that are in line with normal expectations for level 22. This suggests that the fast files and other levels are the real outliers, being unusually fast.


Explanation: Why Such a Big Difference?

  1. Search Complexity and Redundancy

    • The compression search is driven by the number of candidate matches to evaluate.
    • Since these files contain extreme redundancy, there are an enormous number of candidates.
    • Evaluating all candidates is impractical, so the algorithm imposes limits to maintain reasonable speeds.
    • Higher compression levels (e.g., level 22) allow for more extensive searching, which, in this case, results in slower performance.
  2. Skipping Mechanism for Highly Compressible Files

    • When files are highly compressible, the algorithm can skip entire segments of input data, assuming that any marginal improvements wouldn’t justify the computational cost.
    • This skipping mechanism activates more aggressively at lower levels but is more conservative at level 22.
  3. Impact of Skipping on Performance

    • The key difference between fast and slow files likely occurs right before or after the skipping threshold for level 22.
    • Due to the highly regular nature of the data, match sizes are likely very consistent—either frequently above or below this threshold.
    • Hypothesis: On fast files, the matches are large enough to trigger skipping even at level 22, leading to higher compression speeds (a simplified sketch of this bounded search and skipping follows this list).
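
To make this concrete, here is a deliberately simplified sketch of a bounded match search with a "good enough" skipping cut-off, as referenced above. It illustrates the idea only; it is not zstd's actual implementation, and MAX_ATTEMPTS, SKIP_THRESHOLD, candidate_t, and both functions are hypothetical:

#include <stddef.h>

/* Hypothetical illustration of the ideas above, not zstd's real code.
 * MAX_ATTEMPTS caps the search effort per position (it grows with the
 * compression level); SKIP_THRESHOLD is a "good enough" match length
 * beyond which the search stops and the compressor skips ahead. */
#define MAX_ATTEMPTS   512
#define SKIP_THRESHOLD 256

typedef struct { const unsigned char *pos; } candidate_t;  /* hypothetical */

/* Length of the common prefix of a and b, bounded by bend; candidates are
 * assumed to point into the same window, before the current position. */
size_t match_length(const unsigned char *a,
                    const unsigned char *b, const unsigned char *bend)
{
    size_t len = 0;
    while (b + len < bend && a[len] == b[len]) len++;
    return len;
}

/* Best match length at input position ip, under a fixed work budget. */
size_t bounded_search(const candidate_t *cands, size_t nbCands,
                      const unsigned char *ip, const unsigned char *iend)
{
    size_t bestLen = 0;
    size_t budget = MAX_ATTEMPTS;
    for (size_t i = 0; i < nbCands && budget > 0; i++, budget--) {
        size_t const len = match_length(cands[i].pos, ip, iend);
        if (len > bestLen) bestLen = len;
        if (bestLen >= SKIP_THRESHOLD) break;  /* good enough: skip ahead */
    }
    return bestLen;
}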

Solution: Reducing the Search Limit

If the slowdown at level 22 is undesirable, the simplest fix is to drop to level 21, which maintains good speed across all samples (an API equivalent is sketched after the results below).

Performance at Level 21:

21#fast0.bin         : 325949144 ->   2349686 (x138.72),   50.2 MB/s, 9570.1 MB/s
21#fast1.bin         : 277326896 ->   2276676 (x121.81),   36.6 MB/s, 8956.8 MB/s
21#fast2.bin         : 307040492 ->   2322176 (x132.22),   45.8 MB/s, 9520.0 MB/s
21#slow0.bin         : 216098880 ->   2149040 (x100.56),   27.5 MB/s, 8211.9 MB/s
21#slow1.bin         : 191787756 ->   2117608 (x90.57),    23.8 MB/s, 7668.1 MB/s
21#slow2.bin         : 162074160 ->   2087994 (x77.62),    23.2 MB/s, 7131.8 MB/s
  • Speed remains much higher, even on slow files.
  • However, compression ratio is reduced—level 22 does yield better compression due to its more extensive search.
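
For programmatic users, the fix mentioned above through the library: a minimal sketch using zstd's stable one-shot API (link with -lzstd). The caller must supply a destination buffer of at least ZSTD_compressBound(srcSize) bytes:

#include <zstd.h>

/* Compress at level 21 instead of 22. Returns the compressed size,
 * or an error code (check with ZSTD_isError). */
size_t compress_level21(void *dst, size_t dstCapacity,
                        const void *src, size_t srcSize)
{
    return ZSTD_compress(dst, dstCapacity, src, srcSize, 21);
}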

Intermediate Solution: -21 --long

A compromise between speed and compression ratio (an API equivalent is sketched after the results below):

21#fast0.bin         : 325949144 ->   2254455 (x144.58),   48.5 MB/s, 9936.1 MB/s
21#fast1.bin         : 277326896 ->   2214865 (x125.21),   35.8 MB/s, 8839.5 MB/s
21#fast2.bin         : 307040492 ->   2235240 (x137.36),   44.4 MB/s, 9659.9 MB/s
21#slow0.bin         : 216098880 ->   2129056 (x101.50),   28.7 MB/s, 8171.9 MB/s
21#slow1.bin         : 191787756 ->   2104324 (x91.14),    24.6 MB/s, 7397.1 MB/s
21#slow2.bin         : 162074160 ->   2076254 (x78.06),    20.3 MB/s, 7125.2 MB/s
  • Compression ratio improves slightly but does not match level 22.
  • Speed remains high, keeping the performance overhead manageable.
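
As referenced above, --long corresponds to long-distance matching in the library API. A minimal sketch of the equivalent setup; windowLog 27 mirrors the CLI default for --long, and error handling is left to the caller:

#include <zstd.h>

/* Rough equivalent of `zstd -21 --long`: level 21 plus long-distance
 * matching over a 128 MB window; link with -lzstd. */
size_t compress_21_long(void *dst, size_t dstCapacity,
                        const void *src, size_t srcSize)
{
    ZSTD_CCtx *const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 21);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);
    size_t const cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return cSize;   /* check with ZSTD_isError() before use */
}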

Conclusion

  • Level 22 is not inherently slow—it just behaves as expected while other levels benefit from extreme redundancy.
  • The fast files compress unusually quickly because their redundancy triggers an aggressive skipping mechanism.
  • Switching to level 21 provides a simple solution to the speed discrepancy, though at the cost of some compression efficiency.
  • Using -21 --long can offer a reasonable balance between speed and compression ratio.

aleksazr commented Feb 2, 2025

do you get "Assertion failed: cSize < UINT_MAX, file benchzstd.c, line 628"?

Cyan4973 commented Feb 2, 2025

do you get "Assertion failed: cSize < UINT_MAX, file benchzstd.c, line 628"?

nope, never.

If this assert() fires, it means that compression failed, which should never happen, unless the underlying hardware is flaky.
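
For context on why a failure surfaces as this particular assertion: zstd functions return sizes and error codes through the same size_t, with errors encoded as values near SIZE_MAX, so on a 64-bit build a failed compression produces a cSize far above UINT_MAX. The standard check looks like this:

#include <stdio.h>
#include <zstd.h>

/* An error is encoded as a huge size_t value, which is what trips the
 * `cSize < UINT_MAX` assertion in benchzstd.c on 64-bit builds. */
void report(size_t cSize)
{
    if (ZSTD_isError(cSize))
        fprintf(stderr, "compression error: %s\n", ZSTD_getErrorName(cSize));
    else
        printf("compressed to %zu bytes\n", cSize);
}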

aleksazr commented Feb 2, 2025

I got it on the first try today, with slow0.bin. I'm now using a different computer than back in August.
So, try it again: x64 version only, slow files.

In August it was Intel, now it's AMD, both times Windows 11.

Cyan4973 commented Feb 2, 2025

Which version are you using?

aleksazr commented Feb 2, 2025

The one from zstd.slow.zip (zstd64.exe), which I got here:
https://github.com/facebook/zstd/releases/download/v1.5.6/zstd-v1.5.6-win64.zip

from the ZSTDbench.bat file:
zstd64.exe -b22 "slow0.bin"

aleksazr commented Feb 4, 2025

Ok, one more suggestion. Execute this .bat file:

@echo off
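rem hypothetical repro: launch six concurrent -b22 runs of the same sample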
start cmd /k zstd64.exe -b22 "slow0.bin"
start cmd /k zstd64.exe -b22 "slow0.bin"
start cmd /k zstd64.exe -b22 "slow0.bin"
start cmd /k zstd64.exe -b22 "slow0.bin"
start cmd /k zstd64.exe -b22 "slow0.bin"
start cmd /k zstd64.exe -b22 "slow0.bin"

This will open several command prompts at once, and on one of my computers this sometimes results in all of them showing the error.
The error appears near the end of the benchmarking process, it seems, so it takes time.

Cyan4973 commented Feb 4, 2025

-b22 uses a lot of memory.
And 6x -b22 uses 6x more.

Are you monitoring your system to be sure it can sustain such a high demand?
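
As a data point on memory, the library can report an upper bound on compression-context size per level. A minimal sketch, assuming static linking (ZSTD_estimateCCtxSize lives in the static-only section of zstd.h); note that the benchmark allocates input/output buffers on top of this:

#define ZSTD_STATIC_LINKING_ONLY  /* ZSTD_estimateCCtxSize is static-only API */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    /* Upper bound on compression-context memory for level 22. */
    size_t const bound = ZSTD_estimateCCtxSize(22);
    printf("level 22 CCtx upper bound: %.1f MB\n", bound / (1024.0 * 1024.0));
    return 0;
}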

aleksazr commented Feb 4, 2025

Yes. It uses 1 GB per process, so 6 GB total, and I have 20 GB of RAM.
