Add test cases for handling overflows on dequantization and fix size of count7x7 #78
Conversation
Hmm, in principle yes, a quantization table entry can be up to 2^16-1, but Lepton does not support non-8-bit-depth JPEGs, and citing the JPEG standard, sec. B.2.4.1 "Quantization table-specification syntax": "An 8-bit DCT-based process shall not use a 16-bit precision quantization table." So in principle a standard-conforming JPEG that Lepton is able to code should have an 8-bit quantization table. Dequantization in DB Lepton, however, is always done with 32-bit accuracy, and support for 16-bit tables was present there.
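To make the precision argument concrete, here is a minimal sketch (not the actual Lepton code; the `dequantize` helper is illustrative) showing why dequantization needs 32-bit arithmetic once 16-bit tables are in play:

```rust
// Sketch: dequantization done in 32 bits, as described above.
// With an 8-bit table (q <= 255) and |coef| <= 2047 the product needs
// at most 19 bits; with a 16-bit table (q <= 65535) it needs up to 27
// bits, so it overflows i16/u16 but still fits comfortably in i32.
fn dequantize(coef: i16, q: u16) -> i32 {
    coef as i32 * q as i32
}

fn main() {
    // 8-bit table, worst case:
    assert_eq!(dequantize(2047, 255), 521_985);
    // 16-bit table, worst case: far outside i16 range, still within i32.
    assert_eq!(dequantize(2047, 65535), 134_150_145);
}
```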
@mcroomp, you need to update the Cargo files; it fails in CI.
```diff
@@ -163,13 +163,13 @@ pub struct ModelPerColor {

 #[derive(DefaultBoxed)]
 struct Counts7x7 {
-    exponent_counts: [[Branch; MAX_EXPONENT]; MAX_EXPONENT],
+    exponent_counts: [[Branch; MAX_EXPONENT]; NUMERIC_LENGTH_MAX],
```
Why is this needed? This index is defined by the range of bit widths of the calc_coefficient_context_7x7_aavg_block results, and they are at most 11 bits wide, not 12. The Huffman table of an 8-bit JPEG only has bit widths up to 11; wider coefficients cannot appear in a JPEG file.
11 bits wide, but we need 12 array slots (one for zero). Not a problem for the exponent array, since the 12th slot is implied.
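A quick sketch of the slot counting being discussed (the `exponent` function here is an assumption, defined as the bit length of the coefficient's magnitude, not necessarily Lepton's exact definition): an 11-bit-wide value produces exponents 0 through 11, which is 12 distinct slots.

```rust
// Exponent = bit length of |coef|; 0 gets its own slot (exponent 0).
fn exponent(coef: i16) -> u32 {
    let abs = coef.unsigned_abs() as u32;
    32 - abs.leading_zeros()
}

fn main() {
    assert_eq!(exponent(0), 0);     // slot 0 for the zero coefficient
    assert_eq!(exponent(1), 1);
    assert_eq!(exponent(1023), 10); // largest 10-bit value
    assert_eq!(exponent(2047), 11); // largest 11-bit value -> slot 11,
                                    // so 12 slots in total
}
```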
Oh, sorry, 11 bits is for the DC coefficient only; these 7x7 AC coefficients are at most 10 bits wide, see Table F.2 of the standard. And prediction does not extend the dynamic range. So MAX_EXPONENT should still be enough.
What were the test values that gave the OOB?
Heh, it is even more interesting. In JPEG the DC difference is coded, and it can range from -2047 to +2047 (see Table F.1). However, Lepton works with the actual (but shifted) DC, which has a shorter range: the actual DC of a block should be in the [0, 2040] range, but the shift of each pixel by -128 following F.1.1.3 makes its range [-1024, 1016]. A wider range breaks the DC prediction function adv_predict_or_unpredict_dc, so it cannot be meaningfully coded by the C++ Lepton either. I've made an example here; just run the tests and see the failure in DC prediction.
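The two ranges quoted above follow from simple arithmetic on 8-bit samples; a back-of-the-envelope check (the formulas assume the usual Annex A scaling where the unquantized DC of an 8x8 block is the sample sum divided by 8):

```rust
/// Range of the unquantized DC term for 8-bit samples, before level shift.
fn dc_range() -> (i32, i32) {
    // Samples are in [0, 255]; DC = sum of 64 samples / 8.
    (0, 64 * 255 / 8)
}

/// Same range after the F.1.1.3 level shift of -128 per sample,
/// which moves the DC by 64 * 128 / 8 = 1024.
fn shifted_dc_range() -> (i32, i32) {
    let (lo, hi) = dc_range();
    let shift = 64 * 128 / 8;
    (lo - shift, hi - shift)
}

fn main() {
    assert_eq!(dc_range(), (0, 2040));
    assert_eq!(shifted_dc_range(), (-1024, 1016));
}
```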
If we are talking about nonconforming JPEGs, then in principle the Huffman table description allows coefficients up to 15 bits wide. That breaks everything in Lepton with overflows almost everywhere. Should we support those as well?
It would be a major effort, as the maximal dequantized coefficient would not fit into an i32: 65535 * 65535 > 2^31.
UPD: no, it is 15 bits, so 65535 * 32767, which does fit into an i32, but all the IDCT operations still overflow then, as well as the edge predictions and DC predictions.
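The boundary can be checked directly; a small sketch confirming the corrected arithmetic (15-bit coefficient times 16-bit quantizer fits in i32, 16 x 16 bits does not), with `checked_mul` as one way to detect the overflowing case at runtime:

```rust
fn main() {
    // 15-bit coefficient magnitude (2^15 - 1) times 16-bit quantizer fits:
    assert!(32767i64 * 65535 <= i32::MAX as i64);
    // 16-bit times 16-bit exceeds i32::MAX:
    assert!(65535i64 * 65535 > i32::MAX as i64);
    // checked_mul reports the overflow instead of wrapping or panicking:
    assert!(32767i32.checked_mul(65535).is_some());
    assert!(65535i32.checked_mul(65535).is_none());
}
```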
Also, an "insane" JPEG can have, for example, strictly increasing DC coefficients, providing a difference of +2047 for every next block. That even conforms to the standard, but will break in adv_predict_or_unpredict_dc (see the example above). I'm not sure that we should support all these crazy variants.
I agree it sucks to have to handle these corner cases, but if the C++ version allows the file (it only disallows abs(coefficient) > 2047), then we still have to be able to decompress it successfully. As a compressor, we have to be careful not to break existing compressed files.
That doesn't mean we can't overflow, just that the overflow has to be handled in the same way so as not to break decompression.
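One way to keep an overflowing computation deterministic across encoder and decoder (a sketch of the idea, not necessarily how Lepton handles it; the `dequantize_wrapping` name is illustrative) is to use explicitly wrapping arithmetic, which is well defined and identical on every platform, unlike a plain `*` that panics on overflow in debug builds:

```rust
// Both sides compute the same wrapped value, so a bit-exact round trip
// survives even when the mathematical product exceeds i32::MAX.
fn dequantize_wrapping(coef: i32, q: i32) -> i32 {
    coef.wrapping_mul(q)
}

fn main() {
    // 65535 * 65535 = 4294836225 mathematically; wrapped into i32 it is
    // 4294836225 - 2^32 = -131071, reproducibly.
    assert_eq!(dequantize_wrapping(65535, 65535), -131071);
}
```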
You're right; some comments in the code would be nice here then.
Yeah... the problem is that the corpus we've already compressed with the C++ version might include weird non-standard-conforming JPEGs. I'm thinking of adding an option to reject new JPEGs that overflow 16 bits on dequantize, since they create a headache for future compatibility.
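A hypothetical sketch of that rejection option (the function name, the `i16` interpretation of "16 bits", and the flat-slice signature are all illustrative, not Lepton's API): scan the dequantized products and refuse the file if any exceeds the 16-bit signed range.

```rust
// Returns true if any coefficient * quantizer product leaves i16 range.
fn overflows_16_bits(coefs: &[i16], quant: &[u16]) -> bool {
    coefs.iter().zip(quant).any(|(&c, &q)| {
        let v = c as i32 * q as i32;
        v > i16::MAX as i32 || v < i16::MIN as i32
    })
}

fn main() {
    // Modest coefficients with an 8-bit table stay in range:
    assert!(!overflows_16_bits(&[100, -50], &[255, 255]));
    // A maximal DC times a maximal 8-bit quantizer (2047 * 255) does not:
    assert!(overflows_16_bits(&[2047, 0], &[255, 1]));
}
```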
```rust
let left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
let above = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
let here = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
let above_left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
```
Suggested change:

```diff
-let left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
-let above = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
-let here = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
-let above_left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
+let mut left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
+let mut above = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
+let mut here = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
+let mut above_left = AlignedBlock::new(arr.map(|_| rng.gen_range(-2047..=2047)));
+left.set_coefficient(0, rng.gen_range(-1024..=1023));
+above.set_coefficient(0, rng.gen_range(-1024..=1023));
+here.set_coefficient(0, rng.gen_range(-1024..=1023));
+above_left.set_coefficient(0, rng.gen_range(-1024..=1023));
```
As I've shown in the example here, adv_predict_or_unpredict_dc breaks outside of this range and the encoder will give an error on such a file, so values outside it cannot appear in Lepton.
It actually breaks in multiple cases: both when the coefficient is out of range and when the predictor is out of range. Kind of annoying. I just ended up choosing a random seed that generated values within range.
I think the original idea was that what was sent to the encoder was the delta between the predictor and the actual DC, but there's a bug in the way it adds the adjustment, so effectively it rejects both when the DC is out of range and when the delta is out of range.
…of count7x7 (microsoft#78)

* work in progress
* improved tests
* add 16bit and 32bit tests
* add toml
* fix tests
Unfortunately Lepton doesn't check that coefficients multiplied by the quantization table don't exceed 16 bits. This can cause all kinds of weird behavior, so we need to make sure that this behavior doesn't change.
Tests found a regression in #67: count7x7 is too small by one, which would cause an assertion failure when decoding some images.
Explicitly import rand_chacha instead of using the standard library for generating random numbers. Since the standard library can change implementations, we want the generator fixed so that our hashes don't change.