[Draft] [Storage] Decompression #39740
downloaded = await blob.download_blob(decompress=False)
result = await downloaded.readall()
assert result == compressed_data
This scenario (no content validation, compressed data uploaded, then downloaded with decompress=False) fails here:
> assert result == compressed_data
E AssertionError: assert b'hello from gzip' == b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\xcaH\xcd\xc9\xc9WH+\xca\xcfUH\xaf\xca,\x00\x00\x00\x00\xff\xff\x03\x00d\xaa\x8e\xb5\x0f\x00\x00\x00'
E
E At index 0 diff: b'h' != b'\x1f'
E
E Full diff:
E + (b'hello from gzip')
E - (b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\xcaH\xcd\xc9\xc9WH+\xca\xcf'
E - b'UH\xaf\xca,\x00\x00\x00\x00\xff\xff\x03\x00d\xaa\x8e\xb5\x0f\x00\x00\x00')
test_common_blob_async.py:3537: AssertionError
This is the download code for the path without content validation:
await data.response.load_body() # data.response is type RestAioHttpTransportResponse
content = cast(bytes, data.response.body())
The issue is that RestAioHttpTransportResponse did not respect decompress=False, so it decompressed the compressed data anyway.
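The failure above can be reproduced in miniature without any Azure dependency. The following is a minimal sketch, not the SDK's code: read_body is a hypothetical stand-in for a transport response body reader, and its content_encoding and decompress parameters are assumptions made for illustration only.

```python
import gzip


def read_body(stored: bytes, content_encoding: str, decompress: bool) -> bytes:
    """Hypothetical stand-in for a transport response body reader.

    Returns the stored bytes as-is unless decompression is requested
    and the content encoding is gzip.
    """
    if decompress and content_encoding == "gzip":
        return gzip.decompress(stored)
    return stored


compressed_data = gzip.compress(b"hello from gzip")

# A transport that honors decompress=False returns the stored bytes untouched.
raw = read_body(compressed_data, "gzip", decompress=False)

# A transport that ignores the flag behaves as if decompress=True,
# which is the behavior the failing assertion exposes.
decoded = read_body(compressed_data, "gzip", decompress=True)
```

With the flag honored, raw equals compressed_data and the test's assertion would hold; with the flag ignored, the caller receives the plain payload instead of the gzip bytes.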
API change check: API changes are not detected in this pull request.
# Act / Assert
await blob.upload_blob(data=compressed_data, content_settings=content_settings, overwrite=True)
downloaded = await blob.download_blob(validate_content=True, decompress=False)
This scenario (validate_content=True, decompress=False) does not even return; it fails during the content validation step:
# response.http_response type is RestAioHttpTransportResponse
try:
    await response.http_response.load_body()  # Load the body in memory and close the socket
except (StreamClosedError, StreamConsumedError):
    pass
computed_md5 = response.http_request.headers.get('content-md5', None) or \
    encode_base64(StorageContentValidation.get_content_md5(response.http_response.body()))
This is because computed_md5 evaluates to encode_base64(StorageContentValidation.get_content_md5(response.http_response.body())). Decompression is on by default and our decompress=False had no effect, so response.http_response.body() returns the decompressed data and the MD5 is computed against it. The MD5 returned in response.http_response.headers['content-md5'], however, was computed against the data as uploaded, which was compressed.
Therefore, we need to be able to request the data back from Storage without auto-decompression, so that we can properly perform the MD5 check.
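To make the mismatch concrete, here is a small sketch using only the standard library. The helper content_md5 is a hypothetical name that mirrors the base64-encoded MD5 shape of a Content-MD5 header; it is not the SDK's StorageContentValidation helper, and the variable names are illustrative.

```python
import base64
import gzip
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the format used by a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


payload = b"hello from gzip"
compressed_data = gzip.compress(payload)

# MD5 of the bytes as uploaded (compressed), standing in for the header value.
header_md5 = content_md5(compressed_data)

# MD5 the client computes when the transport has already decompressed the body.
computed_from_decompressed = content_md5(gzip.decompress(compressed_data))

# MD5 the client computes when it receives the raw, still-compressed body.
computed_from_raw = content_md5(compressed_data)
```

Only computed_from_raw matches header_md5, which is why the validation step needs the body without auto-decompression.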