Question: trying to understand how upload_block and upload_block_list work #252

Open
yxiang92128 opened this issue Apr 10, 2019 · 4 comments
@yxiang92128

Hi,
I wonder if anyone can shed light on how block-ID-based transactions work.
From the limited samples, I think upload_block adds a request with a block ID to an upload-based transaction, and calling upload_block_list then commits the list of block IDs that were built up in the request queue. What I don't understand is:

  1. With the cancellation token returned from upload_list_sync function, how do I use the token to cancel and undo the entire commit?
  2. After upload_block_list has finished and returned, it doesn't seem to remember that the list of block IDs ever existed, so when I call "download_block_list" afterwards, it simply returns a block_list of size 0. So is a block ID only transient during the transaction, and thus a stateless concept afterwards?
  3. We wonder whether we can use a block list to apply an "append" operation to a block_blob: read the block_list, increment the ID, and add a new ID with the new stream buffer to append to the existing block_blob. We might use the same idea to add an "insert" operation. If this is the wrong track, what would you suggest we do to implement "insert" and "append" on an existing block_blob?

Thanks,

Yang

@katmsft
Member

katmsft commented Apr 11, 2019

Thanks for reaching out. Below are the answers:

  1. There is no server-side cancel mechanism for committing a block list. Cancellation only exists locally, on the client, for async operations. Once the operation has finished and returned, there is no way to revoke the commit.

  2. A block ID is not a transient value. To clarify: when you download a block list, a block_listing_filter controls which kinds of blocks are listed. Once the list is committed, the blocks' status changes from 'uncommitted' to 'committed'. If you are still listing only 'uncommitted' blocks, there will be no results. To avoid this, list with block_listing_filter::all instead.

  3. Append and insert are key scenarios supported by block blobs. The design you mentioned is exactly what we suggest users do if they want to modify an existing block blob. Please also bear in mind that the maximum number of blocks is 50,000 and the maximum size per block is 100 MB.
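The append and insert design from answer 3 depends on block IDs that are unique within the blob and all the same length. Here is a minimal sketch of one way to generate such IDs and extend an existing list, assuming a zero-padded decimal counter that is Base64-encoded (the encoding the REST API expects for block IDs). The helper names `base64_encode`, `make_block_id` and `insert_block_id` are hypothetical and not part of the SDK, and the actual upload and commit calls are left out:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Plain Base64 encoder (standard alphabet, '=' padding).
std::string base64_encode(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buf = 0;
    int bits = 0;
    for (unsigned char c : in) {
        buf = (buf << 8) | c;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(tbl[(buf >> bits) & 0x3F]);
        }
    }
    if (bits > 0) out.push_back(tbl[(buf << (6 - bits)) & 0x3F]);
    while (out.size() % 4 != 0) out.push_back('=');
    return out;
}

// Hypothetical ID scheme: an 8-digit zero-padded counter, so every ID has the
// same length as long as the counter stays below 100,000,000.
std::string make_block_id(unsigned int index) {
    char raw[16];
    std::snprintf(raw, sizeof raw, "%08u", index);
    return base64_encode(raw);
}

// Returns a copy of `ids` with a fresh ID placed at `position`
// (position == ids.size() means append). Assumes every existing ID came from
// make_block_id with indices 0..n-1, so index n is still unused.
std::vector<std::string> insert_block_id(std::vector<std::string> ids,
                                         std::size_t position) {
    if (position > ids.size()) position = ids.size();
    const std::string fresh = make_block_id(static_cast<unsigned int>(ids.size()));
    ids.insert(ids.begin() + static_cast<std::ptrdiff_t>(position), fresh);
    return ids;
}
```

In this sketch the order of the blob's content comes from the position of each ID in the list, not from the numeric value of the ID, which is why an insert only needs an unused ID. The new block would be uploaded under the fresh ID and the returned list would then be committed.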

In general, if you cannot find an appropriate sample for your scenario, I would suggest checking the test code of this repository first. It is not very well documented, but it covers a lot of advanced user scenarios.
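To illustrate the filter behaviour from answer 2, here is a small self-contained model. The types `listing_filter` and `block_entry` and the function `list_blocks` are hypothetical stand-ins that only mirror the idea; they are not the SDK's own types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the three listing modes described above.
enum class listing_filter { committed, uncommitted, all };

// One block as the service would report it: an ID plus its commit status.
struct block_entry {
    std::string id;
    bool committed;
};

// Returns only the blocks whose status matches the requested filter.
std::vector<block_entry> list_blocks(const std::vector<block_entry>& blocks,
                                     listing_filter filter) {
    std::vector<block_entry> out;
    for (const auto& b : blocks) {
        const bool keep =
            filter == listing_filter::all ||
            (filter == listing_filter::committed && b.committed) ||
            (filter == listing_filter::uncommitted && !b.committed);
        if (keep) out.push_back(b);
    }
    return out;
}
```

With this model, a blob whose blocks have all been committed yields an empty result under the uncommitted filter, which matches the size-0 list described in the question.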

@katmsft katmsft self-assigned this Apr 11, 2019
@yxiang92128
Author

Thanks for the reply.
One more question:
If a cancel operation is not supported, and I couldn't find an example of undoing the "upload_block" call, how do I discard the list of uncommitted blocks and start from scratch with a new list of blocks within a single context?

Thanks,

Yang

@katmsft
Member

katmsft commented Apr 22, 2019

Quoting the Remarks section of the Put Block REST API documentation:

Put Block uploads a block for future inclusion in a block blob. A block blob can include a maximum of 50,000 blocks. Each block can be a different size, up to a maximum of 100 MB for version 2016-05-31 and later, and 4 MB for older versions. The maximum size of a block blob is therefore slightly more than 4.75 TB (100 MB X 50,000 blocks) for version 2016-05-31 and later, and 195 GB (4 MB X 50,000 blocks) for all older versions.

A blob can have a maximum of 100,000 uncommitted blocks at any given time. Starting in version 2016-05-31, the set of uncommitted blocks cannot exceed 9.52 TB in total size. For older versions, the set of uncommitted blocks cannot exceed 400 GB in total size. If these maximums are exceeded, the service returns status code 409 (RequestEntityTooLargeBlockCountExceedsLimit).

After you have uploaded a set of blocks, you can create or update the blob on the server from this set by calling the Put Block List operation. Each block in the set is identified by a block ID that is unique within that blob. Block IDs are scoped to a particular blob, so different blobs can have blocks with same IDs.

If you call Put Block on a blob that does not yet exist, a new block blob is created with a content length of 0. This blob is enumerated by the List Blobs operation if the include=uncommittedblobs option is specified. The block or blocks that you uploaded are not committed until you call Put Block List on the new blob. A blob created this way is maintained on the server for a week; if you have not added more blocks or committed blocks to the blob within that time period, then the blob is garbage collected.

A block that has been successfully uploaded with the Put Block operation does not become part of a blob until it is committed with Put Block List. Before Put Block List is called to commit the new or updated blob, any calls to Get Blob return the blob contents without the inclusion of the uncommitted block.

If you upload a block that has the same block ID as another block that has not yet been committed, the last uploaded block with that ID will be committed on the next successful Put Block List operation.

After Put Block List is called, all uncommitted blocks specified in the block list are committed as part of the new blob. Any uncommitted blocks that were not specified in the block list for the blob will be garbage collected and removed from the Blob service. Any uncommitted blocks will also be garbage collected if there are no successful calls to Put Block or Put Block List on the same blob within a week following the last successful Put Block operation. If Put Blob is called on the blob, any uncommitted blocks will be garbage collected.

If the blob has an active lease, the client must specify a valid lease ID on the request in order to write a block to the blob. If the client does not specify a lease ID, or specifies an invalid lease ID, the Blob service returns status code 412 (Precondition Failed). If the client specifies a lease ID but the blob does not have an active lease, the Blob service also returns status code 412 (Precondition Failed).

For a given blob, all block IDs must be the same length. If a block is uploaded with a block ID of a different length than the block IDs for any existing uncommitted blocks, the service returns error response code 400 (Bad Request).

If you attempt to upload a block that is larger than 100 MB for version 2016-05-31 and later, and larger than 4MB for older versions, the service returns status code 413 (Request Entity Too Large). The service also returns additional information about the error in the response, including the maximum block size permitted in bytes.

Calling Put Block does not update the last modified time of an existing blob.

Calling Put Block on a page blob returns an error.

Calling Put Block on an archived blob will return an error and on Hot/Cool blob does not change the blob tier.

Hence, if you want to start from scratch, you can either overwrite the existing block IDs or use a new rule to generate a consistent set of block IDs and commit those.
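The two options can be pictured with a toy in-memory model of the staging and commit rules quoted above. This is only a sketch: `toy_blob` and its members are hypothetical, the real service garbage collects unlisted blocks later rather than immediately, and the model always prefers the latest uploaded version of an ID:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of one block blob; not an SDK type.
struct toy_blob {
    std::map<std::string, std::string> uncommitted;     // staged: id -> data
    std::map<std::string, std::string> committed_data;  // committed: id -> data
    std::vector<std::string> committed_order;           // committed block list

    // Models Put Block: staging a block under an already staged ID replaces it.
    void put_block(const std::string& id, const std::string& data) {
        uncommitted[id] = data;
    }

    // Models Put Block List: commits exactly the listed IDs, taking the staged
    // version when one exists, and drops every staged block not in the list.
    // Returns false and changes nothing if an ID is unknown.
    bool put_block_list(const std::vector<std::string>& ids) {
        std::map<std::string, std::string> next;
        for (const auto& id : ids) {
            auto staged = uncommitted.find(id);
            auto existing = committed_data.find(id);
            if (staged != uncommitted.end()) {
                next[id] = staged->second;
            } else if (existing != committed_data.end()) {
                next[id] = existing->second;
            } else {
                return false;
            }
        }
        committed_data = std::move(next);
        committed_order = ids;
        uncommitted.clear();
        return true;
    }

    // Models Get Blob: concatenates committed blocks in list order.
    std::string content() const {
        std::string out;
        for (const auto& id : committed_order) out += committed_data.at(id);
        return out;
    }
};
```

In this model, re-uploading under the same ID replaces the staged data, and any staged block left out of the committed list is gone afterwards, so no separate "undo" call is needed to abandon it.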

@Jinming-Hu
Member

We're going to close this issue because of inactivity. Feel free to reopen it if you have any further questions.
