FR: handle and support provided object version identifiers in data imports #8554
Labels
area/API
Improvements or additions to the API
area/block-adapter
area/cataloger
Improvements or additions to the cataloger
contributor
feature-request
P3
background
LakeFS supports referencing external data through imports without copying it, but it currently references objects only by path and does not consider additional parameters such as a versionId.
note: while object locking is one way to enforce WORM principles on the external storage system, enabling bucket versioning provides a less intrusive mechanism to achieve immutability
proposal
if external data is "just" referenced in LakeFS (e.g., for lineage, publishing, or sharing), it is critical to ensure that objects remain unchanged after import. versioned buckets provide a way to enforce this immutability, and if LakeFS honors the versionId, it can guarantee this behavior.
handle
during import
consider a provided versionId, e.g.,
lakectl import --from 's3://<bucket>/test.txt?versionId=<version-id>' --to lakefs://<repo>/<branch>/test.txt
retrieval
API (direct & pre-signed URL generation) and S3-gateway should return the specified (and not latest) object version
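to make the two points above concrete, here is a minimal Python sketch (not LakeFS code) of how a source URI with a versionId query parameter could be split into bucket, key, and version, and how that version could then be carried into an S3 GetObject request. the names ImportSource, parse_import_source, and get_object_params are hypothetical and exist only for illustration; the only external fact assumed is that S3 GetObject accepts an optional VersionId parameter next to Bucket and Key.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class ImportSource:
    """Hypothetical holder for a parsed import source (illustration only)."""
    bucket: str
    key: str
    version_id: Optional[str] = None


def parse_import_source(uri: str) -> ImportSource:
    """Split 's3://<bucket>/<key>?versionId=<id>' into its parts.

    Assumption: the version is passed as a 'versionId' query parameter,
    as in the lakectl example above. Keys that contain a literal '?'
    would need escaping and are not handled in this sketch.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")

    key = parts.path.lstrip("/")
    if not key:
        raise ValueError(f"missing object key: {uri!r}")

    versions = parse_qs(parts.query).get("versionId", [])
    if len(versions) > 1:
        raise ValueError(f"more than one versionId given: {uri!r}")

    return ImportSource(
        bucket=parts.netloc,
        key=key,
        version_id=versions[0] if versions else None,
    )


def get_object_params(source: ImportSource) -> Dict[str, str]:
    """Build S3 GetObject parameters, pinning the version when one is known."""
    params = {"Bucket": source.bucket, "Key": source.key}
    if source.version_id is not None:
        # Without VersionId, S3 would return the latest version instead.
        params["VersionId"] = source.version_id
    return params
```

the same parameter dictionary could serve both direct reads and pre-signed URL generation, so that every retrieval path resolves to the version recorded at import time rather than the latest one.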
side note: not letting LakeFS manage the object lifecycle means missing out on many of the valuable features LakeFS offers right out of the box (GC, ...), but that is another story