FR: handle and support provided object version identifiers in data imports #8554

Open
achtsnits opened this issue Jan 27, 2025 · 0 comments
Labels
area/API Improvements or additions to the API area/block-adapter area/cataloger Improvements or additions to the cataloger contributor feature-request P3

Comments

@achtsnits
background
LakeFS supports referencing external data through imports without copying it, but an import currently identifies each object by path only and ignores additional parameters such as a versionId.

note: object locking is one way to enforce WORM principles on the external storage system, but enabling bucket versioning is a less intrusive way to achieve immutability.

proposal
if external data is "just" referenced in LakeFS (e.g., for lineage, publishing, or sharing), it is critical that the objects remain unchanged after import. Versioned buckets preserve every version of an object, so if LakeFS records and honors the versionId, the imported reference keeps pointing at the same content even when the path is later overwritten.

handle

  • during import
    consider a provided versionId, e.g., lakectl import --from 's3://<bucket>/test.txt?versionId=<version-id>' --to lakefs://<repo>/<branch>/test.txt

  • retrieval
    the API (direct reads and pre-signed URL generation) and the S3 gateway should return the object version specified at import, not the latest one
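The two steps above could be sketched roughly as follows. This is only an illustration of the proposal, not existing lakeFS or lakectl behavior: the `?versionId=` URI suffix is the syntax suggested in this issue, and `ObjectRef`, `parse_source_uri`, and `get_object_params` are hypothetical names. The resulting dictionary uses the `Bucket`, `Key`, and `VersionId` parameter names that S3 `GetObject` accepts, so it could be passed to a client call or to pre-signed URL generation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical record of what an import would need to store per object."""
    bucket: str
    key: str
    version_id: Optional[str] = None


def parse_source_uri(uri: str) -> ObjectRef:
    """Split an s3:// URI into bucket, key, and an optional versionId.

    Assumption: the version is passed as a `versionId` query parameter,
    as proposed in this issue. Keys that themselves contain '?' would
    need escaping, which this sketch does not handle.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    versions = parse_qs(parts.query).get("versionId", [])
    if len(versions) > 1:
        raise ValueError("versionId given more than once")
    return ObjectRef(
        bucket=parts.netloc,
        key=parts.path.lstrip("/"),
        version_id=versions[0] if versions else None,
    )


def get_object_params(ref: ObjectRef) -> dict:
    """Build GetObject-style parameters, pinning the version when known."""
    params = {"Bucket": ref.bucket, "Key": ref.key}
    if ref.version_id is not None:
        # Without VersionId, S3 returns the latest version of the key.
        params["VersionId"] = ref.version_id
    return params
```

With boto3, for example, the same dictionary could be used as `client.get_object(**params)` for a direct read or as the `Params` argument of `client.generate_presigned_url("get_object", ...)`, so both retrieval paths resolve to the pinned version.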

side note: if LakeFS does not manage the object lifecycle, users miss out on many of the valuable features it offers out of the box (gc, ...), but that is another story

@arielshaqed arielshaqed added area/cataloger Improvements or additions to the cataloger area/API Improvements or additions to the API area/block-adapter labels Jan 27, 2025
@ozkatz ozkatz added the P3 label Jan 29, 2025