(unified) GC: Improve resilience for unexpected files under _lakefs #8518

Open
yonipeleg33 opened this issue Jan 20, 2025 · 1 comment

Currently, if users upload files to the _lakefs prefix, the GC will fail to run with an ugly error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 88 in stage 7.0 failed 4 times, most recent failure: Lost task 88.3 in stage 7.0 (TID 42180) ([2a05:d018:179b:7f01:d8e3:50aa:4d29:3899] executor 121): io.treeverse.jpebble.BadFileFormatException: Bad magic 37 66 30 31 22 0a 7d 0a: wrong bytes
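
The eight "magic" bytes in the error can be decoded to get a hint about the offending file. The snippet below is only a diagnostic sketch: it decodes the hex string copied from the message above as ASCII. The result is printable text ending in a closing brace and a newline, which looks like the tail of a JSON-like text file rather than binary metadata. That reading is an inference from the bytes alone, not something the error states.

```python
# Hex bytes copied verbatim from the "Bad magic" error message.
magic = bytes.fromhex("37 66 30 31 22 0a 7d 0a")

# Every byte is in the ASCII range, so this decode succeeds.
decoded = magic.decode("ascii")

print(repr(decoded))  # '7f01"\n}\n'
```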

For context, when we added the dummy file under _lakefs, we had to explicitly ignore it in the GC.
Back then, we considered whitelisting only metadata files, but decided against it.

We need to revisit this decision, or come up with other ideas to improve the GC's resilience in such cases.

@arielshaqed arielshaqed changed the title (unified) GC: Improve resistancy for unexpected files under _lakefs (unified) GC: Improve resiliance for unexpected files under _lakefs Jan 22, 2025

Jonathan-Rosenberg commented Jan 29, 2025

Why does this happen?

When a lakeFS repository has no GC rules configured, GC treats all commits as active. It lists them and builds a dataframe of every object in every commit, so that those objects are removed from the "objects to delete" list (because they are considered active).
While listing the commits, GC treats every object under _lakefs, except for dummy, as a metarange or range file. It therefore opens each of these objects and expects to find a graveler split. If an arbitrary non-graveler file is there, GC crashes.
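
One possible direction is to treat a format error on a single file as "skip and report" instead of aborting the whole job. The sketch below is not the GC's code (which runs on Spark); it only illustrates the idea in plain Python. All names here (`collect_ranges`, `read_range`, `BadFileFormatError`, `ScanResult`) are hypothetical, and the fake parser that accepts only keys ending in `.sst` is an assumption made so the example can run.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


class BadFileFormatError(Exception):
    """Hypothetical stand-in for a parser error such as BadFileFormatException."""


@dataclass
class ScanResult:
    ranges: list = field(default_factory=list)   # keys that parsed as metadata
    skipped: list = field(default_factory=list)  # keys that failed to parse


def collect_ranges(
    keys: Iterable[str],
    read_range: Callable[[str], None],
    ignored: frozenset = frozenset({"dummy"}),
) -> ScanResult:
    """Try to parse every key; record unparseable ones instead of crashing."""
    result = ScanResult()
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        if name in ignored:
            continue  # explicitly ignored, like the existing dummy file
        try:
            read_range(key)
        except BadFileFormatError:
            result.skipped.append(key)  # unexpected file: report, do not abort
        else:
            result.ranges.append(key)
    return result


def fake_read_range(key: str) -> None:
    # Hypothetical parser: only keys ending in ".sst" count as valid here.
    if not key.endswith(".sst"):
        raise BadFileFormatError(f"bad magic in {key}")


keys = ["_lakefs/dummy", "_lakefs/abc.sst", "_lakefs/notes.json"]
result = collect_ranges(keys, fake_read_range)
print(result.ranges)   # ['_lakefs/abc.sst']
print(result.skipped)  # ['_lakefs/notes.json']
```

A caveat on this design: silently skipping is only safe when the unparseable file is not referenced by any commit. If a real metadata file were corrupted and skipped, the objects it lists would no longer be marked active, so a real fix would likely need to distinguish the two cases, for example by failing when the skipped key is one a commit points to.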

@Jonathan-Rosenberg Jonathan-Rosenberg removed their assignment Jan 29, 2025