Currently, if users upload files under the _lakefs prefix, GC fails with an unhelpful error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 88 in stage 7.0 failed 4 times, most recent failure: Lost task 88.3 in stage 7.0 (TID 42180) ([2a05:d018:179b:7f01:d8e3:50aa:4d29:3899] executor 121): io.treeverse.jpebble.BadFileFormatException: Bad magic 37 66 30 31 22 0a 7d 0a: wrong bytes
For context, when we added the dummy file under _lakefs we had to explicitly ignore it in the GC.
Back then, we considered whitelisting only metadata files, but decided against it.
We need to revisit this decision, or come up with other ideas to make GC more resilient in such cases.
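As a rough illustration of the whitelisting idea, the sketch below splits a listing of keys into those that match an allowed metadata pattern and those that should be skipped. It is not taken from the GC code: the function name `split_metadata_keys` and the regular expression are hypothetical, and the pattern is an assumption, since this issue does not state the actual naming scheme of metadata files.

```python
import re
from typing import Iterable, List, Pattern, Tuple

# ASSUMPTION: illustrative pattern only. The real naming scheme of the
# metadata files under _lakefs is not stated in this issue.
HYPOTHETICAL_METADATA_KEY = re.compile(r"_lakefs/[0-9a-f]{64}")


def split_metadata_keys(
    keys: Iterable[str],
    pattern: Pattern[str] = HYPOTHETICAL_METADATA_KEY,
) -> Tuple[List[str], List[str]]:
    """Return (accepted, skipped): keys that fully match the allowlist
    pattern, and every other key (dummy, user uploads, and so on)."""
    accepted: List[str] = []
    skipped: List[str] = []
    for key in keys:
        if pattern.fullmatch(key):
            accepted.append(key)
        else:
            skipped.append(key)
    return accepted, skipped
```

With an allowlist like this, the dummy file would no longer need a dedicated exception, because anything that does not match the pattern is skipped. The trade-off is that a legitimate metadata file with an unexpected name would be silently ignored.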
arielshaqed changed the title from "(unified) GC: Improve resistancy for unexpected files under _lakefs" to "(unified) GC: Improve resiliance for unexpected files under _lakefs" on Jan 22, 2025
When a lakeFS repository isn't configured with GC rules, GC considers all commits active. It lists them and builds a dataframe of all objects from all commits, so that those objects are removed from the "objects to delete" list (because they are considered active).
While the commits are listed, every object under _lakefs except dummy is treated as a metarange or range file, so GC opens each of these objects and expects to find a graveler split. If an unrelated, non-graveler file is there, GC crashes.
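One alternative to crashing is to check each candidate file before parsing it and set aside the ones that do not look like the expected format. The sketch below shows that idea in isolation and is not the GC's actual code. It assumes the magic is compared against the file's trailing bytes. The function `partition_by_magic`, the `read_tail` callback, and the `expected_magic` value are all hypothetical; the real magic bytes are not given in this issue.

```python
from typing import Callable, Iterable, List, Tuple


def partition_by_magic(
    keys: Iterable[str],
    read_tail: Callable[[str, int], bytes],
    expected_magic: bytes,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split keys into files whose trailing bytes equal expected_magic and
    files that should be skipped, reported with the bytes actually found.

    read_tail(key, n) is a caller-supplied (hypothetical) function that
    returns the last n bytes of the object stored at key.
    """
    readable: List[str] = []
    rejected: List[Tuple[str, str]] = []
    n = len(expected_magic)
    for key in keys:
        tail = read_tail(key, n)
        if tail == expected_magic:
            readable.append(key)
        else:
            # Record the offending bytes so they can be logged, not thrown.
            rejected.append((key, tail.hex(" ")))
    return readable, rejected
```

The rejected list could be logged as a warning so that the run continues, while still showing which unexpected files were found under the prefix.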