-
Notifications
You must be signed in to change notification settings - Fork 604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-8532] Disallow certain schema evolution actions if a field appears in the partition spec #25114
[CORE-8532] Disallow certain schema evolution actions if a field appears in the partition spec #25114
Conversation
CI test resultstest results on build#62039
test results on build#62052
test results on build#62068
test results on build#62129
test results on build#62149
test results on build#62177
|
01658cd
to
202a811
Compare
202a811
to
5d22c2d
Compare
472426e
to
523305a
Compare
/ci-repeat 1 |
CI Failures:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Mostly a bunch of nits and questions
@@ -540,6 +547,7 @@ static const std::vector<struct_evolution_test_case> valid_cases{ | |||
&& dst_nested.fields.empty() | |||
&& removed(*src_nested.fields.back()); | |||
}, | |||
.pspec = "(nested)", // TODO(oren): seems wrong frankly, should this fail? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source columns, selected by ids, must be a primitive type and cannot be contained in a map or list, but may be nested in a struct.
Hmm yea this is interesting. Maybe not in this PR, but feels like we should treat it like we would a missing field when resolving
redpanda/src/v/iceberg/partition.cc
Lines 38 to 40 in 5fe9fc9
if (!source_field) { | |
return std::nullopt; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh, i see. so a struct or map or whatever isn't even a valid source column? I don't think partition spec resolution is detecting that at all then. See L448 in this file e.g.... seems like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query_engine=QUERY_ENGINES, | ||
use_partition_spec=[True, False], | ||
) | ||
def test_partition_spec_evo(self, cloud_storage_type, query_engine, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but it's worth trying these out with a a real catalog instead of just filesystem_catalog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
5961ed2
to
fbb780d
Compare
Retry command for Build#62129please wait until all jobs are finished before running the slash command
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice
Logical DATE type should be bound to AVRO_INT only. Previously we used AVRO_LONG, so the Avro lib would throw when exercising this code path. Also adds a unit test for Iceberg types that are backed by Avro logicalTypes. Note that timestamptz serialization is currently bugged, as it relies on an extra non-standard field ('adjust-to-utc') in the Avro type description. Technically this should appear in both micro precision timestamp types, 'false' for 'timestamp'; 'true' for 'timestamptz'. As written, both timestamp types serialize to logical type timestamp-micros, with no way to distinguish between the two in deser. See https://iceberg.apache.org/spec/#avro for detail Signed-off-by: Oren Leiman <[email protected]>
To support special promotions that affect partition transforms type_promoted: - no - yes - unless_partition Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Specify either an annotate error or a validate error. This simplifies some of the conditional logic around instantiating test suites. No functional changes. Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
…otion Signed-off-by: Oren Leiman <[email protected]>
Though this restriction does not appear in the spec, dropping a data column that also appears in the table's partition spec can cause validation errors in clients, downstream. It may be possible to avoid this by performing live partition spec reconciliation inline with the schema update, but for now we simply reject such an update out of hand. Also slightly refactors to consolidate error checks in validate_schema_transform. Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
fbb780d
to
035c4e3
Compare
force push typos |
/ci-repeat 1 |
/ci-repeat 1 |
General idea is as follows (quoted from the spec):
In general, the type promotion rules preclude this, but date -> timestamp promotions are called out specifically as potentially rule violating, if the promoted field appears in the partition spec. So this diff prevents that by wiring the current partition spec into the compat module and adding a check to the validation path.
This PR also disallows removal of any data field which also appears in the current partition spec at the time of removal. This is technically outside the spec and needs followup ecosystem investigation, but it appears that dropping a column in this way can cause downstream catalog queries to fail. The safest thing at this stage is to simply disable the capability for now.
This PR also fixes a bug in
iceberg/schema_avro
whereby we would try to serializedate_type
to anAVRO_LONG
with logical type "date", which itself is restricted toAVRO_INT
. Fix includes a unit test.Backports Required
Release Notes