-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
alter_table: Support parsing VERSION
and ADDED
keywords
#29647
alter_table: Support parsing VERSION
and ADDED
keywords
#29647
Conversation
* support new 'VERSION' and 'ADDED' keywords in the parser * add a 'version' field to ColumnOptions and ItemName
* track if planning is happening in the context of a user query * add statement validation to check if a user manually typed 'VERSION' * include a dyncfg to CYA in case something goes wrong
665342f
to
aaad6c6
Compare
aaad6c6
to
c6df4b6
Compare
MitigationsCompleting required mitigations increases Resilience Coverage.
Risk Summary:This pull request carries a high risk score of 76, primarily driven by predictors such as File Diffusion and Delta of Executable Lines, and includes modifications to one file hotspot. Historically, pull requests with these predictors are 78% more likely to cause a bug compared to the repository baseline. Notably, the observed bug trend in the repository is decreasing. Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity. Bug Hotspots:
|
VERSION
and ADDED
keywordsVERSION
and ADDED
keywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/sql-parser/src/parser.rs
Outdated
self.prev_token(); | ||
self.expected( | ||
self.peek_pos(), | ||
"non-negative version number", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically the parsing can also fail if the number is too large? Maybe something like "unsigned 64-bit version number" is more accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! It turns out we have a parse_literal_uint(...)
method, I updated to use that which handles all of these cases
src/sql-parser/src/parser.rs
Outdated
@@ -6101,6 +6107,24 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
fn parse_version(&mut self) -> Result<Version, ParserError> { | |||
let Value::Number(val) = self.parse_number_value()? else { | |||
unreachable!("programming error, expected Value::Number") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes this unreachable? What if someone types version information into their DDL by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_number_value(...)
returns a Value::Number
, so anything else should be impossible. I realized we have a better parse_literal_uint(...)
method though so I switched to that
src/sql-parser/tests/testdata/create
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add one or two negative tests like ... VERSION nonsense ...
or ... VERSION ADDED -10...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added!
|
||
parse-statement | ||
CREATE VIEW v1 AS SELECT * FROM [ u1 as materialize.public.t1 VERSION 5] | ||
---- | ||
CREATE VIEW v1 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 5] | ||
=> | ||
CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("v1")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(5))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } }) | ||
|
||
parse-statement | ||
CREATE VIEW "materialize"."public"."v3" AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 3] | ||
---- | ||
CREATE VIEW materialize.public.v3 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 3] | ||
=> | ||
CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("v3")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(3))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on the negative tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
src/sql/src/plan/statement.rs
Outdated
/// Validates this statement can be planned in the provided context. | ||
fn validate_statement<'a>( | ||
stmt: &'a Statement<Aug>, | ||
ctx: &'a StatementContext<'a>, | ||
) -> Result<(), PlanError> { | ||
let mut validator = StatementValidator::new(ctx); | ||
// Recursively visits the different parts of the Statement. | ||
validator.visit_statement(stmt); | ||
|
||
// Statement validation has the possibility of causing sticky panics if | ||
// something fails when openning the catalog, so we give ourselves an out. | ||
if SKIP_STATEMENT_VALIDATION.get(ctx.catalog.system_vars().dyncfgs()) { | ||
if validator.state.is_err() { | ||
tracing::warn!( | ||
error = ?validator.state, | ||
"skipping statement validation when there was an error" | ||
); | ||
} | ||
validator.state = Ok(()); | ||
} | ||
|
||
// Return the state the validator was left in. | ||
validator.state | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the way that we've handled this in the past is to due the validation in sequencing (for example in sequence_create_table
). Generally an item only goes through sequencing when it's first created so we know that all user typed DDL will go through sequencing, but other DDL will not.
Here's a recent example:
materialize/src/adapter/src/coord/sequencer/inner.rs
Lines 660 to 678 in 8d1128a
match &plan.connection.details { | |
ConnectionDetails::Ssh { key_1, key_2, .. } => { | |
let key_1 = match key_1.as_key_pair() { | |
Some(key_1) => key_1.clone(), | |
None => { | |
return ctx.retire(Err(AdapterError::Unstructured(anyhow!( | |
"the PUBLIC KEY 1 option cannot be explicitly specified" | |
)))) | |
} | |
}; | |
let key_2 = match key_2.as_key_pair() { | |
Some(key_2) => key_2.clone(), | |
None => { | |
return ctx.retire(Err(AdapterError::Unstructured(anyhow!( | |
"the PUBLIC KEY 2 option cannot be explicitly specified" | |
)))) | |
} | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not necessarily saying you should remove this validation, but I'm just wondering if there's a good reason to prefer doing it in planning instead of sequencing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I thought a specific validation step in planning would be more clear, but I think putting it in sequencing is easier and fits with the status quo so I removed the new validation logic and will add the check in sequencing in a follow up PR.
Note: It's not in this PR yet because the VERSION
constraint gets rejected in planning as unsupported, I updated the SLT test to reflect that
This PR adds support for parsing the
VERSION
andADDED
keywords ultimately in support of adding columns to tables.These new keywords will appear in table definitions and item references, i.e.
CREATE TABLE t1 (a int, b text VERSION ADDED 1)
CREATE VIEW v1 AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 5]
This way we know at what version of a table a column was added and what version of table is referenced in a downstream object, e.g. a
VIEW
.We prevent users from manually versioning columns and tables by introducing a statement validation step before planning. If thePlanContext
indicates we're planning in the context of a user query, we bail if we find a versioned object. Also included is adyncfg
to disable validation as a CYA. Further there is a small SLT to exercise these scenarios.Motivation
Progress towards https://github.com/MaterializeInc/database-issues/issues/8233
Tips for reviewer
This PR is split into separate commits which might make it easier to review:
PlanContext
a bit moreChecklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.