Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alter_table: Support parsing VERSION and ADDED keywords #29647

Merged

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Sep 18, 2024

This PR adds support for parsing the VERSION and ADDED keywords ultimately in support of adding columns to tables.

These new keywords will appear in table definitions and item references, i.e.

  1. CREATE TABLE t1 (a int, b text VERSION ADDED 1)
  2. CREATE VIEW v1 AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 5]

This way we know at what version of a table a column was added and what version of table is referenced in a downstream object, e.g. a VIEW.

We prevent users from manually versioning columns and tables by introducing a statement validation step before planning. If the PlanContext indicates we're planning in the context of a user query, we bail if we find a versioned object. Also included is a dyncfg to disable validation as a CYA. Further there is a small SLT to exercise these scenarios.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/8233

Tips for reviewer

This PR is split into separate commits which might make it easier to review:

  1. Parser changes to support the new keyword and include it in the relevant types
  2. Adding statement validation and pluming around the PlanContext a bit more
  3. New sqllogictest to exercise this behavior

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

* support new 'VERSION' and 'ADDED' keywords in the parser
* add a 'version' field to ColumnOptions and ItemName
* track if planning is happening in the context of a user query
* add statement validation to check if a user manually typed 'VERSION'
* include a dyncfg to CYA in case something goes wrong
@ParkMyCar ParkMyCar marked this pull request as ready for review September 19, 2024 20:09
@ParkMyCar ParkMyCar requested review from a team as code owners September 19, 2024 20:09
Copy link

shepherdlybot bot commented Sep 19, 2024

Risk Score:76 / 100 Bug Hotspots:1 Resilience Coverage:50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Observability
  • Feature Flag
  • Integration Test 🔍 Detected
  • QA Review
  • Run Nightly Tests
  • Unit Test
Risk Summary:

This pull request carries a high risk score of 76, primarily driven by predictors such as File Diffusion and Delta of Executable Lines, and includes modifications to one file hotspot. Historically, pull requests with these predictors are 78% more likely to cause a bug compared to the repository baseline. Notably, the observed bug trend in the repository is decreasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../sequencer/inner.rs 96

@ParkMyCar ParkMyCar changed the title [WIP] alter_table: Support parsing VERSION and ADDED keywords alter_table: Support parsing VERSION and ADDED keywords Sep 19, 2024
Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

self.prev_token();
self.expected(
self.peek_pos(),
"non-negative version number",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think technically the parsing can also fail if the number is too large? Maybe something like "unsigned 64-bit version number" is more accurate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! It turns out we have a parse_literal_uint(...) method, I updated to use that which handles all of these cases

@@ -6101,6 +6107,24 @@ impl<'a> Parser<'a> {
}
}

fn parse_version(&mut self) -> Result<Version, ParserError> {
let Value::Number(val) = self.parse_number_value()? else {
unreachable!("programming error, expected Value::Number")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes this unreachable? What if someone types version information into their DDL by mistake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_number_value(...) returns a Value::Number, so anything else should be impossible. I realized we have a better parse_literal_uint(...) method though so I switched to that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add one or two negative tests like ... VERSION nonsense ... or ... VERSION ADDED -10...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

Comment on lines +50 to +63

parse-statement
CREATE VIEW v1 AS SELECT * FROM [ u1 as materialize.public.t1 VERSION 5]
----
CREATE VIEW v1 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 5]
=>
CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("v1")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(5))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } })

parse-statement
CREATE VIEW "materialize"."public"."v3" AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 3]
----
CREATE VIEW materialize.public.v3 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 3]
=>
CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("v3")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(3))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on the negative tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added!

Comment on lines 1126 to 1149
/// Validates this statement can be planned in the provided context.
fn validate_statement<'a>(
stmt: &'a Statement<Aug>,
ctx: &'a StatementContext<'a>,
) -> Result<(), PlanError> {
let mut validator = StatementValidator::new(ctx);
// Recursively visits the different parts of the Statement.
validator.visit_statement(stmt);

// Statement validation has the possibility of causing sticky panics if
// something fails when openning the catalog, so we give ourselves an out.
if SKIP_STATEMENT_VALIDATION.get(ctx.catalog.system_vars().dyncfgs()) {
if validator.state.is_err() {
tracing::warn!(
error = ?validator.state,
"skipping statement validation when there was an error"
);
}
validator.state = Ok(());
}

// Return the state the validator was left in.
validator.state
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the way that we've handled this in the past is to due the validation in sequencing (for example in sequence_create_table). Generally an item only goes through sequencing when it's first created so we know that all user typed DDL will go through sequencing, but other DDL will not.

Here's a recent example:

match &plan.connection.details {
ConnectionDetails::Ssh { key_1, key_2, .. } => {
let key_1 = match key_1.as_key_pair() {
Some(key_1) => key_1.clone(),
None => {
return ctx.retire(Err(AdapterError::Unstructured(anyhow!(
"the PUBLIC KEY 1 option cannot be explicitly specified"
))))
}
};
let key_2 = match key_2.as_key_pair() {
Some(key_2) => key_2.clone(),
None => {
return ctx.retire(Err(AdapterError::Unstructured(anyhow!(
"the PUBLIC KEY 2 option cannot be explicitly specified"
))))
}
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily saying you should remove this validation, but I'm just wondering if there's a good reason to prefer doing it in planning instead of sequencing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I thought a specific validation step in planning would be more clear, but I think putting it in sequencing is easier and fits with the status quo so I removed the new validation logic and will add the check in sequencing in a follow up PR.

Note: It's not in this PR yet because the VERSION constraint gets rejected in planning as unsupported, I updated the SLT test to reflect that

@ParkMyCar ParkMyCar enabled auto-merge (squash) September 20, 2024 17:23
@ParkMyCar ParkMyCar merged commit b9c9f4b into MaterializeInc:main Sep 20, 2024
82 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants