alter_table: Support parsing `VERSION` and `ADDED` keywords #29647

ParkMyCar · 2024-09-18T22:50:36Z

This PR adds support for parsing the VERSION and ADDED keywords ultimately in support of adding columns to tables.

These new keywords will appear in table definitions and item references, i.e.

CREATE TABLE t1 (a int, b text VERSION ADDED 1)
CREATE VIEW v1 AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 5]

This way we know at what version of a table a column was added and what version of a table is referenced in a downstream object, e.g. a VIEW.
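To make the two positions concrete, here is a minimal sketch of how a version might hang off a column definition and off an item reference, and how each would print back to SQL. The `ColumnDef` and `ItemRef` types are hypothetical stand-ins for illustration only. The PR itself adds a `version` field to `ColumnOptions` and `ItemName`, and their real definitions are not reproduced here.

```rust
use std::fmt;

/// Version of a table. The parser test output in this PR shows a `Version(5)` wrapper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version(u64);

/// Hypothetical stand-in for a column definition inside `CREATE TABLE`.
struct ColumnDef {
    name: String,
    ty: String,
    /// Table version at which this column was added, if one is recorded.
    added: Option<Version>,
}

/// Hypothetical stand-in for an item reference such as `[u1 AS db.schema.t1 VERSION 5]`.
struct ItemRef {
    id: String,
    name: String,
    /// Version of the referenced table, if one is recorded.
    version: Option<Version>,
}

impl fmt::Display for ColumnDef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {}", self.name, self.ty)?;
        if let Some(Version(v)) = self.added {
            write!(f, " VERSION ADDED {}", v)?;
        }
        Ok(())
    }
}

impl fmt::Display for ItemRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{} AS {}", self.id, self.name)?;
        if let Some(Version(v)) = self.version {
            write!(f, " VERSION {}", v)?;
        }
        write!(f, "]")
    }
}

fn main() {
    let col = ColumnDef {
        name: "b".to_string(),
        ty: "text".to_string(),
        added: Some(Version(1)),
    };
    let item = ItemRef {
        id: "u1".to_string(),
        name: "materialize.public.t1".to_string(),
        version: Some(Version(5)),
    };
    println!("CREATE TABLE t1 (a int, {})", col);
    println!("CREATE VIEW v1 AS SELECT * FROM {}", item);
}
```

Keeping the version as an `Option` means unversioned columns and references print exactly as they did before.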

We prevent users from manually versioning columns and tables by introducing a statement validation step before planning. If the PlanContext indicates we're planning in the context of a user query, we bail if we find a versioned object. Also included is a dyncfg to disable validation as a CYA. Further there is a small SLT to exercise these scenarios.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/8233

Tips for reviewer

This PR is split into separate commits which might make it easier to review:

Parser changes to support the new keyword and include it in the relevant types
Adding statement validation and plumbing the PlanContext around a bit more
New sqllogictest to exercise this behavior

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

* support new 'VERSION' and 'ADDED' keywords in the parser
* add a 'version' field to ColumnOptions and ItemName

* track if planning is happening in the context of a user query
* add statement validation to check if a user manually typed 'VERSION'
* include a dyncfg to CYA in case something goes wrong

shepherdlybot · 2024-09-19T20:10:29Z

Mitigations

Completing required mitigations increases Resilience Coverage.

Risk Summary:

This pull request carries a high risk score of 76, primarily driven by predictors such as File Diffusion and Delta of Executable Lines, and includes modifications to one file hotspot. Historically, pull requests with these predictors are 78% more likely to cause a bug compared to the repository baseline. Notably, the observed bug trend in the repository is decreasing.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File	Percentile
../sequencer/inner.rs	96

jkosh44

LGTM

jkosh44 · 2024-09-20T12:58:39Z

src/sql-parser/src/parser.rs

+                self.prev_token();
+                self.expected(
+                    self.peek_pos(),
+                    "non-negative version number",


I think technically the parsing can also fail if the number is too large? Maybe something like "unsigned 64-bit version number" is more accurate.

Good call! It turns out we have a parse_literal_uint(...) method. I switched to that, which handles all of these cases.

jkosh44 · 2024-09-20T12:59:39Z

src/sql-parser/src/parser.rs

@@ -6101,6 +6107,24 @@ impl<'a> Parser<'a> {
        }
    }

+    fn parse_version(&mut self) -> Result<Version, ParserError> {
+        let Value::Number(val) = self.parse_number_value()? else {
+            unreachable!("programming error, expected Value::Number")


What makes this unreachable? What if someone types version information into their DDL by mistake?

parse_number_value(...) returns a Value::Number, so anything else should be impossible. I realized we have a better parse_literal_uint(...) method though so I switched to that
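As a rough illustration of the cases raised in these two threads (negative numbers, non-numeric input, and values too large for 64 bits), here is a minimal sketch of parsing a version token into a `u64`. The function `parse_version_token` and its error strings are hypothetical. This is not the `parse_literal_uint(...)` method mentioned above, whose implementation is not shown in this PR excerpt.

```rust
/// Version of a table, mirroring the `Version(5)` shape in the parser test output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version(u64);

/// Hypothetical helper: parse the token that follows `VERSION` or `VERSION ADDED`.
/// Returns an error instead of panicking for anything that is not a valid u64.
fn parse_version_token(token: &str) -> Result<Version, String> {
    // Require plain ASCII digits up front, because `str::parse::<u64>` would
    // also accept a leading `+`, which we do not want to treat as a version.
    if token.is_empty() || !token.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!(
            "expected unsigned 64-bit version number, found {:?}",
            token
        ));
    }
    // The remaining failure mode is overflow past u64::MAX.
    token
        .parse::<u64>()
        .map(Version)
        .map_err(|e| format!("invalid version number {:?}: {}", token, e))
}

fn main() {
    for token in ["5", "-10", "nonsense", "18446744073709551616"] {
        match parse_version_token(token) {
            Ok(version) => println!("{:?} parsed as {:?}", token, version),
            Err(err) => println!("{:?} rejected: {}", token, err),
        }
    }
}
```

Every failure path returns a `Result`, so nothing needs to be marked `unreachable!`.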

jkosh44 · 2024-09-20T13:13:26Z

src/sql-parser/tests/testdata/create

Can we add one or two negative tests like ... VERSION nonsense ... or ... VERSION ADDED -10...

jkosh44 · 2024-09-20T13:13:45Z

src/sql-parser/tests/testdata/id

+
+parse-statement
+CREATE VIEW v1 AS SELECT * FROM [ u1 as materialize.public.t1 VERSION 5]
+----
+CREATE VIEW v1 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 5]
+=>
+CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("v1")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(5))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } })
+
+parse-statement
+CREATE VIEW "materialize"."public"."v3" AS SELECT * FROM [u1 AS "materialize"."public"."t1" VERSION 3]
+----
+CREATE VIEW materialize.public.v3 AS SELECT * FROM [u1 AS materialize.public.t1 VERSION 3]
+=>
+CreateView(CreateViewStatement { if_exists: Error, temporary: false, definition: ViewDefinition { name: UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("v3")]), columns: [], query: Query { ctes: Simple([]), body: Select(Select { distinct: None, projection: [Wildcard], from: [TableWithJoins { relation: Table { name: Id("u1", UnresolvedItemName([Ident("materialize"), Ident("public"), Ident("t1")]), Some(Version(3))), alias: None }, joins: [] }], selection: None, group_by: [], having: None, options: [] }), order_by: [], limit: None, offset: None } } })


Ditto on the negative tests.

jkosh44 · 2024-09-20T13:23:32Z

src/sql/src/plan/statement.rs

+/// Validates this statement can be planned in the provided context.
+fn validate_statement<'a>(
+    stmt: &'a Statement<Aug>,
+    ctx: &'a StatementContext<'a>,
+) -> Result<(), PlanError> {
+    let mut validator = StatementValidator::new(ctx);
+    // Recursively visits the different parts of the Statement.
+    validator.visit_statement(stmt);
+
+    // Statement validation has the possibility of causing sticky panics if
+    // something fails when opening the catalog, so we give ourselves an out.
+    if SKIP_STATEMENT_VALIDATION.get(ctx.catalog.system_vars().dyncfgs()) {
+        if validator.state.is_err() {
+            tracing::warn!(
+                error = ?validator.state,
+                "skipping statement validation when there was an error"
+            );
+        }
+        validator.state = Ok(());
+    }
+
+    // Return the state the validator was left in.
+    validator.state
+}


FWIW, the way that we've handled this in the past is to do the validation in sequencing (for example in sequence_create_table). Generally an item only goes through sequencing when it's first created, so we know that all user-typed DDL will go through sequencing, but other DDL will not.

Here's a recent example:

materialize/src/adapter/src/coord/sequencer/inner.rs

Lines 660 to 678 in 8d1128a

match &plan.connection.details {
    ConnectionDetails::Ssh { key_1, key_2, .. } => {
        let key_1 = match key_1.as_key_pair() {
            Some(key_1) => key_1.clone(),
            None => {
                return ctx.retire(Err(AdapterError::Unstructured(anyhow!(
                    "the PUBLIC KEY 1 option cannot be explicitly specified"
                ))))
            }
        };

        let key_2 = match key_2.as_key_pair() {
            Some(key_2) => key_2.clone(),
            None => {
                return ctx.retire(Err(AdapterError::Unstructured(anyhow!(
                    "the PUBLIC KEY 2 option cannot be explicitly specified"
                ))))
            }
        };

I'm not necessarily saying you should remove this validation, but I'm just wondering if there's a good reason to prefer doing it in planning instead of sequencing?

Originally I thought a dedicated validation step in planning would be clearer, but putting it in sequencing is easier and fits the status quo. I removed the new validation logic and will add the check in sequencing in a follow-up PR.

Note: the check is not in this PR yet because the VERSION constraint currently gets rejected in planning as unsupported. I updated the SLT to reflect that.
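For readers following along, the sequencing-time check discussed here could look roughly like the sketch below. Everything in it is an assumption for illustration: `CreateTablePlan`, `check_no_user_versions`, and the error wording (modeled on the SSH key example quoted above) are hypothetical, and the actual follow-up PR may look quite different.

```rust
/// Version of a table, mirroring the `Version(5)` shape in the parser test output.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version(u64);

/// Hypothetical, heavily simplified plan for `CREATE TABLE`.
struct CreateTablePlan {
    name: String,
    /// Column name plus the version written in the DDL, if any.
    columns: Vec<(String, Option<Version>)>,
}

/// Hypothetical sequencing-time check. User-typed DDL passes through
/// sequencing when an item is first created, so rejecting explicit versions
/// here only affects statements a user actually wrote.
fn check_no_user_versions(plan: &CreateTablePlan) -> Result<(), String> {
    if let Some((col, _)) = plan.columns.iter().find(|c| c.1.is_some()) {
        return Err(format!(
            "the VERSION option cannot be explicitly specified (column {:?} of table {:?})",
            col, plan.name
        ));
    }
    Ok(())
}

fn main() {
    let plan = CreateTablePlan {
        name: "t1".to_string(),
        columns: vec![
            ("a".to_string(), None),
            ("b".to_string(), Some(Version(1))),
        ],
    };
    match check_no_user_versions(&plan) {
        Ok(()) => println!("plan accepted"),
        Err(err) => println!("plan rejected: {}", err),
    }
}
```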


ParkMyCar added 2 commits September 19, 2024 15:43

start, support parsing versions

9c499a7

* support new 'VERSION' and 'ADDED' keywords in the parser
* add a 'version' field to ColumnOptions and ItemName

prevent users from typing 'VERSION'

bf31327

* track if planning is happening in the context of a user query
* add statement validation to check if a user manually typed 'VERSION'
* include a dyncfg to CYA in case something goes wrong

ParkMyCar force-pushed the adapter/table-column-versions branch from 665342f to aaad6c6 Compare September 19, 2024 19:45

add sqllogictest to exercise statement validation

c6df4b6

ParkMyCar force-pushed the adapter/table-column-versions branch from aaad6c6 to c6df4b6 Compare September 19, 2024 20:01

bin/fmt

bf72eab

ParkMyCar marked this pull request as ready for review September 19, 2024 20:09

ParkMyCar requested review from a team as code owners September 19, 2024 20:09

ParkMyCar requested a review from jkosh44 September 19, 2024 20:09

ParkMyCar changed the title ~~[WIP] alter_table: Support parsing VERSION and ADDED keywords~~ alter_table: Support parsing VERSION and ADDED keywords Sep 19, 2024

jkosh44 approved these changes Sep 20, 2024


ParkMyCar added 4 commits September 20, 2024 11:33

* update sql-parser logic and add more negative tests

986e97f

Revert "prevent users from typing 'VERSION'"

c2f4384

This reverts commit bf31327.

update alter-table slt test with new error

2e3fd2c

add more negative tests

d002d47

ParkMyCar enabled auto-merge (squash) September 20, 2024 17:23

specify reset-server

54bf0d7

ParkMyCar merged commit b9c9f4b into MaterializeInc:main Sep 20, 2024
82 checks passed

github-actions bot locked and limited conversation to collaborators Sep 20, 2024
