Report Prism parse errors #319

st0012 · 2024-11-04T20:22:37Z

Motivation

Part of #309

After #318, I found that we should pass Prism errors to the translator instead of relying only on PM_MISSING_NODE because:

To generate error: Hint: ... like Sorbet's parser does, we need to have access to more context
In some errored cases, the current translator still generates a different parse tree than Sorbet's parser. To solve this, we need translator-level context so we can update the translation in those cases

So in this PR, I start feeding Prism errors to the translator and report them only when we hit a PM_MISSING_NODE. This prevents runtime(-ish?) errors, like the dynamic constant assignment error, from being reported proactively. We will revise this in later PRs once we better understand error translation.

Test plan

See included automated tests.

st0012 · 2024-11-04T20:27:29Z

parser/prism/Parser.cc

+
+    while (node != nullptr) {
+        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);


@amomchilov I think pm_error_level_t is just uint8 enums (source). Can I do this cast if that's the case?

What is error->level before the cast?

It's different for warning_list, but yeah, this should be safe here.

I think pm_error_level_t is just uint8 enums

Not quite. C enums act like syntactic sugar over defining int constants, e.g. sizeof(PM_ERROR_LEVEL_SYNTAX) is 4 bytes, just like sizeof(int) (on ARM64).

Just like normal ints in C, they can be converted to and from other integer types. pm_diagnostic_level() casts them to uint8_t before they are stored in the pm_diagnostic_t struct. This is likely just a performance optimization to make the struct smaller.

Can I do this cast if that's the case?

Yep! You're effectively just decompressing it back into its original form.
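To make the round trip concrete, here is a minimal sketch. The `Demo*` names are hypothetical stand-ins, not Prism's actual definitions; the only assumption carried over from the discussion above is that the level is narrowed to a `uint8_t` for storage and widened back on read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an error-level enum (illustrative names and values).
enum DemoErrorLevel { DEMO_LEVEL_SYNTAX = 0, DEMO_LEVEL_ARGUMENT = 1, DEMO_LEVEL_LOAD = 2 };

// Hypothetical stand-in for a diagnostic struct that stores the level in one byte.
struct DemoDiagnostic {
    uint8_t level;
};

// Narrow the enum into the one-byte field, as a C library might do when storing it.
inline DemoDiagnostic makeDiagnostic(DemoErrorLevel level) {
    return DemoDiagnostic{static_cast<uint8_t>(level)};
}

// Widen the stored byte back into the enum type.
inline DemoErrorLevel levelOf(const DemoDiagnostic &diagnostic) {
    return static_cast<DemoErrorLevel>(diagnostic.level);
}

// The round trip is lossless as long as every enumerator fits in a uint8_t.
inline void checkRoundTrip() {
    assert(levelOf(makeDiagnostic(DEMO_LEVEL_SYNTAX)) == DEMO_LEVEL_SYNTAX);
    assert(levelOf(makeDiagnostic(DEMO_LEVEL_LOAD)) == DEMO_LEVEL_LOAD);
}
```

The cast is only safe while the stored byte always came from a valid enumerator, which is the case when the library itself wrote it.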

egiurleo · 2024-11-04T21:44:40Z

parser/prism/Parser.cc

+
+    while (node != nullptr) {
+        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);


What is error->level before the cast?

egiurleo · 2024-11-04T21:53:34Z

main/pipeline/pipeline.cc

@@ -146,7 +146,8 @@ unique_ptr<parser::Node> runPrismParser(core::GlobalState &gs, core::FileRef fil
        return std::unique_ptr<parser::Node>();
    }

-    auto nodes = Prism::Translator(parser, gs, file).translate(std::move(root));
+    auto errors = parser.errors();
+    auto nodes = Prism::Translator(parser, errors, gs, file).translate(std::move(root));


You may have already considered this -- if we now have a method that exposes the errors on the parser, do we have to pass them into the translator? Would there be a way to fetch them from the parser when we need them?

egiurleo · 2024-11-04T21:54:50Z

parser/prism/Translator.cc

+            // For now, we only report errors when we hit a missing node because we don't want to always report dynamic
+            // constant assignment errors
+            // TODO: We will improve this in the future when we handle more errored cases


Can you expand on this? What is the alternative to reporting errors only when we hit a missing node?

amomchilov · 2024-11-04T22:42:14Z

parser/prism/Parser.cc

+
+    while (node != nullptr) {
+        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);


It's different for warning_list, but yeah, this should be safe here.

amomchilov · 2024-11-04T22:43:05Z

parser/prism/Parser.cc

@@ -28,4 +28,22 @@ std::string_view Parser::extractString(pm_string_t *string) {
    return std::string_view(reinterpret_cast<const char *>(pm_string_source(string)), pm_string_length(string));
 }

+std::vector<ParseError> Parser::errors() const {
+    std::vector<ParseError> errors;


Suggested change

-    std::vector<ParseError> errors;
+    std::vector<ParseError> errors;
+    errors.reserve(storage->parser.error_list.size);

amomchilov · 2024-11-04T22:45:46Z

parser/prism/Parser.cc

+    pm_list_node_t *node = storage->parser.error_list.head;
+
+    while (node != nullptr) {
+        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
+
+        ParseError parseError(pm_diagnostic_id_human(error->diag_id),
+                              std::string(reinterpret_cast<const char *>(error->message)), error->location, level);
+
+        errors.push_back(parseError);
+        node = node->next;
+    }


We can use a for loop here:

Suggested change

-    pm_list_node_t *node = storage->parser.error_list.head;
-
-    while (node != nullptr) {
-        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
-        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
-
-        ParseError parseError(pm_diagnostic_id_human(error->diag_id),
-                              std::string(reinterpret_cast<const char *>(error->message)), error->location, level);
-
-        errors.push_back(parseError);
-        node = node->next;
-    }
+    auto error_list = storage->parser.error_list;
+
+    for (pm_list_node_t *node = error_list.head; node != nullptr; node = node->next) {
+        pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
+
+        ParseError parseError(pm_diagnostic_id_human(error->diag_id),
+                              std::string(reinterpret_cast<const char *>(error->message)), error->location, level);
+
+        errors.push_back(parseError);
+    }
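For reference, the loop shape and the `reserve` suggestion can be tried in isolation. This is a sketch with hypothetical `Demo*` types that only mimic an intrusive linked list (a list header with a size and a head pointer, and entries whose first member is the list node); it is not Prism's real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical intrusive list types, loosely modelled on a C-style diagnostic list.
struct DemoListNode {
    DemoListNode *next;
};

struct DemoList {
    size_t size;
    DemoListNode *head;
};

// The list node is the first member, so a node pointer can be cast to the entry.
struct DemoDiagnostic {
    DemoListNode node;
    const char *message;
};

std::vector<std::string> collectMessages(const DemoList &list) {
    std::vector<std::string> messages;
    messages.reserve(list.size); // one allocation up front instead of repeated growth

    for (DemoListNode *node = list.head; node != nullptr; node = node->next) {
        auto *diagnostic = reinterpret_cast<DemoDiagnostic *>(node);
        assert(diagnostic->message != nullptr);
        messages.emplace_back(diagnostic->message); // copies the C string
    }

    return messages;
}
```

Keeping the advance step in the `for` header means an early `continue` in the body cannot accidentally skip `node = node->next`.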

amomchilov · 2024-11-04T22:51:06Z

parser/prism/Parser.cc

+        pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
+
+        ParseError parseError(pm_diagnostic_id_human(error->diag_id),
+                              std::string(reinterpret_cast<const char *>(error->message)), error->location, level);


This looks good, we're making a copy of error->message, which will be owned by the ParseError.

I confirmed that we don't own the diagnostics nor their messages. They all get cleaned by pm_parser_free(), which calls pm_diagnostic_list_free().
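A small sketch of why the copy matters, using hypothetical names and a `malloc`'d buffer as a stand-in for memory the parser owns and later frees:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical error type that owns its message.
struct DemoParseError {
    std::string message;
};

DemoParseError copyThenFree() {
    // Stand-in for a message buffer owned by the parser.
    char *buffer = static_cast<char *>(std::malloc(32));
    assert(buffer != nullptr);
    std::strcpy(buffer, "unexpected end-of-input");

    // std::string copies the bytes, so the error no longer depends on `buffer`.
    DemoParseError error{std::string(buffer)};

    // Stand-in for the parser being freed along with its diagnostics.
    std::free(buffer);

    return error; // still valid: it holds its own copy
}
```

Had `DemoParseError` stored a `const char *` or a `std::string_view` instead, the returned value would point at freed memory.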

amomchilov · 2024-11-04T23:21:34Z

parser/prism/Translator.cc

@@ -1759,7 +1764,7 @@ template <typename PrismNode> std::unique_ptr<parser::Mlhs> Translator::translat
 // Context management methods
 Translator Translator::enterMethodDef() {
    auto isInMethodDef = true;
-    return Translator(parser, gs, file, isInMethodDef);
+    return Translator(parser, parseErrors, gs, file, isInMethodDef);


Translator's constructor takes the std::vector by value, so the vector (and all its elements) will be copied when it's passed here.

This wouldn't have been an issue back when we only constructed one translator (like before this method existed), but now we'll need to construct a new translator every time we change contexts (so far only when we enter methods, but there will be more context changes once we start removing our translation layer).

This is one of those times when there's no universally correct alternative, because it'll come down to how and where we end up needing to use the errors.

So far, the only way this vector ends up being used is when it gets iterated on line 1214.

Here's 2 ideas:

We could store the vector in the Prism::ParserStorage (which is a ref-counted heap-allocated box that stores the "guts" of a Prism::Parser), and expose it via an API on Prism::Parser.

We could get rid of it for now, and just access the parser.error_list and walk that linked list right on line 1210.
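To illustrate the cost difference, here is a sketch with hypothetical classes (`ByValueTranslator`, `SharedTranslator`, `CountedError`) that count element copies. It is not the actual Translator, and the shared-pointer variant is just one way to approximate the first idea of keeping the errors in ref-counted storage.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Global copy counter, reachable from the element type below.
inline int &copyCounter() {
    static int count = 0;
    return count;
}

// Hypothetical error type that records every copy made of it.
struct CountedError {
    CountedError() = default;
    CountedError(const CountedError &) { ++copyCounter(); }
    CountedError &operator=(const CountedError &) {
        ++copyCounter();
        return *this;
    }
};

// Holds the errors by value: each new context copies the whole vector.
class ByValueTranslator {
    std::vector<CountedError> errors;

public:
    explicit ByValueTranslator(std::vector<CountedError> errors) : errors(std::move(errors)) {}
    ByValueTranslator enterMethodDef() const { return ByValueTranslator(errors); }
};

// Holds a ref-counted pointer: each new context copies only the pointer.
class SharedTranslator {
    std::shared_ptr<const std::vector<CountedError>> errors;

public:
    explicit SharedTranslator(std::shared_ptr<const std::vector<CountedError>> errors)
        : errors(std::move(errors)) {}
    SharedTranslator enterMethodDef() const { return SharedTranslator(errors); }
};

// Demonstrates that the shared variant makes no element copies on a context change.
inline void checkSharedMakesNoCopies() {
    auto shared = std::make_shared<const std::vector<CountedError>>(3);
    copyCounter() = 0;
    SharedTranslator outer(shared);
    SharedTranslator inner = outer.enterMethodDef();
    (void)inner;
    assert(copyCounter() == 0);
}
```

With the by-value version, the number of element copies grows with both the number of errors and the number of context changes, which is why it matters more once translators are created per context.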

st0012 self-assigned this Nov 4, 2024

st0012 requested review from amomchilov and egiurleo November 4, 2024 20:22

Report Prism parse errors

3612c21

st0012 force-pushed the store-prism-errors branch from 5b27c72 to 3612c21 Compare November 4, 2024 20:25
