Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report Prism parse errors #319

Open
wants to merge 1 commit into
base: prism
Choose a base branch
from
Open

Report Prism parse errors #319

wants to merge 1 commit into from

Conversation

st0012
Copy link
Member

@st0012 st0012 commented Nov 4, 2024

Motivation

Part of #309

After #318, I found that we should pass Prism errors translator instead of just relying on PM_MISSING_NODE because:

  • To generate error: Hint: ... like Sorbet's parser does, we need to have access to more context
  • In some errored cases, the current translator still generates different parse-tree than Sorbet's parser. To solve this we need a translator-level context to update the translation in those cases

So in this PR, I start to feed Prism errors to the translator and report them when we hit a PM_MISSING_NODE. This prevents runtime(-ish?) errors like dynamic constant assignment error being reported proactively. We will revise this in later PRs when we understand better around error translation.

Test plan

See included automated tests.

@st0012 st0012 self-assigned this Nov 4, 2024

while (node != nullptr) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amomchilov I think pm_error_level_t is just uint8 enums (source). Can I do this cast if that's the case?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is error->level before the cast?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different for warning_list, but yeah, this should be safe here.

Copy link

@amomchilov amomchilov Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pm_error_level_t is just uint8 enums

Not quite. C enums act like a syntactic sugar over defining int constants. e.g. sizef(PM_ERROR_LEVEL_SYNTAX) is 4 bytes, just like sizeof(int) on ARM64).

Just like normal ints in C, they can be converted to and from other integer types. pm_diagnostic_level() casted them to uint8_t before storing them in the pm_diagnostic_t struct. This is likely just a performance optimization to make the struct smaller.

Can I do this cast if that's the case?

Yep! You're effectively just decompressing it back into its original form.


while (node != nullptr) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is error->level before the cast?

@@ -146,7 +146,8 @@ unique_ptr<parser::Node> runPrismParser(core::GlobalState &gs, core::FileRef fil
return std::unique_ptr<parser::Node>();
}

auto nodes = Prism::Translator(parser, gs, file).translate(std::move(root));
auto errors = parser.errors();
auto nodes = Prism::Translator(parser, errors, gs, file).translate(std::move(root));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may have already considered this -- if we now have a method that exposes the errors on the parser, do we have to pass them into the translator? Would there be a way to fetch them from the parser when we need them?

Comment on lines +1210 to +1212
// For now, we only report errors when we hit a missing node because we don't want to always report dynamic
// constant assignment errors
// TODO: We will improve this in the future when we handle more errored cases
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on this? What is the alternative to reporting errors only when we hit a missing node?


while (node != nullptr) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different for warning_list, but yeah, this should be safe here.

@@ -28,4 +28,22 @@ std::string_view Parser::extractString(pm_string_t *string) {
return std::string_view(reinterpret_cast<const char *>(pm_string_source(string)), pm_string_length(string));
}

std::vector<ParseError> Parser::errors() const {
std::vector<ParseError> errors;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<ParseError> errors;
std::vector<ParseError> errors;
errors.reserve(storage->parser.error_list.size)

Comment on lines +34 to +45
pm_list_node_t *node = storage->parser.error_list.head;

while (node != nullptr) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);

ParseError parseError(pm_diagnostic_id_human(error->diag_id),
std::string(reinterpret_cast<const char *>(error->message)), error->location, level);

errors.push_back(parseError);
node = node->next;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use a for loop here:

Suggested change
pm_list_node_t *node = storage->parser.error_list.head;
while (node != nullptr) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
ParseError parseError(pm_diagnostic_id_human(error->diag_id),
std::string(reinterpret_cast<const char *>(error->message)), error->location, level);
errors.push_back(parseError);
node = node->next;
}
auto error_list = storage->parser.error_list
for (pm_list_node_t *node = error_list.head; node != nullptr; node = node->next) {
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node);
pm_error_level_t level = static_cast<pm_error_level_t>(error->level);
ParseError parseError(pm_diagnostic_id_human(error->diag_id),
std::string(reinterpret_cast<const char *>(error->message)), error->location, level);
errors.push_back(parseError);
}

pm_error_level_t level = static_cast<pm_error_level_t>(error->level);

ParseError parseError(pm_diagnostic_id_human(error->diag_id),
std::string(reinterpret_cast<const char *>(error->message)), error->location, level);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, we're making a copy of error->message, which will be owned by the ParseError.

I confirmed that we don't own the diagnostics nor their messages. They all get cleaned by pm_parser_free(), which calls pm_diagnostic_list_free().

@@ -1759,7 +1764,7 @@ template <typename PrismNode> std::unique_ptr<parser::Mlhs> Translator::translat
// Context management methods
Translator Translator::enterMethodDef() {
auto isInMethodDef = true;
return Translator(parser, gs, file, isInMethodDef);
return Translator(parser, parseErrors, gs, file, isInMethodDef);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Translator's constructors takes the std::vector by-value, so the (and all its members) will be copied when they're passed here.

This wouldn't have been an issue back when we only constructed one translator (like before this method existed), but now we'll need to construct a new translator every time we change contexts (so far when we enter methods, but there would be more context-changes one we start removing our translation layer).

This is one of those times when there's no universally correct alternative, because it'll come down to how and where we end up needing to use the errors.

So far, the only way this vector ends up being used is when it gets iterated on line 1214.

Here's 2 ideas:

  1. We could store the vector in the Prism::ParserStorage (which is a ref-counted heap-allocated box that stores the "guts" of a Prism::Parser, and expose it via an API on Prism::Parser.
  2. We could get rid of it for now, and just access the parser.error_list and walk that linked list right on line 1210.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants