-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report Prism parse errors #319
base: prism
Are you sure you want to change the base?
Conversation
5b27c72
to
3612c21
Compare
|
||
while (node != nullptr) { | ||
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | ||
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amomchilov I think pm_error_level_t
is just uint8
enums (source). Can I do this cast if that's the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is error->level
before the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's different for warning_list
, but yeah, this should be safe here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
pm_error_level_t
is justuint8
enums
Not quite. C enums act like a syntactic sugar over defining int
constants. e.g. sizef(PM_ERROR_LEVEL_SYNTAX)
is 4 bytes, just like sizeof(int)
on ARM64).
Just like normal int
s in C, they can be converted to and from other integer types. pm_diagnostic_level()
casted them to uint8_t
before storing them in the pm_diagnostic_t
struct. This is likely just a performance optimization to make the struct smaller.
Can I do this cast if that's the case?
Yep! You're effectively just decompressing it back into its original form.
|
||
while (node != nullptr) { | ||
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | ||
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is error->level
before the cast?
@@ -146,7 +146,8 @@ unique_ptr<parser::Node> runPrismParser(core::GlobalState &gs, core::FileRef fil | |||
return std::unique_ptr<parser::Node>(); | |||
} | |||
|
|||
auto nodes = Prism::Translator(parser, gs, file).translate(std::move(root)); | |||
auto errors = parser.errors(); | |||
auto nodes = Prism::Translator(parser, errors, gs, file).translate(std::move(root)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may have already considered this -- if we now have a method that exposes the errors on the parser, do we have to pass them into the translator? Would there be a way to fetch them from the parser when we need them?
// For now, we only report errors when we hit a missing node because we don't want to always report dynamic | ||
// constant assignment errors | ||
// TODO: We will improve this in the future when we handle more errored cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on this? What is the alternative to reporting errors only when we hit a missing node?
|
||
while (node != nullptr) { | ||
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | ||
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's different for warning_list
, but yeah, this should be safe here.
@@ -28,4 +28,22 @@ std::string_view Parser::extractString(pm_string_t *string) { | |||
return std::string_view(reinterpret_cast<const char *>(pm_string_source(string)), pm_string_length(string)); | |||
} | |||
|
|||
std::vector<ParseError> Parser::errors() const { | |||
std::vector<ParseError> errors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<ParseError> errors; | |
std::vector<ParseError> errors; | |
errors.reserve(storage->parser.error_list.size) |
pm_list_node_t *node = storage->parser.error_list.head; | ||
|
||
while (node != nullptr) { | ||
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | ||
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); | ||
|
||
ParseError parseError(pm_diagnostic_id_human(error->diag_id), | ||
std::string(reinterpret_cast<const char *>(error->message)), error->location, level); | ||
|
||
errors.push_back(parseError); | ||
node = node->next; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a for
loop here:
pm_list_node_t *node = storage->parser.error_list.head; | |
while (node != nullptr) { | |
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | |
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); | |
ParseError parseError(pm_diagnostic_id_human(error->diag_id), | |
std::string(reinterpret_cast<const char *>(error->message)), error->location, level); | |
errors.push_back(parseError); | |
node = node->next; | |
} | |
auto error_list = storage->parser.error_list | |
for (pm_list_node_t *node = error_list.head; node != nullptr; node = node->next) { | |
pm_diagnostic_t *error = reinterpret_cast<pm_diagnostic_t *>(node); | |
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); | |
ParseError parseError(pm_diagnostic_id_human(error->diag_id), | |
std::string(reinterpret_cast<const char *>(error->message)), error->location, level); | |
errors.push_back(parseError); | |
} |
pm_error_level_t level = static_cast<pm_error_level_t>(error->level); | ||
|
||
ParseError parseError(pm_diagnostic_id_human(error->diag_id), | ||
std::string(reinterpret_cast<const char *>(error->message)), error->location, level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, we're making a copy of error->message
, which will be owned by the ParseError
.
I confirmed that we don't own the diagnostics nor their messages. They all get cleaned by pm_parser_free()
, which calls pm_diagnostic_list_free()
.
@@ -1759,7 +1764,7 @@ template <typename PrismNode> std::unique_ptr<parser::Mlhs> Translator::translat | |||
// Context management methods | |||
Translator Translator::enterMethodDef() { | |||
auto isInMethodDef = true; | |||
return Translator(parser, gs, file, isInMethodDef); | |||
return Translator(parser, parseErrors, gs, file, isInMethodDef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Translator
's constructors takes the std::vector
by-value, so the (and all its members) will be copied when they're passed here.
This wouldn't have been an issue back when we only constructed one translator (like before this method existed), but now we'll need to construct a new translator every time we change contexts (so far when we enter methods, but there would be more context-changes one we start removing our translation layer).
This is one of those times when there's no universally correct alternative, because it'll come down to how and where we end up needing to use the errors.
So far, the only way this vector ends up being used is when it gets iterated on line 1214.
Here's 2 ideas:
- We could store the vector in the
Prism::ParserStorage
(which is a ref-counted heap-allocated box that stores the "guts" of aPrism::Parser
, and expose it via an API onPrism::Parser
. - We could get rid of it for now, and just access the
parser.error_list
and walk that linked list right on line 1210.
Motivation
Part of #309
After #318, I found that we should pass Prism errors translator instead of just relying on
PM_MISSING_NODE
because:error: Hint: ...
like Sorbet's parser does, we need to have access to more contextSo in this PR, I start to feed Prism errors to the translator and report them when we hit a
PM_MISSING_NODE
. This prevents runtime(-ish?) errors like dynamic constant assignment error being reported proactively. We will revise this in later PRs when we understand better around error translation.Test plan
See included automated tests.