Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep track of the longest failed match to improve error messages #269

Merged
merged 6 commits into from
Feb 26, 2020

Conversation

skoppe
Copy link
Contributor

@skoppe skoppe commented Jan 19, 2020

Here I am introducing the failEnd and failedChild members of the ParseTree.

Currently pegged discards the longest failed match in the option?, the zeroOrMore* or the oneOrMore+ operators.

Let me illustrate:

Thing <- (A B)* C
A <- "abc"
B <- "def"
C <- "ghi"

Let us run this grammer on the following input: "abc". It will fail of course and the error message will be expected "ghi" but found end of line. This is silly because A did actually correctly match (except it is discarded) and the error ought to be expected "def" but found end of line.

This also occurs with the (A B)? or the (A B)+ (after the first match).

Essentially the option, zeroOrMore and the oneOrMore rules eat errors, and that can be problematic if during those rules it actually parsed the longest failing match.

There are also more complex causes where some nested rule is correctly parsed, but it could contain a longer failed match, so it is not just about rules that are not successful. E.g.

Thing <- AB C
AB <- A B?
A <- "abc"
B <- "def"
C <- "ghi"

With input "abc de" it will complain about expected "ghi" but got "de". Of course rule B? is matched longer, and it should complain about that.

This PR introduces additional logic to keep track of the longest failed match.

It is not in a perfect stage (it still needs tests), but I wanted to submit it already to get some feedback.

Copy link
Collaborator

@veelo veelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this much requested improvement, Sebastiaan. And welcome to the Pegged club!

pegged/peg.d Outdated
@@ -1333,6 +1348,29 @@ template and(rules...) if (rules.length > 0)
}
}

// This function rewrites the ParseTree to move the failedChild into its parents children
// this is done whenever an 'and' rule fails and there are such a child further down
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"there are such a child" = "there is such a child"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a nested function inside and(), to clarify that it is only called from there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially misunderstood what this function does. The way this is documented, it seemed like a descendant was being pulled up the ancestry tree to be inserted higher up. That would be very strange, of course.

pegged/peg.d Outdated
auto firstLongestFailedMatch = result.children.countUntil!(c => c.failEnd > temp.end);
if (firstLongestFailedMatch == -1) {
result.children ~= temp;// add the failed node, to indicate which failed
// result.end = temp.end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you don't mean to commit this line that is commented out.

pegged/peg.d Outdated
} else {
// don't add the failed node because a previous one already failed further back
result.children = result.children[0 .. firstLongestFailedMatch+1]; // discard any intermediate correct nodes
failedChildFixup(result.children[firstLongestFailedMatch]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain what needs fixing up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this is clearer now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@skoppe
Copy link
Contributor Author

skoppe commented Jan 21, 2020

@veelo Thanks for taking the time to review. I have addressed your points. I will add some unittests (hopefully later today).

pegged/peg.d Outdated
} else {
// don't add the failed node because a previous one already failed further back
result.children = result.children[0 .. firstLongestFailedMatch+1]; // discard any intermediate correct nodes
failedChildFixup(result.children[firstLongestFailedMatch]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@skoppe skoppe changed the title keep track of the longest failed match to improve error messages WIP: keep track of the longest failed match to improve error messages Jan 29, 2020
@skoppe
Copy link
Contributor Author

skoppe commented Jan 29, 2020

I just found out that the fixup interferes with the memoization. No solution as of yet.

@veelo
Copy link
Collaborator

veelo commented Jan 29, 2020

Right, that can be tricky. Good you discovered that.

Philippe just made me a repository collaborator, so I’ll be able to merge once the PR is ready.

@skoppe
Copy link
Contributor Author

skoppe commented Jan 30, 2020

Great. I think I have found the culprit but will first test on our codebase. If this gets merged I suggest to make a beta tag and after some testing at least a minor version bump. The ParseTree will be different under invalid input, and people's current error handling might not expect that.

- note how much of a literal is parsed when failed
- only do fixup for children which match failEnd
- recalculate failEnd in fixup
- have the `or` template return a list of failed children instead of last one
- propagate failEnd better in `oneOrMore` and `option`
- keep failedChild in decimateTree
- wanted to iterate byGrapheme, but that won't work in ctfe
- decimateTree should keep nodes with no matches when the root
parsetree has failed (there may be failedChild elements in there
useful for error reporting)
@veelo
Copy link
Collaborator

veelo commented Feb 25, 2020

Are you ready for me to tag a beta release with this?

@skoppe skoppe changed the title WIP: keep track of the longest failed match to improve error messages keep track of the longest failed match to improve error messages Feb 26, 2020
@skoppe
Copy link
Contributor Author

skoppe commented Feb 26, 2020

Yes, please do, last couple of weeks everything was stable here.

@veelo veelo merged commit b1c5adf into dlang-community:master Feb 26, 2020
@veelo
Copy link
Collaborator

veelo commented Feb 26, 2020

@skoppe Any idea what could be the cause of the spike in compiler memory consumption? See #272.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants