Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output parse errors for the Rust part of the build step #291

Merged
merged 6 commits into from
Apr 25, 2024
Merged

Conversation

domenic
Copy link
Member

@domenic domenic commented Feb 9, 2024

This fixes #290, by outputting the parse errors encountered by the Rust build step's parser. Previously they were being stored in the RcDom instance's errors vector, and ignored. Now they are threaded through to the final io::Result, and then output by main().

The hardest part of this was adding line numbers to the errors. Doing this necessitated creating a wrapper for RcDom, which I inventively called RcDomWithLineNumbers, which implements TreeSink with two methods parse_error and set_current_line given custom behavior, while the other many methods just delegate to RcDom's implementation.

(There doesn't appear to be any better way to do this in Rust, currently. If you search for "rust delegation" you'll find lots of discussion of the problem, and various solutions which work if you can modify the delegated-to class, i.e. RcDom in our case. But nothing seems to work if the delegated-to class is in third-party code.)

Additionally, this enables exact_errors as a parser option, which provides slightly more information in a couple of cases related to character references.

Original PR description

I spent some time trying to address #290

This sort of works. (Although the code needs a lot of cleanup; probably we want to bubble the errors to the top level instead of printing them immediately.)

However, the biggest problem is that there is no line number information. So the output is terrible:

   Compiling html-build v0.0.0 (/home/domenic/src/html-build)
    Finished release [optimized] target(s) in 3.14s
     Running `target/release/html-build`
Attributes on an end tag
No matching tag to close
Error: Custom { kind: InvalidData, error: "Parse errors encountered" }

The upstream issue is servo/html5ever#48 .

I think this is possible to fix if we do something where we override the RcDom sink so that instead of doing nothing when set_current_line is called, it stores that information somewhere. And then I guess we'd override the parse_error implementation to use that stored information.

Figuring out how to create some version of RcDom with these overrides seems likely to be something Rust supports, but I'm running out time for playing with Rust today. If anyone else wants to pitch in, patches are appreciated.

/cc @noamr @jeremyroman

@domenic
Copy link
Member Author

domenic commented Feb 9, 2024

Figuring out how to create some version of RcDom with these overrides seems likely to be something Rust supports

Not... really, it seems. https://chat.openai.com/share/fa746c1a-403f-4611-bce7-7277e21114d8

@jeremyroman
Copy link
Contributor

An alternative hacky option we could always have is to treat validation as a "lint" step as part of the GitHub actions, as opposed to a feature of the core build process itself. Then we don't need the core build's parser to track errors in detail. Obviously it's less elegant, but potentially an option. Or in between -- the core build simply tracks whether there were errors, and then we do a separate lint/diagnostic pass to describe the errors given there were some.

Either of these might be preferable to fully forking markup5ever_rcdom.

Copy link
Member Author

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wrapped everything, and it seems to mostly work... not pretty, but it seems like the way to do things in Rust.

// Expose out the document and errors from the inner RcDom
pub fn document(&self) -> &Handle {
&self.dom.document
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like how this makes RcDomWithLineNumbers have a different public API than RcDom. Also sometimes I need to call clone() on the result of this, and sometimes not. Probably there is something better I could do here.

src/main.rs Show resolved Hide resolved
src/parser.rs Outdated Show resolved Hide resolved
@domenic domenic marked this pull request as ready for review February 15, 2024 05:59
@domenic domenic changed the title Hacky WIP on more parse errors Output parse errors for the Rust part of the build step Feb 15, 2024
But since it's a macro, `cargo fmt` can't get inside of it?
@domenic
Copy link
Member Author

domenic commented Feb 22, 2024

@jeremyroman @domfarolino for review if you all have time.

src/main.rs Show resolved Hide resolved
src/parser.rs Show resolved Hide resolved
@domenic domenic mentioned this pull request Apr 11, 2024
4 tasks
@domenic domenic merged commit 06326f7 into main Apr 25, 2024
1 check passed
@domenic domenic deleted the parsing-errors branch April 25, 2024 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Local build doesn't catch HTML parsing errors
3 participants