What to do about validation errors #156

domenic · 2020-05-14T15:26:53Z

The URL standard is getting more sophisticated about validation errors. See e.g. whatwg/url#406.

There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in whatwg/url#437 (comment). I'll call these "parser validity" and "writing validity", since the latter is derived from the URL writing section.

It would be very neat if this package could provide:

Whether a URL is writing-valid. (At least for "absolute-URL-with-fragment string"; maybe relative URLs are out of scope.)
Whether a URL is parser-valid, including the name of any validation error(s) triggered during parsing.
Surface this information in the live URL viewer.
A test corpus of valid/invalid URLs, maybe drawn from urltestdata.json. We should at least have one parser-invalid URL for every named parser validation error, and it'd also be good to have writing-invalid URLs derived from systematically violating every rule in the writing URLs section.
Bonus: a fuzzer that generates strings and tests if they are parser-valid but writing-invalid, or parser-invalid but writing-valid.

Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting urlStateMachine.parseError to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?

We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.

One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.

On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in URL constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.

We could address this in a number of ways, e.g.:

Take the code size hit, and benchmark the speed hit. (Probably after Benchmark whether the source-rewriting p() indirection actually buys us anything #155 gets us some basic benchmarks.) Maybe it's fine for jsdom's purposes.
Use source code rewriting to prune out validity-related parts of the parser code, so that we end up with two output .js files, e.g. url-state-machine-with-validy-checking.js and url-state-machine.js. (Related: Remove the src/ vs. lib/ distinction #153, and maybe Benchmark whether the source-rewriting p() indirection actually buys us anything #155 because if we remove the p() source code rewriting then this would just be adding some back.) This would result in a larger npm package size but a smaller loaded-into-memory size.

The text was updated successfully, but these errors were encountered:

Escape text from custom transformTags functions.

mattclough1 pushed a commit to mattclough1/whatwg-url that referenced this issue Oct 14, 2021

Merge pull request jsdom#156 from mikesamuel/master

712cb68

Escape text from custom transformTags functions.

domenic mentioned this issue Sep 2, 2022

It is unlikely that the spec's two definitions of valid URL strings are equal whatwg/url#704

Open

karwa mentioned this issue Sep 26, 2023

Track specific validation errors #260

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What to do about validation errors #156

What to do about validation errors #156

domenic commented May 14, 2020 •

edited

Loading

What to do about validation errors #156

What to do about validation errors #156

Comments

domenic commented May 14, 2020 • edited Loading

domenic commented May 14, 2020 •

edited

Loading