You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The URL standard is getting more sophisticated about validation errors. See e.g. whatwg/url#406.
There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in whatwg/url#437 (comment). I'll call these "parser validity" and "writing validity", since the latter is derived from the URL writing section.
It would be very neat if this package could provide:
Whether a URL is writing-valid. (At least for "absolute-URL-with-fragment string"; maybe relative URLs are out of scope.)
Whether a URL is parser-valid, including the name of any validation error(s) triggered during parsing.
Surface this information in the live URL viewer.
A test corpus of valid/invalid URLs, maybe drawn from urltestdata.json. We should at least have one parser-invalid URL for every named parser validation error, and it'd also be good to have writing-invalid URLs derived from systematically violating every rule in the writing URLs section.
Bonus: a fuzzer that generates strings and tests if they are parser-valid but writing-invalid, or parser-invalid but writing-valid.
Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting urlStateMachine.parseError to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?
We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.
One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.
On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in URL constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.
Use source code rewriting to prune out validity-related parts of the parser code, so that we end up with two output .js files, e.g. url-state-machine-with-validy-checking.js and url-state-machine.js. (Related: Remove the src/ vs. lib/ distinction #153, and maybe Benchmark whether the source-rewriting p() indirection actually buys us anything #155 because if we remove the p() source code rewriting then this would just be adding some back.) This would result in a larger npm package size but a smaller loaded-into-memory size.
The text was updated successfully, but these errors were encountered:
mattclough1
pushed a commit
to mattclough1/whatwg-url
that referenced
this issue
Oct 14, 2021
The URL standard is getting more sophisticated about validation errors. See e.g. whatwg/url#406.
There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in whatwg/url#437 (comment). I'll call these "parser validity" and "writing validity", since the latter is derived from the URL writing section.
It would be very neat if this package could provide:
Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting
urlStateMachine.parseError
to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.
One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.
On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in
URL
constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.We could address this in a number of ways, e.g.:
Take the code size hit, and benchmark the speed hit. (Probably after Benchmark whether the source-rewriting p() indirection actually buys us anything #155 gets us some basic benchmarks.) Maybe it's fine for jsdom's purposes.
Use source code rewriting to prune out validity-related parts of the parser code, so that we end up with two output
.js
files, e.g.url-state-machine-with-validy-checking.js
andurl-state-machine.js
. (Related: Remove the src/ vs. lib/ distinction #153, and maybe Benchmark whether the source-rewriting p() indirection actually buys us anything #155 because if we remove thep()
source code rewriting then this would just be adding some back.) This would result in a larger npm package size but a smaller loaded-into-memory size.The text was updated successfully, but these errors were encountered: