&para gets transformed to ¶ #995
Comments
Funnily enough, the transformation in URLs is correct.
Input
<a href="https://example.com/?foo=bar&para=baz">https://example.com/?foo=bar&para=baz</a>
Given output
<a href="https://example.com/?foo=bar&para=baz">https://example.com/?foo=bar¶=baz</a>
Expected output
<a href="https://example.com/?foo=bar&para=baz">https://example.com/?foo=bar&para=baz</a>
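For anyone who needs the link text to show the literal &para=baz, one option is to escape the ampersand before the markup ever reaches an HTML parser. The following is only a sketch in Python, used here because its standard html module is documented to follow the HTML5 character-reference rules. The helper name build_link is hypothetical, and nothing here is part of DOMPurify.

```python
import html


def build_link(url: str) -> str:
    """Hypothetical helper: build an <a> whose href and text both show `url` literally."""
    # html.escape turns "&" into "&amp;", so a parser can no longer
    # mistake "&para" for a legacy named character reference.
    escaped = html.escape(url, quote=True)
    return f'<a href="{escaped}">{escaped}</a>'


link = build_link("https://example.com/?foo=bar&para=baz")
print(link)

# Decoding the escaped text gives back the original URL unchanged.
round_trip = html.unescape(html.escape("https://example.com/?foo=bar&para=baz"))
print(round_trip)
```

With the ampersand written as &amp;, the decoded text is the original URL again, which is what the expected output above is asking for.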
That is done by the browser or DOM engine DOMPurify uses, not by DOMPurify itself. Sadly, we cannot fix this, as it is fully expected behavior and related to how the browser deals with named HTML entities.
The fact that an entity (without the trailing Can you please point me to where I should open a bug report?
I think this is not a bug but specified behavior, see the HTML spec.
The HTML spec does not specify to render &para as ¶.
https://html.spec.whatwg.org/multipage/named-characters.html
Here it does :)
The table you reference could be somewhat misleading. The spec is very clear that the sequence must be terminated by a semicolon character. https://html.spec.whatwg.org/multipage/syntax.html#syntax-charref
Therefore I maintain that this is a bug.
Yes, but for legacy reasons, some entities work without the semicolon, as per the spec, and para is one 🙂 No 🐞
You're right to point out that browsers are required to support them in their rendering engines for legacy reasons. That said, they're non-conforming (all named character references are required to end with a semicolon, and uses of named character references without a semicolon are flagged as errors), and sadly this reaches into the transformed output of DOMPurify. Although this feels like an undesirable side effect, I now understand better why you said it's intended as per the spec!
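The legacy behavior discussed in these comments can be reproduced outside a browser. Below is a minimal sketch using Python's standard html module, which is documented to apply the HTML5 rules for both valid and invalid character references. It only illustrates the decoding rule for text content and does not model DOMPurify or the separate rule for attribute values.

```python
import html
from html.entities import html5

# The HTML5 table lists "para" both with and without the trailing semicolon,
# which is why "&para" is decoded even though it is flagged as a parse error.
has_legacy_form = "para" in html5
has_strict_form = "para;" in html5

decoded_without_semicolon = html.unescape("&para")
decoded_with_semicolon = html.unescape("&para;")

# In text content the legacy reference is decoded even when "=" follows it.
decoded_in_text = html.unescape("foo=bar&para=baz")

# Escaping the ampersand is what keeps the literal text "&para".
decoded_escaped = html.unescape("&amp;para")

# Most names have no legacy form: "rarr" is only listed as "rarr;".
decoded_non_legacy = html.unescape("&rarr")

print(decoded_without_semicolon, decoded_with_semicolon)
print(decoded_in_text)
print(decoded_escaped)
print(decoded_non_legacy)
```

So the only way to get a literal &para out of an HTML parser is to feed it &amp;para. A sanitizer that parses and re-serializes the markup has no way to tell that the author meant the undecoded form.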
Background & Context
I found a bug concerning how HTML entities like &para (without the trailing ;) are being handled during sanitization, especially regarding how the clean output reflects the intended display of entities like the paragraph symbol (¶).
Bug
Input
The input HTML thrown at DOMPurify:
Given output
The output given by DOMPurify:
Expected output
The expected output:
DOMPurify appears to be converting the &para entity into its equivalent Unicode symbol (¶) in the cleaned HTML. However, the expectation is for the original HTML entity &para to remain intact without being converted to the symbol.