
XML code is removed #992

Open
BekaArabidze98 opened this issue Sep 13, 2024 · 4 comments


BekaArabidze98 commented Sep 13, 2024

This is more of a question, or maybe a bug; I can't really tell.

Background & Context

I want to parse some XML code. I checked cure53.de/purify, and it does remove the XML tags,
but I want them to be present in the final HTML.

My config

DOMPurify.sanitize(html, {
  ADD_TAGS: ['iframe'],
  ADD_ATTR: ['allow', 'allowfullscreen', 'frameborder', 'scrolling', 'target'],
  SAFE_FOR_XML: false,
  // FORBID_TAGS: ['img'],
});

Input

<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
 <glossary><title>example glossary</title>
  <GlossDiv><title>S</title>
   <GlossList>
    <GlossEntry ID="SGML" SortAs="SGML">
     <GlossTerm>Standard Generalized Markup Language</GlossTerm>
     <Acronym>SGML</Acronym>
     <Abbrev>ISO 8879:1986</Abbrev>
     <GlossDef>
      <para>A meta-markup language, used to create markup
languages such as DocBook.</para>
      <GlossSeeAlso OtherTerm="GML">
      <GlossSeeAlso OtherTerm="XML">
     </GlossDef>
     <GlossSee OtherTerm="markup">
    </GlossEntry>
   </GlossList>
  </GlossDiv>

<h2 class="slate-h2 "> myhtml</h2><div class="code-block-wrapper" style="width:60%"><div class="code-block-header">XML</div><pre class="slate-CodeBlockElement slate-code_block"><code class="language-xml keep-markup drop-tokens"><div class="slate-code_line"></div><div class="slate-code_line"> <glossary>example glossary</glossary></div><div class="slate-code_line"> <glossdiv><title>S</title></glossdiv></div><div class="slate-code_line"> <glosslist></glosslist></div><div class="slate-code_line"> <glossentry id="SGML" sortas="SGML"></glossentry></div><div class="slate-code_line"> <glossterm>Standard Generalized Markup Language</glossterm></div><div class="slate-code_line"> <acronym>SGML</acronym></div><div class="slate-code_line"> <abbrev>ISO 8879:1986</abbrev></div><div class="slate-code_line"> <glossdef></glossdef></div><div class="slate-code_line"> <para>A meta-markup language, used to create markup</para></div><div class="slate-code_line">languages such as DocBook.</div><div class="slate-code_line"> <glossseealso otherterm="GML"></glossseealso></div><div class="slate-code_line"> <glossseealso otherterm="XML"></glossseealso></div><div class="slate-code_line"> </div><div class="slate-code_line"> <glosssee otherterm="markup"></glosssee></div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div></code></pre></div><p class="slate-p "></p><p class="slate-p "></p><p class="slate-p "></p>

Given output

<h2 class="slate-h2">myhtml</h2><div style="width:60%" class="code-block-wrapper"><div class="code-block-header">XML</div><pre class="slate-CodeBlockElement slate-code_block"><code class="language-xml keep-markup drop-tokens"><div class="slate-code_line"></div><div class="slate-code_line"> example glossary</div><div class="slate-code_line"> <title>S</title></div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> Standard Generalized Markup Language</div><div class="slate-code_line"> <acronym>SGML</acronym></div><div class="slate-code_line"> ISO 8879:1986</div><div class="slate-code_line"> </div><div class="slate-code_line"> A meta-markup language, used to create markup</div><div class="slate-code_line">languages such as DocBook.</div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div><div class="slate-code_line"> </div></code></pre></div><p class="slate-p"></p><p class="slate-p"></p><p class="slate-p"></p>

Expected output

The expected output keeps the XML tags instead of removing them.

I had posted this in the isomorphic-dompurify repository, but I don't know if it is still maintained.
Thanks
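One way to tell whether this is a bug or expected behavior is to look at what DOMPurify reports as removed: after each `sanitize()` call it fills `DOMPurify.removed`. The sketch below assumes entries of the shape `{ element: Node }` for dropped nodes and `{ attribute: Attr, from: Node }` for dropped attributes; `summarizeRemoved` is a hypothetical helper, not part of DOMPurify. Since `KEEP_CONTENT` defaults to `true`, a removed element's text stays in place, which is consistent with the given output above: "example glossary" survives while the `<glossary>` wrapper is gone.

```javascript
// Hypothetical helper: reduce DOMPurify.removed to the distinct names that were dropped.
// Assumed entry shapes: { element: Node } or { attribute: Attr, from: Node }.
function summarizeRemoved(removed) {
  const elements = new Set();
  const attributes = new Set();
  for (const entry of removed) {
    if (entry.element) {
      // In an HTML document nodeName is upper-case (e.g. "GLOSSARY"), so normalize it.
      elements.add(entry.element.nodeName.toLowerCase());
    } else if (entry.attribute) {
      attributes.add(entry.attribute.name.toLowerCase());
    }
  }
  return {
    elements: [...elements].sort(),
    attributes: [...attributes].sort(),
  };
}

// Usage (browser, or Node with a DOM implementation):
//   DOMPurify.sanitize(html, config);
//   console.log(summarizeRemoved(DOMPurify.removed));
```

If the list contains exactly the glossary-specific names, DOMPurify is doing what its default allow-list tells it to, rather than misbehaving.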

cure53 (Owner) commented Sep 16, 2024

Hey there :) Have you tried to modify the PARSER_MEDIA_TYPE property? That should do the trick.

https://github.com/cure53/DOMPurify?tab=readme-ov-file#influence-how-we-sanitize
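Applied to the config from the report, the suggestion amounts to adding `PARSER_MEDIA_TYPE`. As far as we can tell from DOMPurify's source, only `'text/html'` (the default) and `'application/xhtml+xml'` are accepted, and any other value silently falls back to `'text/html'`; the hypothetical `withParserMediaType` helper below therefore rejects typos explicitly. This option only changes how the input is parsed: in XHTML mode the markup is parsed as XML (so it generally needs to be well-formed) and element names keep their case, but nothing is added to the allow-list.

```javascript
// The two parser media types DOMPurify accepts; other values fall back to 'text/html'.
const SUPPORTED_PARSER_MEDIA_TYPES = ['text/html', 'application/xhtml+xml'];

// Hypothetical helper: return a copy of baseConfig with PARSER_MEDIA_TYPE set,
// throwing on an unsupported value instead of letting the default parser be used quietly.
function withParserMediaType(baseConfig, mediaType) {
  if (!SUPPORTED_PARSER_MEDIA_TYPES.includes(mediaType)) {
    throw new Error(`Unsupported PARSER_MEDIA_TYPE: ${mediaType}`);
  }
  return { ...baseConfig, PARSER_MEDIA_TYPE: mediaType };
}

// The config from the original report, switched to the XHTML parser.
const config = withParserMediaType(
  {
    ADD_TAGS: ['iframe'],
    ADD_ATTR: ['allow', 'allowfullscreen', 'frameborder', 'scrolling', 'target'],
    SAFE_FOR_XML: false,
  },
  'application/xhtml+xml',
);

// Then: DOMPurify.sanitize(html, config);
```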

BekaArabidze98 (Author) commented Sep 16, 2024

Thanks for the response, I will try that.

BekaArabidze98 (Author) commented

It did not work. I think something is wrong with the HTML: the XML tags are not being treated as tags, or something like that, and they are being removed.

cure53 (Owner) commented Sep 17, 2024

Of course, you also need to allow-list the tags you want to keep. By default, DOMPurify removes everything it does not know and recognize as harmless.
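Allow-listing here means passing the custom element and attribute names through `ADD_TAGS` and `ADD_ATTR`. Typing them out by hand is error-prone, so the sketch below collects them from a trusted sample of the XML, such as the glossary in the report. `buildAllowList` is a hypothetical helper, and its regex scan is only a rough approximation of a real parser (it skips closing tags and `<!...>` declarations, and would be confused by `>` inside attribute values). It is meant for a hand-picked sample, never for the untrusted input itself, since that would let the input decide its own allow-list. Each name is emitted both in its original spelling, for `application/xhtml+xml` where names are case-sensitive, and in lower case, for the default HTML parser, which lower-cases names.

```javascript
// Hypothetical helper: derive ADD_TAGS / ADD_ATTR from a trusted sample of markup.
// A regex scan is only a rough approximation of real parsing.
function buildAllowList(sampleMarkup) {
  const tags = new Set();
  const attrs = new Set();
  // Opening or self-closing tags: "<Name ...>" or "<Name .../>".
  // Closing tags ("</...") and declarations ("<!...") do not match.
  const tagPattern = /<([A-Za-z][\w.-]*)([^<>]*)>/g;
  // Attribute names are the identifiers immediately followed by "=".
  const attrPattern = /([A-Za-z_:][\w.:-]*)\s*=/g;

  for (const [, name, rest] of sampleMarkup.matchAll(tagPattern)) {
    tags.add(name); // original spelling, for application/xhtml+xml
    tags.add(name.toLowerCase()); // lower case, for the default text/html parser
    for (const [, attr] of rest.matchAll(attrPattern)) {
      attrs.add(attr);
      attrs.add(attr.toLowerCase());
    }
  }
  return { ADD_TAGS: [...tags], ADD_ATTR: [...attrs] };
}

// Usage with a trimmed-down sample based on the report's glossary:
const allow = buildAllowList(
  '<glossary><GlossDiv><GlossEntry ID="SGML" SortAs="SGML">' +
    '<GlossSeeAlso OtherTerm="GML"/></GlossEntry></GlossDiv></glossary>',
);
```

To keep the `iframe` setup from the original config, concatenate the arrays, e.g. `ADD_TAGS: ['iframe', ...allow.ADD_TAGS]`, rather than spreading `allow` into the config object, which would replace `ADD_TAGS` and `ADD_ATTR` wholesale.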
