Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defaults for boolean SanitizerConfig items #254

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

otherdaniel
Copy link
Collaborator

@otherdaniel otherdaniel commented Jan 22, 2025

Give explicit defaults to SanitizerConfig.comments and SanitizerConfig.dataAttributes.

This fixes one of the issues in #249.


Preview | Diff

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach has some issues:

  1. If boolean fields have a default it should be false, not true.
  2. This is strictly less safe.

I think keeping them optional and using that third state to pick a default depending on which method you chose as I suggested in the issue is still a reasonable way forward.

If everyone really wants this behavior we'd have to name them removeComments and removeDataAttributes I think, defaulting to false.

@mozfreddyb
Copy link
Collaborator

I was also under the impression that we would make comments & data attributes allowable but not enabled in the setHTML() default.

I think we only discussed a webidl-semantics question on our call on Wednesday, so me skipping this may have contributed to that confusion?

I think we wanted {} to be an "empty sanitizer" but the default to be "a securely configured sanitizer". Is that part of this change?

@otherdaniel
Copy link
Collaborator Author

otherdaniel commented Jan 23, 2025

I seem to recall a quite explicit discussion about a true default in the meeting. But anyways, the observable behaviours are: (Result in the comments is what we have currently specified, as far as i understand.)

[Edit: Using real comment syntax now. :) ]

/* 1*/ new Sanitizer().get().comments;  // false
/* 2*/ new Sanitizer({}).get().comments;  // false
/* 3*/ new Sanitizer({comments: true}).get().comments;  // true
/* 4*/ new Sanitizer({comments: false}).get().comments;  // false

/* 5*/ div.setHTMLUnsafe("<!--bla-->");  // <div></div>
/* 6*/ div.setHTMLUnsafe("<!--bla-->",{sanitizer: {}});  // <div></div>
/* 7*/ div.setHTMLUnsafe("<!--bla-->",{sanitizer:{comments:true}});  // <div><!--bla--></div>
/* 8*/ div.setHTMLUnsafe("<!--bla-->",{sanitizer:{comments:false}});  // <div></div>
 
/* 9*/ div.setHTML("<!--bla-->");  // <div></div>
/*10*/ div.setHTML("<!--bla-->",{sanitizer: {}});  // <div></div>
/*11*/ div.setHTML("<!--bla-->",{sanitizer:{comments:true}});  // <div><!--bla--></div>
/*12*/ div.setHTML("<!--bla-->",{sanitizer:{comments:false}});  // <div></div>

I take it:

  • 5 is problematic, as it contradicts existing HTML-spec'ed behaviour of setHTMLUnsafe.
  • 5 + 6 are expected to be consistent, because setHTMLUnsafe defaults to {}.
  • 9 can be different from 10, due to the new "default" mechanism, but I take it 1 + 2 + 5 + 6 + 10 are meant to be consistent.
  • I take it safe/unsafe, i.e. 6-8 and 10-12, are expected to be consistent except for "baseline" behaviour (non-allowable + javascript:-URLs). Comments aren't a "baseline" thing, so I'd expect them to be fully consistent regarding comments.

I'm trying to reconcile these.

I'm guessing the explicit ones (3+4, 7+8, 11+12) are all uncontroversial. The rub is that setHTMLUnsafe already has behaviour. I think that behaviour allows comments. And then we have various requirements of methods to be consistent. In particular, I thought we wanted setHTML and setHTMLUnsafe to be consistent except for clearly articulated differences.

Maybe I'm overlooking something. I think there's other ways to fix this, too, e.g. not defaulting to {} for setHTMLUnsafe.

@annevk
Copy link
Collaborator

annevk commented Jan 23, 2025

Nit: you want <!-- bla --> instead of /* bla */.

I think your analysis is correct. I tend to think it's okay for Sanitizer, setHTML(), setHTMLUnsafe() to handle the dictionary members comments and dataAttributes differently, but only when they are not explicitly set. I do see that creates some inelegance in the overall model, but overall that seems preferable over not removing them in setHTML().

As such I think Sanitizer and setHTMLUnsafe() should default them to false, when they are not set. And setHTML() should default them to true.

This gives these results:

  1. false
  2. false
  3. true
  4. false
  5. false
  6. false
  7. true
  8. false
  9. true
  10. true
  11. true
  12. false

(This can only happen when you pass a SanitizerConfig. Once you have a Sanitizer they're always set of course (and they're no longer dictionary members).)

@otherdaniel
Copy link
Collaborator Author

Okay, that makes sense.

I do see that creates some inelegance in the overall model, but overall that seems preferable over not removing them in setHTML(). [...] And setHTML() should default them to true.

This seems contradictory, but I think it's just about notation. I take comments: true as allowing comments, so I read the first bit as default-removing them in setHTML (aka, comments: false) and second as default-allowing them in setHTML (aka, comments: true).

Nit: you want <!-- bla --> instead of /* bla */.

Oh, yes. I'll fix that... :)

@annevk
Copy link
Collaborator

annevk commented Jan 23, 2025

Oh yes. I keep inverting them. Hopefully that's just a me problem.

@mozfreddyb
Copy link
Collaborator

Ha, good catch. I was also thinking "comments" meaning "allow comments". Interesting that even ourselves can get confused 😬

@otherdaniel otherdaniel changed the title Explicit defaults for boolean SanitizerConfig items Defaults for boolean SanitizerConfig items Jan 30, 2025
@otherdaniel
Copy link
Collaborator Author

otherdaniel commented Jan 30, 2025

I change the PR to follows the discussion here. The default now depends on safe/unsafe usage.

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

(This PR made me think that maybe we should improve Infra so it's easier to retrieve a map member value and default to some other value if the member doesn't exist.)

index.bs Outdated Show resolved Hide resolved
Co-authored-by: Anne van Kesteren <[email protected]>
@mozfreddyb
Copy link
Collaborator

✅ 😉

@otherdaniel
Copy link
Collaborator Author

otherdaniel commented Jan 31, 2025

What does this do:

div.setHTMLUnsafe("<div data-foo='bar'>", {sanitizer:{dataAttributes:false}});

I had expected <div></div>, but when testing it against our in-progress implementation it returned <div data-foor="bar"></div>. The reason is that an empty attributes (and empty removeAttributes) list allows any attribute, and so the dataAttributes: key essentially becomes a no-op since any data attributes are already allowed. I think this is indeed what we have specced. Is this what we want? It feels odd to explicitly dis-allow data attributes, but then have them survive anyhow. :)

@annevk
Copy link
Collaborator

annevk commented Jan 31, 2025

I agree that's problematic. In general I wonder if this logic is correct. I would have expected us to first check the safelists if those are given. So when you iterate through the attributes:

  1. If there's an element safelist and it's not in there or the global attribute safelist: remove.
  2. If there's only a global attribute safelist and it's not in there: remove.
  3. If it starts with data- and dataAttributes is false: remove.
  4. Handle remove lists.

configuration["elements"]["removeAttributes"] also seems wrong. Shouldn't this lookup the actual element somewhere?

aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 4, 2025
This tracks development of the spec:
WICG/sanitizer-api#254

The PR makes the default for "comments:" and "dataAttributes:" keys in
the configuration depend on whether this is for safe or unsafe use. That
requires a bit of plumbing, since now the logic to interpret a config
depends on a new flag. Also adds test cases.

Bug: 356601280
Change-Id: I076c5418006b0dc35babbffd7d991e04c0f1d522
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6189121
Commit-Queue: Daniel Vogelheim <[email protected]>
Reviewed-by: Yifan Luo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415510}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Feb 4, 2025
This tracks development of the spec:
WICG/sanitizer-api#254

The PR makes the default for "comments:" and "dataAttributes:" keys in
the configuration depend on whether this is for safe or unsafe use. That
requires a bit of plumbing, since now the logic to interpret a config
depends on a new flag. Also adds test cases.

Bug: 356601280
Change-Id: I076c5418006b0dc35babbffd7d991e04c0f1d522
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6189121
Commit-Queue: Daniel Vogelheim <[email protected]>
Reviewed-by: Yifan Luo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415510}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Feb 4, 2025
This tracks development of the spec:
WICG/sanitizer-api#254

The PR makes the default for "comments:" and "dataAttributes:" keys in
the configuration depend on whether this is for safe or unsafe use. That
requires a bit of plumbing, since now the logic to interpret a config
depends on a new flag. Also adds test cases.

Bug: 356601280
Change-Id: I076c5418006b0dc35babbffd7d991e04c0f1d522
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6189121
Commit-Queue: Daniel Vogelheim <[email protected]>
Reviewed-by: Yifan Luo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415510}
@otherdaniel
Copy link
Collaborator Author

Here's a WPT test that mirrors the proposals here, with allowance for inverting the booleans :). It also assumes that {sanitizer:{dataAttributes:false}} would dis-allow data-* attributes. I'd like to double-check this matches everyone's understanding.

https://github.com/web-platform-tests/wpt/blob/master/sanitizer-api/sanitizer-boolean-defaults.tentative.html

@evilpie
Copy link

evilpie commented Feb 4, 2025

Okay, maybe this doesn't make sense, but if we made comments a nullable boolean?, wouldn't that more accurately represent the three states of explicit true or false and "implicit based on safe"? If this doesn't make sense from a WebIDL perspective, just ignore me.

@bkardell
Copy link

bkardell commented Feb 4, 2025

I was also thinking "comments" meaning "allow comments". Interesting that even ourselves can get confused

Fwiw, I also thought that. Should the name be clearer? I like short names, but I slightly wonder if we are talking about something critical like - maybe its worth some extra characters? stripComments or whatever?

@otherdaniel
Copy link
Collaborator Author

Okay, maybe this doesn't make sense, but if we made comments a nullable boolean?, wouldn't that more accurately represent the three states of explicit true or false and "implicit based on safe"? If this doesn't make sense from a WebIDL perspective, just ignore me.

I think the dictionary already provides this explicit nullability, by not having the key present at all. I.e., {comments: true}, {comments: false}, or {}.

An observable difference would be something like, {sanitizer: {comments: undefined}}. I think WebIDL would convert that to {comments: false} (i.e. a fixed result), while a nullable boolean would preserve the null-ness (i.e., result depending on safe). That said, I find WebIDL difficult to grok, so maybe I'm misreading that.

@otherdaniel
Copy link
Collaborator Author

I was also thinking "comments" meaning "allow comments". Interesting that even ourselves can get confused

Fwiw, I also thought that. Should the name be clearer?

Per the current proposal(s), comments continues to mean "allow comments", similar to how elements and attributes are allow-lists, too. I hope the naming is then clear enough (and consistent within the API, too).

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Feb 7, 2025
…for comments and data-*., a=testonly

Automatic update from web-platform-tests
[Sanitizer API] Update default handling for comments and data-*.

This tracks development of the spec:
WICG/sanitizer-api#254

The PR makes the default for "comments:" and "dataAttributes:" keys in
the configuration depend on whether this is for safe or unsafe use. That
requires a bit of plumbing, since now the logic to interpret a config
depends on a new flag. Also adds test cases.

Bug: 356601280
Change-Id: I076c5418006b0dc35babbffd7d991e04c0f1d522
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6189121
Commit-Queue: Daniel Vogelheim <[email protected]>
Reviewed-by: Yifan Luo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1415510}

--

wpt-commits: 07920967d79b3c88d440ddede3f7f5dc3b81c573
wpt-pr: 50486
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants