Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatibilities KDL 2 <-> Query #489

Open
bgotink opened this issue Jan 19, 2025 · 2 comments
Open

Incompatibilities KDL 2 <-> Query #489

bgotink opened this issue Jan 19, 2025 · 2 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@bgotink
Copy link
Member

bgotink commented Jan 19, 2025

I'm trying to write a parser for KQL so I'm taking a look at the query spec, and I've noticed two issues:

lorem+ipsum is a valid identifier but + is an operator and the query spec doesn't require space around operators. Should this be interpreted as "look for a node named lorem+ipsum"? Or does this instead mean "look for a node named ipsum immediately following a node named lorem"?
Idem for (lorem)+[ipsum], does that look for a node named + with type (lorem) and a property ipsum? Or does this look for nodes with ipsum property directly following nodes with type (lorem)?

The grammar lists types of accessors that are not described anywhere else in the document: "values(" q-ws* ")" | "props(" q-ws* ")". How are these supposed to behave?

There's one other thing that purely comes down to confusion on my end, but with >, <, and || being valid identifiers, the following queries are all valid:

query description
>> >> >> node named >> that's a descendant of a node named >>
> > > node named > whose parent is also called >
|| || || A node named || or a node named ||
[val() <<] A node whose first value is a string with a value that comes before < when sorting alphabetically

In my opinion it would make sense to disallow identifiers that are also operators in the query spec, just like the KDL spec itself doesn't allow identifiers that could cause confusion with any of the keywords or with numbers.

@zkat
Copy link
Member

zkat commented Jan 19, 2025

uggghhh yes I thought I'd made them q-ws+ already. It's very much written with the intention that you can use whitespace to disambiguate.

And yes, as you say, >> >> >> is perfectly valid and I'm ok with that. Ditto > > > and || || ||. Corner cases, but unambiguous ones (this is why I made >> the descendant operator, instead of whitespace as in CSS).

Probably all the current q-ws* can be replaced with q-ws+ except for the ones in the following rules:

filter := "top(" q-ws* ")" | matchers
type-matcher := "(" q-ws* ")" | $type
accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"

This one is particularly of note:

comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)

You mentioned the [val()>>] case being weird, but this one's even worse: [foo>1], which is intended to mean [prop(foo) > 1], but is actually [prop("foo>1")] ("select any element with a prop called foo>1").

So while that means [val()=1] is no longer legal, it gets rid of that weird ambiguity. That said, it's kind of a footgun: I think for stuff like comparisons, a LOT of people will definitely assume they can omit spaces.

We're gonna have to seriously revisit this. I think it's safe to say that reserving those comparison characters isn't part of the possible design space at this point. But that means we can't do the equivalent of the css div[foo^=bar] and such. At least all the foo() accessors are "safe" on the left hand side. The right hand side continues to be a concern, though, for strings.

@zkat zkat added bug Something isn't working help wanted Extra attention is needed labels Jan 19, 2025
@bgotink
Copy link
Member Author

bgotink commented Jan 22, 2025

Based on my (very limited) experience writing & testing a KPath parser, I'd like to suggest a few changes:

  • As you already mentioned, make whitespace required around all operators. That solves all of the issues until we can find a better solution (if we can find a better solution), one that could allow [foo^=bar] to exist.
  • val() is mentioned a few times in the document but it isn't allowed per the grammar. Either the text or the grammar have to be updated.
  • There's a difference in terminology between KDL and KPath: what KDL calls "arguments" are called "values" in the KPath spec. As KDL is more stable, I suggest modifying KPath to use argument and arg(0)

Then there are a few gaps:

  • Most notably it's impossible to query a negative right now: [prop != lorem] checks whether property prop is present and not equal to "lorem". That's not the inverse of [prop = lorem] which checks whether the property prop is present and equal to "lorem".
    • In theory we could implement this inverse by changing the definition of != to also match properties/values that are not present, but that only solves the lack of a negative for the = operator. What about the opposite to [prop ^= prefix]? We could introduce !^= as operator but I personally think that's a bit much
    • My first thought is to borrow from CSS's syntax for case insensitive attribute selectors and go for [! prop = lorem]. The inverse of (type)node would then be [! type() = type] || [name() != node] (since a node name is always present, [! name() = node] is equivalent to [name() != node]).
  • CSS has the useful :is()/:where() pseudoselectors which allow grouping alternatives together. That would be useful to have, e.g. a >> {b || c} >> {d || e}
    • We could add a negative variant, e.g. a >> !{b || c} > d as well, like CSS's :not()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants