replace "IRI" in spec language? #75

Open
dhh1128 opened this issue Jun 24, 2020 · 9 comments

Comments

@dhh1128

dhh1128 commented Jun 24, 2020

Per a suggestion in here, I wanted to suggest an update to the RDF spec language.

The heavy use of "IRI" as terminology in the RDF spec, referencing RFC 3987, raises a number of thorny issues and actually makes RDF out-of-sync with the latest developments at W3C. See https://www.w3.org/International/wiki/IRIStatus. (Also, the status of RFC 3987 has never moved past PROPOSED.)

If the spec continues to use the term, then there should probably be a section added to the spec to explain how RDF proposes to solve the IRI problems such as inconsistent use of punycode and percent encoding. I suspect it would be simpler to use "URL" everywhere, per W3C recommendations, with a simple (foot)note explaining that the intent of "URL" is to encompass internationalization as originally envisioned by the IRI effort, but will track the work of the new URL working group for the particulars.
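
A minimal sketch of the two encodings mentioned above, assuming Python's standard library and a made-up host and path (`bücher.example`, `/café`). It only illustrates how the same internationalized identifier can end up in different ASCII forms; it is not a statement of what RDF or any URL spec requires.

```python
from urllib.parse import quote

# Hypothetical internationalized host and path, chosen only for illustration.
host = "bücher.example"
path = "/café"

# Punycode (IDNA) form of the host, as used for DNS lookups.
punycode_host = host.encode("idna").decode("ascii")

# Percent-encoded UTF-8 form of the path, as sent in an HTTP request.
percent_path = quote(path)

# The same host percent-encoded instead of punycoded: a different ASCII
# string for the same name, which is where inconsistency can creep in.
percent_host = quote(host)

print(punycode_host)  # xn--bcher-kva.example
print(percent_path)   # /caf%C3%A9
print(percent_host)   # b%C3%BCcher.example
```

Two tools that pick different forms for the host produce strings that do not compare equal character by character, even though both were derived from the same identifier.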

@afs
Contributor

afs commented Jul 2, 2020

What data is there from the field about problems that actually occur?

Like any standard in use at scale, some things are less than perfect. It is the practical impact that matters.

https://www.w3.org/International/wiki/IRIStatus raises issues about encoding of Internationalized Domain Names and presentation of Bidirectional Language.

The grammar for IRIs in RFC 3987, defined over Unicode code points, is solid.

RDF 1.1 uses IRIs, but it does not define, create, encode, or decode them. To some extent, it's garbage-in-garbage-out, like any other data. The Unicode string must conform to the RFC 3987 grammar, and what it refers to must be consistent between the creator and any app receiving the data.

RDF 1.0, which uses the problematic-but-necessary-at-the-time "RDF URI References", is fortunately in the past.

The %XX issues are not specific to IRIs; they apply to URIs as well. Take %7E, for example. RFC 3986 section 2.3 says "don't do that" and adds that normalization should put the real character in. There is discussion in RDF spec section 3.2.
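
The normalization described above can be sketched as follows. This is a minimal illustration, assuming the unreserved set from RFC 3986 section 2.3 (letters, digits, `-`, `.`, `_`, `~`); the function name `normalize_unreserved` is hypothetical and is not taken from any RDF library.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Decode %XX triplets that stand for unreserved characters.

    Other triplets are kept, with their hex digits upper-cased.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT_TRIPLET.sub(replace, uri)


print(normalize_unreserved("http://example.org/%7Euser/a%2fb"))
# http://example.org/~user/a%2Fb
```

Here `%7E` becomes `~` because the tilde is unreserved, while `%2f` stays encoded because `/` is a reserved delimiter whose decoding would change the path structure.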

Adding text that explains IRIs to section 3.2, guided by field experience so that it focuses on what actually comes up, seems to me the way to go.

@jimkont
Member

jimkont commented Jul 3, 2020

Agree with @afs. Another thing that might be worth mentioning in section 3.2 is that IRIs might not work well with dereferencing; see section 6.1 in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3198968 for details.

@asbjornu

asbjornu commented Sep 26, 2020

What data is there from the field about problems that actually occur?

I have a problem with the actual word IRI, which is understood by exactly none of the developers I've spoken with about the term over my career. Just an anecdote, but I'm pretty confident I'm not alone in that experience. The term IRI itself is not a huge barrier to entry, but given how impenetrable RDF already is, every micron we can chip off that barrier is going to help.

It's also worth noting the repercussions that the use of "IRI" in RDF has on the RDF ecosystem: JSON-LD is unable to remove the term (w3c/json-ld-syntax#355), since it is just a serialization of RDF and removing the term is therefore not within its scope or power.

As an answer for why "IRI" should be replaced with "URL", I think the answer can be found in this very repository's README:

3. Backward compatibility is highly desirable, but less important than ease of use.

@aucampia

aucampia commented May 22, 2022

One concern with using "URL" over "IRI" is that in most cases identifiers should not be URLs at all, but rather other types of URIs, such as URNs, tag URIs, example URIs, or sometimes even file URIs. The overuse of URLs when URNs would work well enough is not very helpful.

Also, I'm unsure what process resulted in https://www.w3.org/International/wiki/IRIStatus; it seems to have been written mostly by one person.

@chiarcos

chiarcos commented May 23, 2022 via email

@pchampin

@aucampia

Some concerns with using "URL" over "IRI" is that in most cases identifiers should not be URLs at all, but rather other types of URIs, such as URNs, or tag URIs (...)

I believe that the advocates of replacing IRI with URL are considering https://url.spec.whatwg.org/ as the new reference for URL. And in that spec, the notion of URL encompasses all IANA URI schemes:

A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.). Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]

quoted from https://url.spec.whatwg.org/#url-writing
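
The quoted rule can be sketched as a small check. This is only an illustration of the scheme syntax quoted above, not of the full WHATWG parsing algorithm; the helper name `is_url_scheme_string` is hypothetical.

```python
import re

# One ASCII alpha, then zero or more ASCII alphanumerics, '+', '-', or '.'.
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_url_scheme_string(candidate: str) -> bool:
    """Return True if the whole string matches the quoted scheme syntax."""
    return _SCHEME.fullmatch(candidate) is not None


# Schemes mentioned in this thread all pass the syntactic check.
for scheme in ("https", "urn", "tag", "file"):
    print(scheme, is_url_scheme_string(scheme))
```

The check is purely syntactic: it says nothing about whether a scheme is registered with IANA, which the quoted text treats as a separate "should".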

@aucampia

I believe that the advocates of replacing IRI with URL are considering https://url.spec.whatwg.org/ as the new reference for URL. And in that spec, the notion of URL encompasses all IANA URI schemes

You are indeed correct. I'm quite ambivalent about the effort there: re-purposing "URL" to mean something different from what it meant in a previously ratified internet standard seems unlikely to alleviate confusion and will just add to it. That spec also has yet to publish a grammar, and it is very heavily geared towards browsers. I think whatever problems existed before are being somewhat compounded.

I'm also really unsure to what extent the complexity introduced by the concept of a URI, or even an IRI, is actually making things difficult.

@aucampia

aucampia commented May 23, 2022

I would say that, at the very least, for https://url.spec.whatwg.org/ to be considered a candidate, it should have a grammar and not just a parsing algorithm written in English.

@afs
Contributor

afs commented May 23, 2022

URIs and original URLs are ASCII so that they can go in the request-URI of an HTTP request. (UTF-8 often works anyway nowadays, and browsers hide the encoding as well.) This does lead to confusion as to whether, e.g., %7E is really some character (~) or is really the three characters %, 7, E (encoding != escaping).
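
The ambiguity can be made concrete with Python's `urllib.parse`. This is a minimal sketch; the helper `escape_literal` is hypothetical and simply escapes every character that is not unreserved, including `%` itself.

```python
from urllib.parse import quote, unquote


def escape_literal(text: str) -> str:
    """Percent-encode text so that a literal '%' survives decoding."""
    return quote(text, safe="")


# Read as an encoding, "%7E" stands for the single character '~'.
decoded = unquote("%7E")

# Read as three literal characters, "%7E" has to be escaped first.
escaped = escape_literal("%7E")
round_trip = unquote(escaped)

print(decoded)     # ~
print(escaped)     # %257E
print(round_trip)  # %7E
```

A receiver that sees only `%7E` cannot tell which of the two readings the sender intended unless both sides agree on when decoding happens.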

RFC 3986 refers to ALPHA, and RFC 2234 (ABNF) defines ALPHA as A-Z / a-z.

RDF 1.0 has "RDF URI References", which anticipated IRIs, except that IRIs ended up slightly different: RDF URI references allow spaces.

Adding a new term for clarity in any revised RDF makes a lot of sense to me.
