Simplify abstract data model to be more concrete #855

Open
msporny opened this issue Jul 1, 2024 · 8 comments
Labels
class 2 Changes that do not functionally affect interpretation of the document discuss Needs further discussion before a pull request can be created

Comments

@msporny
Member

msporny commented Jul 1, 2024

It has been suggested that the abstract data model in DID Core creates unnecessary complexity and that a more concrete data model should be selected, based on implementation experience over the past two years. This issue is to track the discussion of how that simplification might occur.

@msporny msporny added the class 2 Changes that do not functionally affect interpretation of the document label Jul 1, 2024
@decentralgabe
Contributor

chair hat off
I am in favor of a concrete data model but I am interested in maintaining compatibility among DID methods that do not currently make use of JSON-LD for extensibility.

DID DHT does not use JSON-LD for extensibility for a few reasons:

  • To save space, since we must stay within 1000 bytes.
  • To make sure all terms are defined, and all terms have DNS-record mappings ahead of time

I believe DID DHT could potentially be adjusted to add processing rules to transform the document to one with a context, and register LD term definitions alongside registered properties. That said, it would be a breaking change.

I am curious how other DID Methods leverage the abstract data model, and it would be good to get a sense of the variety of implementations out there before seeing if it's feasible to define a concrete representation.

Separately, I am not sure this type of change is permitted, as it might fall under the Class 4 definition:

Changes that add new functionality, such as new elements, new APIs, new rules, etc.

I believe this could be considered a "new feature," since it would introduce new rules for representing DID Documents.

@peacekeeper
Contributor

I generally agree with the direction of simplifying the specification by removing the abstract data model and replacing it with a concrete one (which can then be converted to different representations like YAML, CBOR, etc.).

@msporny
Member Author

msporny commented Aug 1, 2024

I also agree that it is possible to remove the abstract data model in a way that does not affect existing implementation conformance and that we should make an attempt at doing this. To provide a concrete proposal, this would entail:

  1. Noting that the core data model and serialization for DID Core is JSON-LD.
  2. Allowing a document to NOT specify a context, by noting that DID Method specifications MAY provide rules on the proper way to inject a context for their DID Method if one is missing and JSON-LD processing is desired (this enables did:dht to keep doing its thing; the only modification would be on "how to inject a context if one is desired").
  3. Noting that any other serialization is allowed as long as it can be losslessly converted to and from the base data model.
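
A minimal sketch of what step 2 could look like in a resolver, assuming each DID Method registers the contexts to inject when a document has none. The names `inject_context` and `DEFAULT_CONTEXTS` and the example.org URL are hypothetical; only the DID Core context URL is real.

```python
import copy

DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"

# Hypothetical per-method rules: which contexts to inject when none is present.
# The example.org URL is a placeholder, not a registered context.
DEFAULT_CONTEXTS = {
    "dht": [DID_CORE_CONTEXT, "https://example.org/ns/did-dht/v1"],
}


def did_method(did: str) -> str:
    """Return the method name from a DID such as 'did:dht:abc'."""
    scheme, method, _ = did.split(":", 2)
    if scheme != "did":
        raise ValueError(f"not a DID: {did!r}")
    return method


def inject_context(doc: dict) -> dict:
    """Return a copy of doc with an @context added if it lacks one."""
    if "@context" in doc:
        return copy.deepcopy(doc)
    contexts = DEFAULT_CONTEXTS.get(did_method(doc["id"]), [DID_CORE_CONTEXT])
    # Put @context first, as JSON-LD documents conventionally do.
    return {"@context": list(contexts), **copy.deepcopy(doc)}


stored = {"id": "did:dht:example123", "verificationMethod": []}
resolved = inject_context(stored)
```

The stored document is left unchanged, and a document that already carries a context passes through as-is, so the rule only affects methods that omit the context today.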

To be clear, if any of the steps above would result in a conforming DID Method becoming non-conformant, we'd clearly have to figure out how to fix the spec text so that doesn't happen. The goal here is to simplify the specification while not invalidating any currently conforming DID Methods.

@msporny
Member Author

msporny commented Aug 1, 2024

@decentralgabe wrote:

To save space, since we must stay within 1000 bytes.

Hmm, the DID Core context URL is 28 characters; a did:dht one would be maybe two to three times that? Trading 75 characters for no deterministic way to do extensibility doesn't seem like a good trade-off to me.
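
The size argument can be checked with quick arithmetic. The sketch below measures how many bytes an `@context` entry adds to a compact JSON serialization. The did:dht context URL is a hypothetical placeholder, the 1000-byte figure is the limit quoted above, and plain JSON is used only as a stand-in for whatever encoding is actually stored.

```python
import json

DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"
# Hypothetical method-specific context URL, used only for the size estimate.
DHT_CONTEXT = "https://example.org/ns/did-dht/v1"
BUDGET_BYTES = 1000  # limit cited for did:dht in this thread


def compact_size(doc: dict) -> int:
    """Size in bytes of the document as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


bare = {"id": "did:dht:example123"}
with_context = {"@context": [DID_CORE_CONTEXT, DHT_CONTEXT], **bare}

overhead = compact_size(with_context) - compact_size(bare)
print(len(DID_CORE_CONTEXT))              # length of the DID Core context URL
print(overhead, overhead / BUDGET_BYTES)  # added bytes and share of the budget
```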

To make sure all terms are defined, and all terms have DNS-record mappings ahead of time

I don't understand these statements?

IOW, the approach ensures that NO terms are defined (except for maybe in did:dht, and who knows if those definitions are going to conflict with definitions in other DID Methods). It feels like a recipe for guaranteed term conflicts in the future. I also don't understand "all terms have DNS-record mappings ahead of time" -- what does that mean?

To be clear, I think did:dht could continue to do what it's doing after a change that removes the abstract data model, but I am interested in learning more about the above.

@OR13
Contributor

OR13 commented Aug 1, 2024

I fully support a normative requirement for a JSON-LD-only core data model, and eliminating the JSON and abstract data models from the next version of the technical recommendation.

We've seen substantial confusion caused by this, and needless complexity and interoperability problems are created by having an abstract data model that is, for the most part, just RDF... sometimes broken RDF.

I think the W3C VCWG did the right thing by clarifying that W3C VCs are always JSON-LD, and by allowing alternative serializations of digital credentials, such as ISO mDoc, OAuth SD-JWTs, attribute certs, and other formats, to be developed elsewhere.

I would recommend that the DID WG take a similar approach.

Do JSON-LD-based DIDs as well as they can be done at W3C.

Do not attempt to define multiple serializations of the data model.

Provide concrete resolution guidance based on the JSON-LD ecosystem, such as document loaders, which can handle either URNs or URLs, and which are already supported well in JSON-LD tooling.

Address the @vocab issue up front, decide if the core data model is really RDF, and make sure that JSON-LD productions can always be converted to RDF with normative text, if that is a desired property of the core data model.

If people want to do "DID-like things" in CBOR or YAML, let them do that... but make it clear that DIDs are JSON-LD, just like it's now clear that W3C VCs are JSON-LD.
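
As a rough illustration of the document-loader idea mentioned above, here is a sketch of a static loader that serves bundled contexts by URL or URN without network access. The context bodies are empty placeholders, the URN is invented, and the returned dictionary is only loosely modeled on the remote-document shape that JSON-LD libraries use; treat those details as assumptions.

```python
DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"

# Contexts bundled with the application, keyed by URL or URN.
# The bodies here are empty placeholders, not the real context documents.
STATIC_CONTEXTS = {
    DID_CORE_CONTEXT: {"@context": {}},
    "urn:example:my-method-context": {"@context": {}},
}


def static_document_loader(iri: str) -> dict:
    """Resolve a context IRI from the bundled table without network access."""
    try:
        document = STATIC_CONTEXTS[iri]
    except KeyError:
        raise LookupError(f"context not bundled: {iri}") from None
    return {"contextUrl": None, "documentUrl": iri, "document": document}


loaded = static_document_loader("urn:example:my-method-context")
```

Refusing unknown IRIs, rather than fetching them, keeps processing deterministic for the set of contexts an implementation has chosen to support.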

@decentralgabe
Contributor

@msporny it gets into the specifics of how did:dht works and there is more detail here but the short version is as a size saving mechanism the spec leverages a DID Document -> DNS Packet mapping, and then using DNS packet compression the result is saved on the DHT. We did an analysis of a number of compression formats (plain bytes, json, cbor, a custom binary serialization, and DNS) and found that DNS balanced an efficiency/already existing software tradeoff.

Without a known mapping (or reverse mapping) between a property in the DID Doc and packet representation we cannot effectively store the record on the DHT, so these must be registered in the spec or a well known registry to reduce inconsistencies across implementations. The spec itself has a registry for this purpose.
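
To illustrate why a registered mapping matters, here is a rough sketch of a two-way property registry. The short keys in `REGISTRY` and the flat key/value record are invented for illustration and are not the actual did:dht DNS record layout.

```python
# Hypothetical registry: DID document property -> short record key.
REGISTRY = {
    "verificationMethod": "vm",
    "authentication": "auth",
    "assertionMethod": "asm",
    "service": "svc",
}
# Reverse mapping used when decoding a stored record.
REVERSE = {short: prop for prop, short in REGISTRY.items()}


def encode(doc: dict) -> dict:
    """Map registered properties to short keys; reject anything unregistered."""
    record = {}
    for prop, value in doc.items():
        if prop == "id":
            continue  # assumed to be derivable from the record's address
        if prop not in REGISTRY:
            raise KeyError(f"no registered mapping for property {prop!r}")
        record[REGISTRY[prop]] = value
    return record


def decode(did: str, record: dict) -> dict:
    """Rebuild the document from a record using the reverse mapping."""
    doc = {"id": did}
    for short, value in record.items():
        doc[REVERSE[short]] = value
    return doc


doc = {"id": "did:dht:example123", "authentication": ["#0"], "service": []}
assert decode(doc["id"], encode(doc)) == doc  # round trip is lossless
```

An unregistered property fails at encode time, which is the inconsistency the registry is meant to prevent: two implementations can never disagree about how to store a term neither has a mapping for.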

Leveraging the existing DID registry is likely the best process--noting properties supported by did:dht linked to their DID registry reference.

This is the approach we've taken so far, but are open to other alternatives while maintaining the goal of saving as many bytes as possible.

@msporny
Member Author

msporny commented Aug 3, 2024

@decentralgabe wrote:

@msporny it gets into the specifics of how did:dht works and there is more detail here but the short version is as a size saving mechanism the spec leverages a DID Document -> DNS Packet mapping, and then using DNS packet compression the result is saved on the DHT. We did an analysis of a number of compression formats (plain bytes, json, cbor, a custom binary serialization, and DNS) and found that DNS balanced an efficiency/already existing software tradeoff.

Ah, I see. I skimmed those sections and haven't tried to put the whole problem in my head to think about it more deeply. My gut reaction is that the "custom Domain-Specific Language for DNS encoding of DID Documents" thing feels a bit fraught, but that's a completely orthogonal issue.

Based on what I saw in the spec, however, it feels like it would be fairly trivial for the DID Resolution process for did:dht to have a step in there where you inject a context value to continue to be conformant with whatever change we make to remove the abstract data model while not requiring those details to be encoded in the DNS Packet mapping. For example, you could key off of the v=M value to figure out what context to inject.

This is the approach we've taken so far, but are open to other alternatives while maintaining the goal of saving as many bytes as possible.

I would imagine that CBOR-LD applied to a did:dht document would result in considerably better compression (but I understand that the community developing did:dht probably has no desire to go that route). It also seems like you're double-base encoding values if you're using JWK? I don't remember seeing this "DNS" section when I last read about did:dht... I had thought did:dht was a pure mainline DHT implementation w/ no requirement for DNS records. The section about the DNS records doesn't explain why it's used and/or necessary (but I did skim the document, so I probably missed the justification for the DSL).

In any case, with respect to changes to the abstract data model, I would expect that there wouldn't be an issue for did:dht as it exists today. All you would need to do is add some text to the spec to inject a context when resolving and that will cost you zero bytes to do in the storage format in DNS.
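
A sketch of the version-keyed approach, assuming the stored record carries a version field the resolver can read. The `CONTEXTS_BY_VERSION` table, the `v` key as modeled here, and the example.org URLs are hypothetical, and plain JSON stands in for the real storage format.

```python
import json

DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"

# Hypothetical table: record version -> contexts to inject at resolution time.
CONTEXTS_BY_VERSION = {
    "0": [DID_CORE_CONTEXT, "https://example.org/ns/did-dht/v0"],
    "1": [DID_CORE_CONTEXT, "https://example.org/ns/did-dht/v1"],
}


def resolve(stored: dict) -> dict:
    """Build the resolved document, choosing the context from the version."""
    version = str(stored["v"])
    if version not in CONTEXTS_BY_VERSION:
        raise ValueError(f"unknown record version {version!r}")
    body = {k: v for k, v in stored.items() if k != "v"}
    return {"@context": CONTEXTS_BY_VERSION[version], **body}


stored = {"v": 0, "id": "did:dht:example123", "authentication": ["#0"]}
stored_bytes = len(json.dumps(stored, separators=(",", ":")))
resolved = resolve(stored)

# The stored form is untouched, so the context costs no storage bytes.
assert len(json.dumps(stored, separators=(",", ":"))) == stored_bytes
```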

@iherman
Member

iherman commented Aug 4, 2024

Just to make it clear: this comment is with my W3C staff member's hat put down.

TL;DR: my preference is to keep the abstract data model (ADM) as is.

I have several reasons:

  • The problems that led us to introduce the ADM are still valid: communities need to create DID documents that are not in JSON-LD. The creation of the ADM was the result of long, and sometimes acrimonious, discussions. If we roll it back, we may reopen wounds, and we may open the door to formal objections both within and outside the group. This WG has already had more than its share of formal objections 😒, I do not think we need new ones…

  • The proposed alternative is, essentially, to do what was done in the VC WG. That was not a walk in the park either; it alienated a lot of people in that WG who decided to walk away. I would prefer not to see the same happening in this WG.

  • I remember that changing the DID Core specification to introduce the ADM was a major editorial effort. Rolling that back would probably require major document surgery again. This WG has major specification and editorial work ahead for DID Resolution, as well as rethinking the Registry. Also, let us not forget that we have a major personnel overlap with the VC WG (including in terms of document editors), and that WG still has a busy few months ahead. All this makes the manpower issue for this Working Group challenging. Let us spend our resources wisely; I do not think this work is the best use of our time and energy.

  • It is debatable whether we are allowed to make such a change in the first place. At the moment, the issue is labeled as a class 2 change, but I am not sure that I agree. First of all, as @decentralgabe put it in Simplify abstract data model to be more concrete #855 (comment), we probably do not know how methods out there have made use of the ADM; any change to it might lead to a breaking change for them (which means class 3). However, while I realize that the definition of class 4 changes (that we are not allowed to make) in the W3C Process Document is fairly terse, and that we may get by claiming that we are making only an editorial change, I think that such a change might be o.k. by the letter of the “law”, but not by its spirit. It is a major conceptual change to the specification.

    Bottom line: If we make such a change, we may face a series of disagreeable questions from W3C Management as well as the AC members.

  • My last comment/question may be the most important one: Why? What do we want to achieve? Is it really worth getting into possible arguments with the community, the AC, W3C Management, etc.? The only argument I saw was “simplification”. First of all, it may not be all that bad if we have 100+ methods registered which all claim, I presume, to conform to the model. Also, the discussions in the VC group have shown that putting JSON-LD at the center does not make it simple for those who are not experts in JSON-LD, so we may end up trading one type of complication for another. I.e., I cannot really buy that “simplification” argument. Not to mention that if we agree to reference the VC Controller Document (see Normatively reference Controller Document #854), many things will be taken out of the DID Core specification, which, by itself, will make the specification much simpler without any controversy…

@decentralgabe decentralgabe added the discuss Needs further discussion before a pull request can be created label Aug 5, 2024
5 participants