Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Optional for list items. #286

Open
1 task done
yarikoptic opened this issue Feb 7, 2025 · 6 comments
Open
1 task done

Remove Optional for list items. #286

yarikoptic opened this issue Feb 7, 2025 · 6 comments
Labels
blocked enhancement New feature or request

Comments

@yarikoptic
Copy link
Member

yarikoptic commented Feb 7, 2025

Stems from the UI observed issue/discussion between @candleindark @waxlamp and me

where we observe an ambiguity that schema allows for two semantically identical values of null (nothing) and [] (empty list) to be used.

We have about 30 unique attributes of such kind used in 34 definitions (overloads or whatever)
❯ grep 'Optional\[List' dandischema/models.py | sort | uniq -c | nl
     1	      1     affiliation: Optional[List[Affiliation]] = Field(
     2	      1     altName: Optional[List[Identifier]] = Field(
     3	      1     anatomy: Optional[List[Anatomy]] = Field(
     4	      2     approach: Optional[List[ApproachType]] = Field(
     5	      1     assayType: Optional[List[AssayType]] = Field(
     6	      1     contactPoint: Optional[List[ContactPoint]] = Field(
     7	      1     dataStandard: Optional[List[StandardsType]] = Field(
     8	      1     disorder: Optional[List[Disorder]] = Field(
     9	      1     dxdate: Optional[List[Union[date, datetime]]] = Field(
    10	      1     ethicsApproval: Optional[List[EthicsApproval]] = Field(
    11	      1     hasMember: Optional[List[Identifier]] = Field(
    12	      1     keywords: Optional[List[str]] = Field(
    13	      1     license: Optional[List[LicenseType]] = Field(
    14	      2     measurementTechnique: Optional[List[MeasurementTechniqueType]] = Field(
    15	      1     protocol: Optional[List[AnyHttpUrl]] = Field(
    16	      1     relatedParticipant: Optional[List[RelatedParticipant]] = Field(
    17	      1     relatedResource: Optional[List[Resource]] = Field(
    18	      1     roleName: Optional[List[RoleType]] = Field(
    19	      1     sameAs: Optional[List[AnyHttpUrl]] = Field(
    20	      2     sameAs: Optional[List[Identifier]] = Field(
    21	      1     species: Optional[List[SpeciesType]] = Field(
    22	      1     studyTarget: Optional[List[str]] = Field(
    23	      1     used: Optional[List[Equipment]] = Field(
    24	      1     variableMeasured: Optional[List[PropertyValue]] = Field(
    25	      1     variableMeasured: Optional[List[str]] = Field(
    26	      2     wasAttributedTo: Optional[List[Participant]] = Field(
    27	      1     wasDerivedFrom: Optional[List["BioSample"]] = Field(
    28	      1     wasDerivedFrom: Optional[List[BioSample]] = Field(
    29	      1     wasGeneratedBy: Optional[List["Session"]] = Field(
    30	      1     wasGeneratedBy: Optional[List[Union[Session, Project, Activity]]] = Field(
and all of them defaulting to `None`
❯ grep 'Optional\[List' -A 1 dandischema/models.py
    dxdate: Optional[List[Union[date, datetime]]] = Field(
        None,
--
    roleName: Optional[List[RoleType]] = Field(
        None,
--
    contactPoint: Optional[List[ContactPoint]] = Field(
        None,
--
    affiliation: Optional[List[Affiliation]] = Field(
        None,
--
    dataStandard: Optional[List[StandardsType]] = Field(
        None, json_schema_extra={"readOnly": True}
--
    approach: Optional[List[ApproachType]] = Field(
        None, json_schema_extra={"readOnly": True}
--
    measurementTechnique: Optional[List[MeasurementTechniqueType]] = Field(
        None, json_schema_extra={"readOnly": True, "nskey": "schema"}
--
    variableMeasured: Optional[List[str]] = Field(
        None, json_schema_extra={"readOnly": True, "nskey": "schema"}
--
    species: Optional[List[SpeciesType]] = Field(
        None, json_schema_extra={"readOnly": True}
--
    used: Optional[List[Equipment]] = Field(
        None,
--
    wasGeneratedBy: Optional[List["Session"]] = Field(
        None,
--
    altName: Optional[List[Identifier]] = Field(
        None, json_schema_extra={"nskey": "dandi"}
--
    disorder: Optional[List[Disorder]] = Field(
        None,
--
    relatedParticipant: Optional[List[RelatedParticipant]] = Field(
        None,
--
    sameAs: Optional[List[Identifier]] = Field(
        None,
--
    assayType: Optional[List[AssayType]] = Field(
        None,
--
    anatomy: Optional[List[Anatomy]] = Field(
        None,
--
    wasDerivedFrom: Optional[List["BioSample"]] = Field(
        None,
--
    wasAttributedTo: Optional[List[Participant]] = Field(
        None,
--
    sameAs: Optional[List[Identifier]] = Field(
        None, json_schema_extra={"nskey": "schema"}
--
    hasMember: Optional[List[Identifier]] = Field(
        None, json_schema_extra={"nskey": "prov"}
--
    studyTarget: Optional[List[str]] = Field(
        None,
--
    license: Optional[List[LicenseType]] = Field(
        None,
--
    protocol: Optional[List[AnyHttpUrl]] = Field(
        None,
--
    ethicsApproval: Optional[List[EthicsApproval]] = Field(
        None, title="Ethics approvals", json_schema_extra={"nskey": "dandi"}
--
    keywords: Optional[List[str]] = Field(
        None,
--
    relatedResource: Optional[List[Resource]] = Field(
        None, json_schema_extra={"nskey": "dandi"}
--
    sameAs: Optional[List[AnyHttpUrl]] = Field(
        None, json_schema_extra={"nskey": "schema"}
--
    approach: Optional[List[ApproachType]] = Field(
        None, json_schema_extra={"readOnly": True, "nskey": "dandi"}
--
    measurementTechnique: Optional[List[MeasurementTechniqueType]] = Field(
        None, json_schema_extra={"readOnly": True, "nskey": "schema"}
--
    variableMeasured: Optional[List[PropertyValue]] = Field(
        None, json_schema_extra={"readOnly": True, "nskey": "schema"}
--
    wasDerivedFrom: Optional[List[BioSample]] = Field(
        None, json_schema_extra={"nskey": "prov"}
--
    wasAttributedTo: Optional[List[Participant]] = Field(
        None,
--
    wasGeneratedBy: Optional[List[Union[Session, Project, Activity]]] = Field(
        None,

Why do not we drop Optional and just default to empty list there?

@satra wdyt?

  • @candleindark raised concern that linkml would anyways convert into pydantic model as Optional[List[]] when multivalue slot is used.
@yarikoptic yarikoptic added the enhancement New feature or request label Feb 7, 2025
@satra
Copy link
Member

satra commented Feb 7, 2025

when we export from pydantic model instances to json or dict there are two keywords who semantics vary a bit (exclude_none, exclude_unset). by switching to empty list as default we will have to use the latter.

this proposal is fine. just wanted to make a note.

related but same approach won't work: we also have an empty string issue on the ui side which may also be similar. i.e. an Optional[str] field that's filled and then made empty cannot be validated properly, when the string requires a pattern(e.g. url, email, etc.,.)

@candleindark
Copy link
Member

candleindark commented Feb 19, 2025

we also have an empty string issue on the ui side which may also be similar. i.e. an Optional[str] field that's filled and then made empty cannot be validated properly, when the string requires a pattern(e.g. url, email, etc.,.)

@satra Are you talking about validation at the Pydantic level or at the JSON Schema level? If you are talking about validation at the Pydantic level, we can solve this using a validator in before mode.

@candleindark
Copy link
Member

@yarikoptic Here is a demo of the translation of a model in LinkML with a multivalued slot.

personinfo.yaml
id: https://w3id.org/linkml/examples/personinfo
name: personinfo
prefixes:
  linkml: https://w3id.org/linkml/
  schema: http://schema.org/
  personinfo: https://w3id.org/linkml/examples/personinfo/
  ORCID: https://orcid.org/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    class_uri: schema:Person
    slots: ## specified as a list
      - id
      - full_name
      - aliases
      - age

slots:
  id:
    identifier: true
  full_name:
    required: true
    description:
      name of the person
    slot_uri: schema:name
  aliases:  # `aliases` translated to have the type of `Optional[List[str]]` in Pydantic
    multivalued: true
    description:
      other names for the person
  age:
    range: integer
    description:
      age of the person

Run gen-pydantic personinfo.yaml > personinfo_pydantic.py to translate to a Pydantic representation.

personinfo_pydantic.py
from __future__ import annotations 
from datetime import (
    datetime,
    date
)
from decimal import Decimal 
from enum import Enum 
import re
import sys
from typing import (
    Any,
    ClassVar,
    List,
    Literal,
    Dict,
    Optional,
    Union
)
from pydantic import (
    BaseModel,
    ConfigDict,
    Field,
    RootModel,
    field_validator
)
metamodel_version = "None"
version = "None"


class ConfiguredBaseModel(BaseModel):
    model_config = ConfigDict(
        validate_assignment = True,
        validate_default = True,
        extra = "forbid",
        arbitrary_types_allowed = True,
        use_enum_values = True,
        strict = False,
    )
    pass




class LinkMLMeta(RootModel):
    root: Dict[str, Any] = {}
    model_config = ConfigDict(frozen=True)

    def __getattr__(self, key:str):
        return getattr(self.root, key)

    def __getitem__(self, key:str):
        return self.root[key]

    def __setitem__(self, key:str, value):
        self.root[key] = value

    def __contains__(self, key:str) -> bool:
        return key in self.root


linkml_meta = LinkMLMeta({'default_prefix': 'https://w3id.org/linkml/examples/personinfo/',
     'default_range': 'string',
     'id': 'https://w3id.org/linkml/examples/personinfo',
     'imports': ['linkml:types'],
     'name': 'personinfo',
     'prefixes': {'ORCID': {'prefix_prefix': 'ORCID',
                            'prefix_reference': 'https://orcid.org/'},
                  'linkml': {'prefix_prefix': 'linkml',
                             'prefix_reference': 'https://w3id.org/linkml/'},
                  'personinfo': {'prefix_prefix': 'personinfo',
                                 'prefix_reference': 'https://w3id.org/linkml/examples/personinfo/'},
                  'schema': {'prefix_prefix': 'schema',
                             'prefix_reference': 'http://schema.org/'}},
     'source_file': 'personinfo.yaml'} )


class Person(ConfiguredBaseModel):
    linkml_meta: ClassVar[LinkMLMeta] = LinkMLMeta({'class_uri': 'schema:Person',
         'from_schema': 'https://w3id.org/linkml/examples/personinfo'})

    id: str = Field(..., json_schema_extra = { "linkml_meta": {'alias': 'id', 'domain_of': ['Person']} })
    full_name: str = Field(..., description="""name of the person""", json_schema_extra = { "linkml_meta": {'alias': 'full_name', 'domain_of': ['Person'], 'slot_uri': 'schema:name'} })
    aliases: Optional[List[str]] = Field(None, description="""other names for the person""", json_schema_extra = { "linkml_meta": {'alias': 'aliases', 'domain_of': ['Person']} })
    age: Optional[int] = Field(None, description="""age of the person""", json_schema_extra = { "linkml_meta": {'alias': 'age', 'domain_of': ['Person']} })


# Model rebuild
# see https://pydantic-docs.helpmanual.io/usage/models/#rebuilding-a-model
Person.model_rebuild()

Run gen-python personinfo.yaml > personinfo.py to translate to a dataclass representation.

personinfo.py
# Auto generated from personinfo.yaml by pythongen.py version: 0.0.1
# Generation date: 2025-02-18T18:23:45
# Schema: personinfo
#
# id: https://w3id.org/linkml/examples/personinfo
# description:
# license: https://creativecommons.org/publicdomain/zero/1.0/

import dataclasses
import re
from jsonasobj2 import JsonObj, as_dict
from typing import Optional, List, Union, Dict, ClassVar, Any
from dataclasses import dataclass
from datetime import date, datetime
from linkml_runtime.linkml_model.meta import EnumDefinition, PermissibleValue, PvFormulaOptions

from linkml_runtime.utils.slot import Slot
from linkml_runtime.utils.metamodelcore import empty_list, empty_dict, bnode
from linkml_runtime.utils.yamlutils import YAMLRoot, extended_str, extended_float, extended_int
from linkml_runtime.utils.dataclass_extensions_376 import dataclasses_init_fn_with_kwargs
from linkml_runtime.utils.formatutils import camelcase, underscore, sfx
from linkml_runtime.utils.enumerations import EnumDefinitionImpl
from rdflib import Namespace, URIRef
from linkml_runtime.utils.curienamespace import CurieNamespace
from linkml_runtime.linkml_model.types import Integer, String

metamodel_version = "1.7.0"
version = None

# Overwrite dataclasses _init_fn to add **kwargs in __init__
dataclasses._init_fn = dataclasses_init_fn_with_kwargs

# Namespaces
ORCID = CurieNamespace('ORCID', 'https://orcid.org/')
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/')
PERSONINFO = CurieNamespace('personinfo', 'https://w3id.org/linkml/examples/personinfo/')
SCHEMA = CurieNamespace('schema', 'http://schema.org/')
DEFAULT_ = PERSONINFO


# Types

# Class references
class PersonId(extended_str):
    pass


@dataclass(repr=False)
class Person(YAMLRoot):
    _inherited_slots: ClassVar[List[str]] = []

    class_class_uri: ClassVar[URIRef] = SCHEMA["Person"]
    class_class_curie: ClassVar[str] = "schema:Person"
    class_name: ClassVar[str] = "Person"
    class_model_uri: ClassVar[URIRef] = PERSONINFO.Person

    id: Union[str, PersonId] = None
    full_name: str = None
    aliases: Optional[Union[str, List[str]]] = empty_list()
    age: Optional[int] = None

    def __post_init__(self, *_: List[str], **kwargs: Dict[str, Any]):
        if self._is_empty(self.id):
            self.MissingRequiredField("id")
        if not isinstance(self.id, PersonId):
            self.id = PersonId(self.id)

        if self._is_empty(self.full_name):
            self.MissingRequiredField("full_name")
        if not isinstance(self.full_name, str):
            self.full_name = str(self.full_name)

        if not isinstance(self.aliases, list):
            self.aliases = [self.aliases] if self.aliases is not None else []
        self.aliases = [v if isinstance(v, str) else str(v) for v in self.aliases]

        if self.age is not None and not isinstance(self.age, int):
            self.age = int(self.age)

        super().__post_init__(**kwargs)


# Enumerations


# Slots
class slots:
    pass

slots.id = Slot(uri=PERSONINFO.id, name="id", curie=PERSONINFO.curie('id'),
                   model_uri=PERSONINFO.id, domain=None, range=URIRef)

slots.full_name = Slot(uri=SCHEMA.name, name="full_name", curie=SCHEMA.curie('name'),
                   model_uri=PERSONINFO.full_name, domain=None, range=str)

slots.aliases = Slot(uri=PERSONINFO.aliases, name="aliases", curie=PERSONINFO.curie('aliases'),
                   model_uri=PERSONINFO.aliases, domain=None, range=Optional[Union[str, List[str]]])

slots.age = Slot(uri=PERSONINFO.age, name="age", curie=PERSONINFO.curie('age'),
                   model_uri=PERSONINFO.age, domain=None, range=Optional[int])

As illustrated in the two translation, a slot that is multivalued and not required is translated to have a type of Optional[list[]]. In general, LinkML translates any slot that is not required to have an Optional type, though there is a difference in the default value in the two translations above.

@satra
Copy link
Member

satra commented Feb 19, 2025

Are you talking about validation at the Pydantic level or at the JSON Schema level?

at json schema inside vjsf

@yarikoptic
Copy link
Member Author

marking blocked as I think we should just see if newer vjsf could handle it properly and then could just close this.

@mvandenburgh
Copy link
Member

I tested this on the newer VJSF (VJSF 3). Here are my findings -

The custom schema generation function that @candleindark added seems to break the schema on VJSF 3, due to ajv not supporting a field with both type: "string" and default: null. i.e. this schema

        "affiliation": {
          "type": "string",
          "default": null,
          "description": "An organization that this person is affiliated with.",
          "nskey": "schema",
          "title": "Affiliation"
        }

Removing that function and regenerating the schema yields something that looks like this

        "affiliation": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/Affiliation"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "An organization that this person is affiliated with.",
          "nskey": "schema",
          "title": "Affiliation"
        }

And while this doesn't cause ajv to raise any errors, it does cause these fields to not get rendered in the meditor. This seems to be a limitation of VJSF where it doesn't handle nullable arrays as we expected. I've opened an issue on that repo for this koumoul-dev/vuetify-jsonschema-form#479.

Finally, I tested transforming all these Optional[List[...]] = Field(None, ...) into List[...] = Field([]), which yields a schema like this

        "affiliation": {
          "items": {
            "$ref": "#/$defs/Affiliation"
          },
          "type": "array"
          "default": [],
          "description": "An organization that this person is affiliated with.",
          "nskey": "schema",
          "title": "Affiliation"
        }

And this works fine in VJSF 3.

So, from my perspective concerning meditor/web app development, it seems removing Optional and setting these fields to default to [] instead of None would be preferred. I'm not sure what the implications of that are for dandischema or other parts of the system though. I think it's also worth waiting to see what the VJSF maintainer says about the nullable array issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants