Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(expr): Make intersects() return the intersection when non-empty #150

Merged
merged 2 commits into from
Feb 6, 2025

Conversation

effigies
Copy link
Contributor

@effigies effigies commented Feb 4, 2025

This updates the intersects() function in the schema expression language to:

  1. Return the intersection, if non-empty. The order will match the order in the longer list.
  2. Return null on empty intersections to ensure that intersects() remains falsey.

This seems like the simplest resolution for resolving bids-standard/bids-specification#914 along the lines of:

If participants.tsv exists, then the participant_label entries are a superset of subject directories and participant labels found in phenotype/.

With this change, the rule could be written:

SubDirSubset:
  selectors:
    - path == '/participants.tsv'
  checks:
    - |
      allequal(
        sorted(intersects(dataset.subjects.participant_id, dataset.subjects.sub_dirs)),
        sorted(dataset.subjects.sub_dirs)
      )

PhenotypeSubset:
  selectors:
    - path == '/participants.tsv'
    - dataset.subjects.phenotype
  checks:
    - |
      allequal(
        sorted(intersects(dataset.subjects.participant_id, dataset.subjects.phenotype)),
        sorted(dataset.subjects.phenotype)
      )

If we require that validators sort the arrays in dataset.subjects on load, then we can drop the extra sorted() calls here.

Because we've seen datasets with subjects in the thousands, I've updated this function to create a Set() based on the smaller list. There is very likely a small n where this constitutes a performance regression.

@effigies
Copy link
Contributor Author

effigies commented Feb 4, 2025

cc @ericearl

@effigies effigies force-pushed the feat/intersects branch 2 times, most recently from 59b5921 to 44db152 Compare February 4, 2025 19:34
@effigies
Copy link
Contributor Author

effigies commented Feb 6, 2025

@effigies effigies merged commit 3a66a21 into bids-standard:main Feb 6, 2025
17 checks passed
@effigies effigies deleted the feat/intersects branch February 6, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants