- Makes UMAP dimensional reduction more aggressive in removing discrete features before reduction.
- Adds a way to retrieve the version strings for a number of dependency packages.
- Refactors the assessment/recreation of derivative data payloads (e.g. binary feature matrices):
- Deprecates unnecessary logic, since now a single study must be specified.
- Assesses and recreates the different derivatives independently.
- Reduces the log burden due to computation worker processes, providing summaries instead.
- Adds more flexible triggering of job queue pops: on worker process start, not just on explicit signaling.
- Adds a robust timeout (default 5 minutes), after which any pending jobs will no longer have an effect when complete; incomplete computations are recorded as null and the queue is cleared of these items.
- Converts the database model to a single named database with one (PostgreSQL-sense) schema for each dataset.
- Uses the new model to implement cross-cut queries, specifically computation job count (load metric).
- Adds a TUI for dataset import to help prevent errors in selection of database credentials, data sources, and upload options.
- Includes comprehensive tutorial and reference documentation.
- Adds dataset "curation" or preprocessing details, with a complete example.
- A number of updates to the graph processing workflows.
- Improves UMAP functionality, now presented as an interactive plot.
- Fix some timing bugs in edge cases related to the feature value computation queue.
- Greatly improves handling of the Ripley statistic summaries.
- Adds support for S3 source files in Nextflow workflows that operate on these source files.
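
The timeout behavior described in the list above (late results have no effect, incomplete computations are recorded as null, and expired items are cleared from the queue) can be illustrated with a minimal sketch. The `PendingJobs` class and its method names are hypothetical, an in-memory stand-in rather than the project's actual implementation; only the 5-minute default comes from the notes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

TIMEOUT_SECONDS = 5 * 60  # default timeout stated in the notes above


@dataclass
class PendingJobs:
    """Hypothetical in-memory stand-in for the job queue (illustration only)."""

    timeout: float = TIMEOUT_SECONDS
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    started: dict = field(default_factory=dict)  # job id -> start time
    results: dict = field(default_factory=dict)  # job id -> value, or None

    def start(self, job_id: str) -> None:
        """Register a job as pending, remembering when it started."""
        self.started[job_id] = self.clock()

    def sweep(self) -> list:
        """Record expired jobs as null (None) and clear them from the queue."""
        now = self.clock()
        expired = [j for j, t in self.started.items() if now - t > self.timeout]
        for job_id in expired:
            del self.started[job_id]
            self.results[job_id] = None
        return expired

    def complete(self, job_id: str, value: float) -> bool:
        """Accept a result only if the job is still pending and not timed out."""
        self.sweep()
        if job_id not in self.started:
            return False  # late completion: the result is discarded
        del self.started[job_id]
        self.results[job_id] = value
        return True
```

Injecting the clock keeps the sketch deterministic under test; a real worker setup would more likely compare timestamps stored in the database.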
Implements a major refactoring of on-demand metrics computation in which each worker container picks up a single sample's worth of feature computation at a time. This is organized around a simple PostgreSQL table treated as a task queue, together with database notifications. Now all computations for a given sample begin from a database query for the same compressed binary payload representing phenotype and location data for all cells. The TCP client/server model for dispatching specific feature computations to different services is deprecated.
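
The table-as-task-queue pattern can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the table name `task_queue`, its columns, and the function names are all hypothetical. In PostgreSQL, concurrent workers would typically claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` and be woken by `LISTEN`/`NOTIFY` rather than looping as shown here.

```python
import sqlite3


def create_queue(conn):
    """Create the (hypothetical) queue table: one row per feature/sample task."""
    conn.execute(
        "CREATE TABLE task_queue (id INTEGER PRIMARY KEY, feature TEXT, sample TEXT)"
    )


def push(conn, feature, sample):
    """Enqueue one sample's worth of computation for a feature."""
    conn.execute(
        "INSERT INTO task_queue (feature, sample) VALUES (?, ?)", (feature, sample)
    )
    conn.commit()


def pop(conn):
    """Claim and remove the oldest task, or return None if the queue is empty."""
    with conn:  # the delete is committed when the block exits
        row = conn.execute(
            "SELECT id, feature, sample FROM task_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM task_queue WHERE id = ?", (row[0],))
    return row[1], row[2]


def work_until_empty(conn, compute):
    """Worker loop: take one task at a time until nothing is left."""
    done = []
    while (task := pop(conn)) is not None:
        feature, sample = task
        done.append((feature, sample, compute(feature, sample)))
    return done
```

A worker started at any time can call `work_until_empty` immediately, which mirrors the idea of popping the queue on process start as well as on notification.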
Implements a dataset collection concept using study name suffixes (tags/tokens/labels):
- The tabular import workflow uses the value for key `Study collection` in `study.json`.
- API endpoint `study-names` hides collection-tagged datasets by default.
- Other API handlers unchanged, work as-is using the fully-qualified study names.
- `spt db collection ... --publish / --unpublish` provided to manage collection visibility.
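
As a rough illustration of the collection concept described above, the sketch below tags study names with a suffix and hides tagged names by default. The delimiter `" collection: "`, the function names, and the `include_collection` parameter are assumptions made for this example; the notes do not specify the actual suffix format or the API's parameters.

```python
COLLECTION_MARKER = " collection: "  # assumed delimiter, for illustration only


def qualified_name(base_name, study_config):
    """Append the collection tag, if `Study collection` is set in the config."""
    tag = study_config.get("Study collection")
    return f"{base_name}{COLLECTION_MARKER}{tag}" if tag else base_name


def split_study_name(qualified):
    """Split a fully-qualified study name into (base name, tag or None)."""
    base, marker, tag = qualified.partition(COLLECTION_MARKER)
    return (base, tag) if marker else (qualified, None)


def visible_study_names(names, include_collection=None):
    """Hide collection-tagged names unless their tag is explicitly requested."""
    visible = []
    for name in names:
        _, tag = split_study_name(name)
        if tag is None or tag == include_collection:
            visible.append(name)
    return visible
```

Because the tag is part of the study name itself, handlers that take a fully-qualified name need no changes; only the listing step has to know about the suffix.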
Organize workflow configuration options into a workflow configuration file. This breaks the API for tabular import and similar.
Add support for small specimens (small cell set) in GNN workflow.
Add KDTree optimization to GNN ROI creation.
- Deprecates heavy index on large tables:
- Adds a new table for tracking scope ranges.
- Converts the former `source_specimen` column on `expression_quantification` to a `SERIAL` integer.
- Makes tabular import keep track of ranges per-specimen in the new `range_definitions` table.
- Updates the "optimized" sparse matrix query to use the ranges rather than the former huge index.
- Deprecates the modify-constraints CLI entrypoint (only used internally now).
- Deprecates the expression indexing module, CLI entrypoint, etc.
- Separates datasets into own databases:
- `DBCursor` and `DBConnection` usage streamlined, typically requires study-scoping (dataset-scoping).
- Deprecates `scstudies` database from database cluster. Replaced by `default_study_lookup` and per-dataset databases.
- Update test data artifacts which depended on all datasets being cohoused in the same database (e.g. things dealing with identifiers issue)
- Adds study-scoping throughout codebase where previously global identifiers were assumed.
- Updates development DB image from postgres 14.5 to 16.0.
- Deprecates `initialize_schema.sql` that was previously used to feed the DB docker image initialization.
- Adds DGL and pytorch to the big development docker image (in which are run all the tests).
- Deprecate most occurrences of package-global namespace symbols, to reduce possible "leak" of unnecessary library imports for otherwise simple calls.
Deprecated nearest distance and density workflows.
Includes convenience whole-dataset pulling from the database.
- Deprecated front proximity workflow (for now).
- Large-scale linting of library code.
Separated build and test directories out of the source tree.