Skip to content

Latest commit

 

History

History
85 lines (66 loc) · 4.65 KB

CHANGELOG.md

File metadata and controls

85 lines (66 loc) · 4.65 KB

v1.0.6

  • Makes UMAP dimensional reduction more aggressive in removing discrete features before reduction.

v1.0.5

  • Adds a way to retrieve the version strings for a number of dependency packages.

v1.0.4

  • Refactors the assessment/recreation of derivative data payloads (e.g. binary feature matrices):
    • Deprecates unnecessary logic, since now a single study must be specified.
    • Assesses and recreates the different derivatives independently.

v1.0.3

  • Reduces the log burden due to computation worker processes, provides summaries instead.
  • Adds more flexible triggering of job queue pops, on worker process start not just explicit signaling.
  • Adds a robust timeout (default 5 minutes), after which any pending jobs will no longer have an effect when complete; incomplete computations are recorded as null and the queue is cleared of these items.

v1.0.2

  • Converts the database model to a single named database with one (PostgreSQL-sense) schema for each dataset.
  • Uses the new model to implement cross-cut queries, specifically computation job count (load metric).
  • Adds a TUI for dataset import to help prevent errors in selection of database credentials, data sources, and upload options.

v1.0.0

  • Includes comprehensive tutorial and reference documentation.
  • Adds dataset "curation" or preprocessing details, with a complete example.
  • A number of updates to the graph processing workflows.
  • Improve UMAP functionality into an interactive plot.
  • Fix some timing bugs in edge cases related to the feature value computation queue.
  • Greatly improves handling of the Ripley statistic summaries.
  • Adds support for S3 source files in Nextflow workflows that operate on these source files.

v0.24.0

Implements a major refactoring of on-demand metrics computation in which each worker container picks up a single sample's worth of feature computation at a time. This is organized with a simple PostgresQL table considered as a task queue, and database notifications. Now all computations for a given sample begin from a database query for the same compressed binary payload representing phenotype and location data for all cells. The TCP client/server model for dispatching specific feature computations to different services is deprecated.

v0.23.1

Implements a dataset collection concept using study name suffixes (tags/tokens/labels):

  • The tabular import workflow uses value for key Study collection in study.json.
  • API endpoint study-names hides collection-tagged datasets by default.
  • Other API handlers unchanged, work as-is using the fully-qualified study names.
  • spt db collection ... --publish / --unpublish provided to managed collection visibility.

v0.17.5

Organize workflow configuration options into a workflow configuration file. This breaks the API for tabular import and similar.

v0.17.2

Add support for small specimens (small cell set) in GNN workflow.

v0.17.1

Add KDTree optimization to GNN ROI creation.

v0.16.2

  • Deprecates heavy index on large tables:
    • Adds a new table for tracking scope ranges.
    • Converts the former source_specimen column on expression_quantification to a `SERIAL`` integer.
    • Makes tabular import keep track of ranges per-specimen in the new range_definitions table.
    • Updates the "optimized" sparse matrix query to use the ranges rather than the former huge index.
    • Deprecates the modify-constraints CLI entrypoint (only used internally now).
    • Deprecates the expression indexing module, CLI entrypoint, etc.

v0.16.0

  • Separates datasets into own databases:
    • DBCursor and DBConnection usage streamlined, typically requires study-scoping (dataset-scoping).
    • Deprecates scstudies database from database cluster. Replaced by default_study_lookup and per-dataset databases.
    • Update test data artifacts which depended on all datasets being cohoused in the same database (e.g. things dealing with identifiers issue )
    • Adds study-scoping throughout codebase where previously global identifers were assumed.
    • Updates development DB image from postgres 14.5 to 16.0.
    • Deprecates initialize_schema.sql that was previously used to feed the DB docker image initialization.
  • Adds DGL and pytorch to the big development docker image (in which are run all the tests).
  • Deprecate most occurrences of package-global namespace symbols, to reduce possible "leak" of unnecessary library imports for otherwise simple calls.

v0.12.4

Deprecated nearest distance and density workflows.

v0.12.3

Includes convenience whole-dataset pulling from the database.

v0.12.2

  • Deprecated front proximity workflow (for now).
  • Large-scale linting of library code.

v0.12.1

Separated build and test directories out of the source tree.