Releases: nextstrain/augur
24.2.3
These release notes are automatically extracted from the full changelog.
Bug Fixes
- filter: Updated the help and report text of `--min-length` to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but has never been explicitly documented. #1422 (@joverlee521)
- frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented `--regions` from working when providing regions other than the default "global" region. #1424
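
The counting rule described for `--min-length` above can be illustrated with a minimal sketch. This is not Augur's implementation; the function names `count_standard_nucleotides` and `passes_min_length` are hypothetical and only show what "counts standard nucleotide characters A, C, G, or T (case-insensitive)" means in practice.

```python
def count_standard_nucleotides(sequence: str) -> int:
    """Count A, C, G, and T characters, ignoring case (hypothetical helper)."""
    return sum(1 for base in sequence.upper() if base in "ACGT")


def passes_min_length(sequence: str, min_length: int) -> bool:
    """Return True when the sequence has at least min_length standard nucleotides."""
    return count_standard_nucleotides(sequence) >= min_length


# Gaps (-) and ambiguous characters (N) do not count toward the length.
example = "acgtNNNN--ACGT"
print(count_standard_nucleotides(example))  # 8
print(passes_min_length(example, 10))       # False
```

Under this rule, a 14-character sequence made mostly of `N` or gap characters can still fail a minimum length of 10.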
24.2.2
Bug Fixes
- filter: In versions 24.2.0 and 24.2.1, `--query` stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
- filter: Handle backtick quoting in internal optimizations of `--query`. #1417 (@victorlin)
24.2.1
Bug Fixes
- frequencies: Fixed a bug introduced in 24.2.0 that prevented `--method diffusion` from working alongside `--tree`. #1412 (@victorlin)
24.2.0
Features
- filter: Added a new option `--query-columns` that allows specifying what columns are used in `--query` along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
- `augur.io.read_metadata`: A new optional `columns` argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
- `augur parse`: A new optional `--output-id-field` argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
  - When no `--output-id-field` is given and the data has both `name` and `strain` fields, continue to preferentially use `name` over `strain` as the sequence ID field; but, throw a deprecation warning that the order will be switched to prefer `strain` over `name` in the future to be consistent with the rest of Augur.
  - Added entry to DEPRECATED.md.
- Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)
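
The ID field selection described for `augur parse` in the list above can be sketched as follows. This is an illustration only, not Augur's code: the function `choose_id_field` is hypothetical, and both the error raised for an unknown field and the behavior when neither `name` nor `strain` is present are assumptions of this sketch.

```python
import warnings
from typing import Optional, Sequence


def choose_id_field(fields: Sequence[str], output_id_field: Optional[str] = None) -> str:
    """Pick the field used as the sequence ID (hypothetical helper)."""
    if output_id_field is not None:
        # Assumption: an explicitly requested field must be one of the parsed fields.
        if output_id_field not in fields:
            raise ValueError(f"ID field {output_id_field!r} is not one of the parsed fields")
        return output_id_field

    if "name" in fields and "strain" in fields:
        # Current behavior per the release notes: prefer 'name', but warn
        # that 'strain' will be preferred in the future.
        warnings.warn(
            "Both 'name' and 'strain' are present; using 'name'. "
            "A future version will prefer 'strain'.",
            DeprecationWarning,
        )
        return "name"

    for candidate in ("name", "strain"):
        if candidate in fields:
            return candidate

    # Assumption: with neither field available, this sketch refuses to guess.
    raise ValueError("No usable ID field found")


print(choose_id_field(["strain", "accession", "date"], "accession"))  # accession
print(choose_id_field(["strain", "date"]))                            # strain
```

Passing the field explicitly avoids the deprecation warning and makes the result independent of the future change in default order.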
Bug Fixes
- filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as `None`. #1410 (@victorlin)
- filter: The order of rows in `--output-metadata` and `--output-strains` now reflects the order in the original `--metadata`. #1294 (@victorlin)
- filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
  - For filter, this comes with increased writing times for `--output-metadata` and `--output-strains`. However, net I/O time still decreased during testing of this change.
- filter: Updated the help text of `--include` and `--include-where` to explicitly state that this can add strains that are missing an entry from `--sequences`. #1389 (@victorlin)
- filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from `--sequences`. #1389 (@victorlin)
- filter: Updated wording of summary messages. #1389 (@victorlin)
- Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)
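
The boolean-column fix in the list above (empty values evaluated as `None`) can be pictured with a small sketch. This is not how Augur implements the conversion; the helper `to_optional_bool` is hypothetical, and the accepted spellings ("true"/"false", case-insensitive) are an assumption made for illustration.

```python
from typing import List, Optional


def to_optional_bool(value: str) -> Optional[bool]:
    """Convert a metadata cell to True, False, or None (hypothetical helper)."""
    text = value.strip().lower()
    if text == "":
        return None  # empty cells are treated as missing, not as False
    if text == "true":
        return True
    if text == "false":
        return False
    # Assumption: any other value means the column is not boolean.
    raise ValueError(f"Not a boolean value: {value!r}")


column: List[str] = ["True", "", "False"]
print([to_optional_bool(cell) for cell in column])  # [True, None, False]
```

Keeping `None` distinct from `False` matters for queries: a row with a missing value should not silently match a condition that tests for `False`.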
24.1.0
Features
- `augur.io.read_metadata`: A new optional `dtype` argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. #1252 (@victorlin)
- `augur.io.read_vcf` has been removed and usage replaced with TreeTime's function of the same name which has improved validation of the VCF file. #1366 (@jameshadfield)
Bug Fixes
- filter, frequencies, refine: Speed up reading of the metadata file. #1252 (@victorlin)
- traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. #1252 (@victorlin)
- Support Biopython ≥1.82 by requiring bcbio-gff ≥0.7.1. #1400 (@victorlin)
24.0.0
Major Changes
- ancestral, translate: For VCF inputs please ensure you are using TreeTime 0.11.2 or later. A large number of bugfixes and improvements have been added in both Augur and TreeTime. #1355 and TreeTime #263 (@jameshadfield)
- ancestral, translate: GenBank files now require the (GFF mandatory) source feature to be present. #1351 (@jameshadfield)
- ancestral, translate: For GFF files, we extract the genome/sequence coordinates by inspecting the sequence-region pragma, region type and/or source type. This information is now required. #1351 (@jameshadfield)
Features
- ancestral, translate: Improvements to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
  - Output VCF will better match the input VCF, including CHROM name and ploidy encoding.
  - VCF inputs now require `--vcf-reference-output`
  - AA sequences are now exported for the tree root
  - VCF writing is now 3 orders of magnitude faster (dataset dependent)
- ancestral, translate: A range of improvements to how we parse GFF and GenBank reference files. #1351 (@jameshadfield)
  - translate will now always export a 'nuc' annotation in the output JSON, allowing it to pass validation
  - Gene/CDS names of 'nuc' are now forbidden.
  - If a Gene/CDS in the GFF/GenBank file is unparsed we now print a warning.
- ancestral: For VCF alignments, a VCF output file is now only created when requested via `--output-vcf`. #1344 (@jameshadfield)
- ancestral: Improvements to command line arguments. #1344 (@jameshadfield)
  - Incompatible arguments are now checked, especially related to VCF vs FASTA inputs.
  - `--vcf-reference` and `--root-sequence` are now mutually exclusive.
- translate: Tree nodes are checked against the node-data JSON input to ensure sequences are present. #1348 (@jameshadfield)
- utils::load_features: This function may now raise `AugurError`. #1351 (@jameshadfield)
- export v2: Automatically minify large outputs. Use `--no-minify-json` to disable this default behavior. #1352 (@victorlin)
- Added a new file DEPRECATED.md to document timelines and progress of deprecated features in the Augur CLI and Python API. #1371 (@victorlin)
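
The mutual exclusion of `--vcf-reference` and `--root-sequence` noted in the list above is the kind of check Python's standard `argparse` module supports directly. The sketch below shows that general mechanism; it is not taken from Augur's source, and the parser name and help strings are made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with two mutually exclusive options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ancestral-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--vcf-reference", help="reference FASTA for VCF input (illustrative)")
    group.add_argument("--root-sequence", help="root sequence for FASTA input (illustrative)")
    return parser


# Supplying only one of the two options parses normally.
args = build_parser().parse_args(["--root-sequence", "reference.fasta"])
print(args.root_sequence)  # reference.fasta
```

With this setup, passing both options makes `argparse` print a usage error and exit with status 2 before any analysis starts.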
Bug Fixes
- ancestral, translate: Various fixes to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
  - Fix incorrect (but passing) tests
  - Fix case-sensitive sequence comparisons between the root and reference sequences.
  - Fix a bug where ambiguous alleles are not inferred (see #1380 for full details).
  - Fix a bug where positions with no sequence information were assigned a base because the mask was not being computed (see #1382 for full details).
  - More than one ALT allele is now correctly parsed
  - Mutations followed by an insertion are now parsed
  - Unchanged ref genotypes are now encoded as '0' rather than '.'
  - ALT alleles "*" are now valid (introduced in VCF spec 4.2, but observed in VCF 4.1 files)
  - Positions with no variation are no longer exported
- ancestral, translate: Fixes for JSON (non-VCF) inputs. #1355 (@jameshadfield)
  - The "reference" translations are now from the provided reference sequence, not from the root of the tree. #1355 (@jameshadfield)
  - Fix a bug where positions with no sequence information were assigned a base because the mask was not applied (see #1382 for full details)
- ancestral, translate: Avoid incompatibilities with Biopython >=1.82. #1374, #1387 (@victorlin)
- ancestral, translate: Address Biopython deprecation warnings. #1379 (@victorlin)
- ancestral: Previously, the help text for `--genes` falsely claimed that it could accept a file. Now, it can truly claim that. #1353 (@victorlin)
- translate: The 'source' ID for GFF files is now ignored as a potential gene feature (it is still used for overall nuc coords). #1348 (@jameshadfield)
- translate: Improvements to command line arguments. #1348 (@jameshadfield)
  - `--tree` and `--ancestral-sequences` are now required arguments.
  - Separate VCF-only arguments into their own group
- translate: Fixes a bug in the parsing behaviour of GFF files whereby the presence of the `--genes` command line argument would change how we read individual GFF lines. Issue #1349, PR #1351 (@jameshadfield)
- If `TreeTimeError` is encountered Augur now exits with code 2 rather than 0. (This restores the original behaviour.) #1367 (@jameshadfield)
- Deprecate `read_strains` from `augur.utils` and add it to the public API under `augur.io`. #1353 (@victorlin)
23.1.1
Bug Fixes
- Fix Python 3.11 installation for Conda environments. #1334 (@victorlin)
- Bump `pyfastx` dependency to major versions 1 and 2. #1335 (@victorlin)
23.1.0
Features
- Support treetime 0.11.* #1310 (@corneliusroemer)
- export: Allow minimal export using only a (newick) tree in `augur export v2`. #1299 (@jameshadfield)
- A number of schema updates and improvements #1299 (@jameshadfield)
  - We now require all nodes to have `node_attrs` on them with one of `div` or `num_date` present
  - Some never-used properties are removed from the schemas, including a pattern for defining nucleotide INDELs which was never used by augur or auspice.
  - Tip label defaults are now settable within the auspice-config JSON
  - Empty colorings definitions are allowed (the tree will be grey in Auspice)
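
The schema requirement above (every node carries `node_attrs` with `div` or `num_date`) can be checked ahead of time with a short tree walk. The sketch below is not the schema validator that ships with Augur. It assumes nodes are nested dictionaries with optional `name` and `children` keys, reads "one of" as "at least one of", and uses made-up example data and hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def nodes_missing_div_or_date(tree: Node) -> List[str]:
    """Return names of nodes whose node_attrs has neither 'div' nor 'num_date'."""
    missing = []
    for node in iter_nodes(tree):
        attrs = node.get("node_attrs", {})
        if "div" not in attrs and "num_date" not in attrs:
            missing.append(node.get("name", "<unnamed>"))
    return missing


# Made-up example tree: tip_B lacks both keys and is reported.
tree = {
    "name": "root",
    "node_attrs": {"div": 0},
    "children": [
        {"name": "tip_A", "node_attrs": {"num_date": {"value": 2020.5}}},
        {"name": "tip_B", "node_attrs": {}},
    ],
}
print(nodes_missing_div_or_date(tree))  # ['tip_B']
```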
Bug fixes
23.0.0
Major Changes
- Drop support for Python 3.7. #1296 (@victorlin)
Features
- export v2: Allow the root-sequence data to be included (inlined) in the main dataset JSON file, avoiding the need for a sidecar `_root-sequence.json` file. #1295 (@jameshadfield)
22.4.0
Features
- refine: Export covariance matrix and standard deviation for clock rate regression in the node data JSON output when these values are calculated by TreeTime. These new values appear in the `clock` data structure of the JSON output as `cov` and `rate_std` keys, respectively. #1284 (@huddlej)
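
Because `cov` and `rate_std` are only written when TreeTime calculates them, code that reads the node data JSON should treat both keys as optional. The sketch below assumes the node data has already been loaded into a dictionary; the helper `summarize_clock` is hypothetical, and the numbers and the `rate` key in the example are made up for illustration.

```python
from typing import Any, Dict, Optional


def summarize_clock(node_data: Dict[str, Any]) -> Optional[str]:
    """Describe the clock uncertainty if present, else return None (hypothetical helper)."""
    clock = node_data.get("clock", {})
    rate_std = clock.get("rate_std")
    if rate_std is None:
        return None  # TreeTime did not calculate the value for this run
    has_cov = "cov" in clock
    return f"rate_std={rate_std}, covariance matrix {'present' if has_cov else 'absent'}"


# Made-up values, shaped after the keys named in the release note.
node_data = {
    "clock": {
        "rate": 0.0008,
        "rate_std": 0.0001,
        "cov": [[1e-8, 0.0], [0.0, 1e-4]],
    }
}
print(summarize_clock(node_data))        # rate_std=0.0001, covariance matrix present
print(summarize_clock({"clock": {}}))    # None
```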
Bug fixes
- clades: Fix outputs for genes named `NA` (previously the value was replaced by `nan`). #1293 (@rneher)
- distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. #1285 (@huddlej)
- Fix help output compatibility with non-Unicode streams. #1290 (@victorlin)