Skip to content

Commit

Permalink
Merge pull request #35 from malariagen/GH32-add-ag-cohorts-20240924
Browse files Browse the repository at this point in the history
Add Ag cohorts analysis 20240924
  • Loading branch information
ahernank authored Oct 14, 2024
2 parents 4b40807 + 3b0e365 commit f13fa12
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions _posts/2024-10-02-ag3-cohorts-v20240924.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
---
layout: post
title: "Ag3 cohorts analysis version 20240924"
tags: data
---

A new cohorts analysis version `20240924` has been released for the
Ag3 data resource. This is now the default cohorts analysis version
when using the `malariagen_data` [Ag3
API](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html). This
cohorts analysis will be available for datasets up to and including Ag3.13.

Please note that the new cohorts analysis may change the values of
sample metadata columns including `taxon`, `admin1_iso`,
`admin1_name`, `admin2_name`, and derived columns beginning `cohorts_`
relative to previous cohorts analysis versions.

To pin this cohorts analysis when accessing data:

{% highlight python %}
import malariagen_data

ag3 = malariagen_data.Ag3(
cohorts_analysis="20240924",
)
{% endhighlight %}

This new version introduces some key changes:

- Samples previously labeled as `gcx1` in the `taxon` field have been renamed to `bissau`:
- `gcx` (`gambiae complex cryptic taxa`) labels serve as placeholders for groups outside our usual taxonomic assignment
- Following [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), the `gcx1` group has been renamed to `Bissau molecular form`
- 291 samples previously assigned as `gcx1`, are now labeled as `bissau`.
- 5 previously `unassigned` samples are also relabeled as `bissau`.
- Cohort names have been updated, e.g. `GM-M_gcx1_2019` is now `GM-M_biss_2019`

- 36 previously `unassigned` samples have been reclassified as: 32 `melas`, 2`gambiae`, 1 `fontenillei`, 1 `arabiensis`.

- A location metadata error affecting the administrative region (level 1) of 119 samples has been corrected:
- `admin1_iso` updated from `UG-E` to `KE-04`
- `admin1_name` updated from `Eastern Region` to `Busia`
- Cohort names have been updated, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`

If you need to access the previous version of the cohorts analysis, you can pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).

0 comments on commit f13fa12

Please sign in to comment.