From 4d26fd260b13026d6804c9fb8a2a07ebeea3da22 Mon Sep 17 00:00:00 2001
From: ahernank
Date: Fri, 11 Oct 2024 08:20:41 -0500
Subject: [PATCH 1/5] add new cohorts

---
 _posts/2024-10-02-ag3-cohorts-v20240924.md | 44 ++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 _posts/2024-10-02-ag3-cohorts-v20240924.md

diff --git a/_posts/2024-10-02-ag3-cohorts-v20240924.md b/_posts/2024-10-02-ag3-cohorts-v20240924.md
new file mode 100644
index 0000000..7663cef
--- /dev/null
+++ b/_posts/2024-10-02-ag3-cohorts-v20240924.md
@@ -0,0 +1,44 @@
+---
+layout: post
+title: "Ag3 cohorts analysis version 20240924"
+tags: data
+---
+
+A new cohorts analysis version `20240924` has been released for the
+Ag3 data resource. This is now the default cohorts analysis version
+when using the `malariagen_data` [Ag3
+API](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html). This
+cohorts analysis is available for datasets up to and including Ag3.11.
+
+Please note that the new cohorts analysis may change the values of
+sample metadata columns including `taxon`, `admin1_iso`,
+`admin1_name`, `admin2_name`, and derived columns beginning `cohorts_`
+relative to previous cohorts analysis versions.
+
+To pin this cohorts analysis when accessing data:
+
+{% highlight python %}
+import malariagen_data
+
+ag3 = malariagen_data.Ag3(
+    cohorts_analysis="20240924",
+)
+{% endhighlight %}
+
+This new version introduces some key changes:
+
+- Samples that were previously assigned as `gcx1`, have been renamed as `bissau`:
+  - `gcx` stands for `genetic cryptic species`, we use these labels as name placeholders for groups that fall outside our usual taxonomic assignment
+  - In line with [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), which characterises the `gcx1` group, we have updated its proposed name to `Bissau molecular form`
+  - 291 samples that were previously assigned to the `gcx1` group, are now relabeled as `bissau`.
+  - 5 samples samples that were previously `unassigned`, are now relabeled as `bissau`.
+  - these changes also affect cohort names, e.g. `GM-M_gcx1_2019` has now been relabeled to `GM-M_biss_2019`
+
+- 36 samples that were previously `unassigned`, have been renamed as (32) `melas`, (2)`gambiae`, (1) `fontenillei`, (1) `arabiensis`.
+
+- An error on the administrative region 1 metadata has been fixed, affecting 119 samples. Tor these:
+  - `admin1_iso` has been relabeled from `UG-E` to `KE-04`
+  - `admin1_name` has been relabeled from `Eastern Region` to `Busia`
+  - these changes also affect cohort names, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`
+
+If you need to access the previous version of the cohorts analysis, you can use pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).

From 1c5235b511a62e724e5a8ab126542315f5a05336 Mon Sep 17 00:00:00 2001
From: ahernank
Date: Fri, 11 Oct 2024 08:32:19 -0500
Subject: [PATCH 2/5] rewrite for clarity

---
 _posts/2024-10-02-ag3-cohorts-v20240924.md | 26 +++++++++++-----------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/_posts/2024-10-02-ag3-cohorts-v20240924.md b/_posts/2024-10-02-ag3-cohorts-v20240924.md
index 7663cef..a117c01 100644
--- a/_posts/2024-10-02-ag3-cohorts-v20240924.md
+++ b/_posts/2024-10-02-ag3-cohorts-v20240924.md
@@ -27,18 +27,18 @@ ag3 = malariagen_data.Ag3(
 
 This new version introduces some key changes:
 
-- Samples that were previously assigned as `gcx1`, have been renamed as `bissau`:
-  - `gcx` stands for `genetic cryptic species`, we use these labels as name placeholders for groups that fall outside our usual taxonomic assignment
-  - In line with [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), which characterises the `gcx1` group, we have updated its proposed name to `Bissau molecular form`
-  - 291 samples that were previously assigned to the `gcx1` group, are now relabeled as `bissau`.
-  - 5 samples samples that were previously `unassigned`, are now relabeled as `bissau`.
-  - these changes also affect cohort names, e.g. `GM-M_gcx1_2019` has now been relabeled to `GM-M_biss_2019`
-
-- 36 samples that were previously `unassigned`, have been renamed as (32) `melas`, (2)`gambiae`, (1) `fontenillei`, (1) `arabiensis`.
-
-- An error on the administrative region 1 metadata has been fixed, affecting 119 samples. Tor these:
-  - `admin1_iso` has been relabeled from `UG-E` to `KE-04`
-  - `admin1_name` has been relabeled from `Eastern Region` to `Busia`
-  - these changes also affect cohort names, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`
+- Samples previously labeled as `gcx1` in the `taxon` field have been renamed to `bissau`:
+  - `gcx` (`genetic cryptic species`) labels serve as placeholders for groups outside our usual taxonomic assignment
+  - Following [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), the `gcx1` group has been renamed to `Bissau molecular form`
+  - 291 samples previously assigned as `gcx1`, are now labeled as `bissau`.
+  - 5 previously `unassigned` samples are also relabeled as `bissau`.
+  - Cohort names have been updated, e.g. `GM-M_gcx1_2019` is now `GM-M_biss_2019`
+
+- 36 `unassigned` samples have been reclassified as: 32 `melas`, 2`gambiae`, 1 `fontenillei`, 1 `arabiensis`.
+
+- A location metadata error affecting the administrative region (level 1) of 119 samples has been corrected:
+  - `admin1_iso` updated from `UG-E` to `KE-04`
+  - `admin1_name` updated from `Eastern Region` to `Busia`
+  - Cohort names have been updated, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`
 
 If you need to access the previous version of the cohorts analysis, you can use pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).

From a909806eb1474df773d8e029a3ed81e6cfc94cb9 Mon Sep 17 00:00:00 2001
From: ahernank
Date: Fri, 11 Oct 2024 08:34:29 -0500
Subject: [PATCH 3/5] typo

---
 _posts/2024-10-02-ag3-cohorts-v20240924.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-10-02-ag3-cohorts-v20240924.md b/_posts/2024-10-02-ag3-cohorts-v20240924.md
index a117c01..65a5140 100644
--- a/_posts/2024-10-02-ag3-cohorts-v20240924.md
+++ b/_posts/2024-10-02-ag3-cohorts-v20240924.md
@@ -41,4 +41,4 @@ This new version introduces some key changes:
   - `admin1_name` updated from `Eastern Region` to `Busia`
   - Cohort names have been updated, e.g. `UG-E_arab_2013` has now been relabeled to `KE-04_arab_2013`
 
-If you need to access the previous version of the cohorts analysis, you can use pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).
+If you need to access the previous version of the cohorts analysis, you can pin it using the code in [here](https://malariagen.github.io/vobs-updates/2024/07/24/ag3-cohorts-v20240717.html).

From 50915118c4ba066790e899155446ea3c53cf1140 Mon Sep 17 00:00:00 2001
From: ahernank
Date: Fri, 11 Oct 2024 08:56:01 -0500
Subject: [PATCH 4/5] fix

---
 _posts/2024-10-02-ag3-cohorts-v20240924.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2024-10-02-ag3-cohorts-v20240924.md b/_posts/2024-10-02-ag3-cohorts-v20240924.md
index 65a5140..d1ea9f3 100644
--- a/_posts/2024-10-02-ag3-cohorts-v20240924.md
+++ b/_posts/2024-10-02-ag3-cohorts-v20240924.md
@@ -28,7 +28,7 @@ ag3 = malariagen_data.Ag3(
 This new version introduces some key changes:
 
 - Samples previously labeled as `gcx1` in the `taxon` field have been renamed to `bissau`:
-  - `gcx` (`genetic cryptic species`) labels serve as placeholders for groups outside our usual taxonomic assignment
+  - `gcx` (`gambiae complex cryptic taxa`) labels serve as placeholders for groups outside our usual taxonomic assignment
   - Following [Caputo et al. (2024)](https://malariagen.github.io/vobs-updates/2024/09/10/caputo.html), the `gcx1` group has been renamed to `Bissau molecular form`
   - 291 samples previously assigned as `gcx1`, are now labeled as `bissau`.
   - 5 previously `unassigned` samples are also relabeled as `bissau`.

From 3b0e365b46200d7da6fc3da05fff35c8cb7d1d5f Mon Sep 17 00:00:00 2001
From: Anastasia Hernandez-Koutoucheva
Date: Mon, 14 Oct 2024 10:42:53 +0100
Subject: [PATCH 5/5] Apply suggestions from code review

Co-authored-by: Alistair Miles
---
 _posts/2024-10-02-ag3-cohorts-v20240924.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2024-10-02-ag3-cohorts-v20240924.md b/_posts/2024-10-02-ag3-cohorts-v20240924.md
index d1ea9f3..208f435 100644
--- a/_posts/2024-10-02-ag3-cohorts-v20240924.md
+++ b/_posts/2024-10-02-ag3-cohorts-v20240924.md
@@ -8,7 +8,7 @@ A new cohorts analysis version `20240924` has been released for the
 Ag3 data resource. This is now the default cohorts analysis version
 when using the `malariagen_data` [Ag3
 API](https://malariagen.github.io/malariagen-data-python/latest/Ag3.html). This
-cohorts analysis is available for datasets up to and including Ag3.11.
+cohorts analysis will be available for datasets up to and including Ag3.13.
 
 Please note that the new cohorts analysis may change the values of
 sample metadata columns including `taxon`, `admin1_iso`,
@@ -34,7 +34,7 @@ This new version introduces some key changes:
   - 5 previously `unassigned` samples are also relabeled as `bissau`.
   - Cohort names have been updated, e.g. `GM-M_gcx1_2019` is now `GM-M_biss_2019`
 
-- 36 `unassigned` samples have been reclassified as: 32 `melas`, 2`gambiae`, 1 `fontenillei`, 1 `arabiensis`.
+- 36 previously `unassigned` samples have been reclassified as: 32 `melas`, 2`gambiae`, 1 `fontenillei`, 1 `arabiensis`.
 
 - A location metadata error affecting the administrative region (level 1) of 119 samples has been corrected:
   - `admin1_iso` updated from `UG-E` to `KE-04`