atd

the Ability To Duplicate

Warning

This is a proof-of-concept that is not intended for production use.

See the docs for a step-by-step walkthrough of what's going on.

[Image: a GIF of the thing working]

Motivation

Inspired by a discussion about data duplication during the "Meet the Partners" segment of Team Week 2025, this repository lets you do the following with one command:

  1. Copy all the geospatial assets from one blob storage to another using obstore
  2. Create a STAC item for each of those assets that includes a link back to the original asset and a checksum, so users can verify that the copy matches the original
  3. Create a single stac-geoparquet file to hold all of those items
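
The first two steps can be sketched with the standard library alone. This is a minimal illustration, not atd's implementation: it copies between local directories with `shutil` instead of using obstore, builds plain dictionaries rather than validated STAC items, and skips the stac-geoparquet step. The function names `multihash_sha256` and `duplicate` are hypothetical, and the checksum format (a `1220` prefix followed by a SHA-256 hex digest) is an assumption inferred from the example output in the Usage section.

```python
import hashlib
import shutil
from pathlib import Path


def multihash_sha256(path: Path) -> str:
    """Return a file's SHA-256 digest as a hex-encoded multihash string."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large rasters are not loaded into memory at once
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    # Multihash prefix: 0x12 = sha2-256, 0x20 = digest length of 32 bytes
    return "1220" + digest.hexdigest()


def duplicate(source: Path, destination_dir: Path) -> dict:
    """Copy one asset, then describe both copies in an item-like dictionary."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / source.name
    shutil.copyfile(source, destination)
    return {
        "type": "Feature",
        "id": source.stem,
        "assets": {
            # Link back to where the bytes came from
            "original": {
                "href": source.resolve().as_uri(),
                "file:checksum": multihash_sha256(source),
            },
            # The newly written copy
            "data": {
                "href": destination.resolve().as_uri(),
                "file:checksum": multihash_sha256(destination),
            },
        },
    }
```

Hashing both the source and the copy independently, rather than reusing one value, is what makes the two checksums meaningful as a verification aid.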

Usage

Install:

python -m pip install git+https://github.com/developmentseed/atd

Then:

# The destination could also be S3 (or some other blob storage)
$ atd s3://maxar-opendata/events/Marshall-Fire-21-Update/13/031131113030/2021-12-30 ~/Desktop
62.6 MB written to file:///Users/gadomski/Desktop
Items available at file:///Users/gadomski/Desktop/items.parquet 

Each item has two assets:

$ stacrs translate ~/Desktop/items.parquet | jq '.features[0] | .assets'
{
  "original": {
    "href": "s3://maxar-opendata/events/Marshall-Fire-21-Update/13/031131113030/2021-12-30/10200100BCB1A500-pan.tif",
    "file:checksum": "12202f1ea332dd0e7a559b78e16952c5b9be81e44ddf9768634db12dcb311b3f587f"
  },
  "data": {
    "href": "file:///Users/gadomski/Desktop/10200100BCB1A500-pan.tif",
    "type": "image/tiff; application=geotiff",
    "roles": [
      "data"
    ],
    "eo:bands": [
      {
        "name": "b1",
        "description": "gray"
      }
    ],
    "file:checksum": "12202f1ea332dd0e7a559b78e16952c5b9be81e44ddf9768634db12dcb311b3f587f"
  }
}
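
Because both assets carry a `file:checksum`, a consumer can confirm that the copy matches the original. The sketch below is one way to do that, assuming the values are hex-encoded multihashes (the `1220` prefix in the output above is the multihash encoding for SHA-256 with a 32-byte digest). `parse_multihash` and `checksums_match` are hypothetical helpers, not part of atd or stacrs.

```python
def parse_multihash(value: str) -> tuple[int, int, bytes]:
    """Split a hex multihash into (hash code, digest length, digest).

    Assumes single-byte code and length fields, which holds for sha2-256.
    """
    raw = bytes.fromhex(value)
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError(f"expected {length} digest bytes, got {len(digest)}")
    return code, length, digest


def checksums_match(assets: dict) -> bool:
    """Return True if the 'original' and 'data' assets have the same checksum."""
    original = parse_multihash(assets["original"]["file:checksum"])
    data = parse_multihash(assets["data"]["file:checksum"])
    return original == data


# Checksums copied from the example output above
assets = {
    "original": {
        "file:checksum": "12202f1ea332dd0e7a559b78e16952c5b9be81e44ddf9768634db12dcb311b3f587f"
    },
    "data": {
        "file:checksum": "12202f1ea332dd0e7a559b78e16952c5b9be81e44ddf9768634db12dcb311b3f587f"
    },
}
code, length, _ = parse_multihash(assets["data"]["file:checksum"])
print(hex(code), length, checksums_match(assets))
```

Note that this only compares the recorded checksums; to verify the bytes themselves, you would re-hash the file at each `href` and compare against the recorded value.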

You can use stacrs serve to browse them:

$ stacrs serve ~/Desktop/items.parquet
Serving a STAC API at http://127.0.0.1:7822

Then go to https://radiantearth.github.io/stac-browser/#/external/http:/127.0.0.1:7822 to browse.

Limitations

  • There are no guards on the number of simultaneous downloads, so a large source could easily overwhelm your machine or network connection
  • No configuration (yet)
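
One common way to guard against too many simultaneous downloads is to gate each transfer behind an `asyncio.Semaphore`. The sketch below shows the pattern only; atd does not currently do this, and `copy_all`, `fake_copy`, and the `max_concurrent` parameter are hypothetical. The `asyncio.sleep` call stands in for a real transfer.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def copy_all(
    names: list[str],
    copy_one: Callable[[str], Awaitable[None]],
    max_concurrent: int = 8,
) -> None:
    """Run copy_one for every name, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(name: str) -> None:
        # Wait for a free slot before starting the transfer
        async with semaphore:
            await copy_one(name)

    await asyncio.gather(*(guarded(name) for name in names))


async def fake_copy(name: str) -> None:
    # Stand-in for a real transfer between blob stores
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    names = [f"asset-{i}.tif" for i in range(20)]
    asyncio.run(copy_all(names, fake_copy, max_concurrent=4))
```

A cap like `max_concurrent` would also be a natural first configuration option.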