InstaGeo is geospatial deep learning Python package designed to facilitate geospatial machine learning using satellite imagery data from Harmonized Landsat and Sentinel-2 (HLS) Data Product and Prithvi geospatial foundational model. It consists of three core components: Data, Model, and Apps, each tailored to support various aspects of geospatial data retrieval, manipulation, preprocessing, model training, and inference serving.
- Data: Focuses on retrieving, manipulating, and processing Harmonized Landsat Sentinel-2 (HLS) data for classification and segmentation tasks such as disaster mapping, crop classification, and breeding ground prediction.
- Model: Centers around data loading, training, and evaluating models, particularly leveraging the Prithvi model for various modeling tasks. It includes a sliding-window feature that allows inference to be run on large inputs.
- Apps: Aims to operationalize models developed in the Model component for practical applications.
To get started with InstaGeo, ensure you have Python installed on your system. Then, execute the following command in your terminal or command prompt to install InstaGeo:
pip install git+https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML.git
This command will download and install the latest version of InstaGeo along with its required dependencies.
After installation, you may want to verify that InstaGeo has been correctly installed and is functioning as expected. To do this, run the included test suite with the following commands:
export PYTHONPATH=$PYTHONPATH:$(pwd)
pytest --verbose .
-
HLS Data Retrieval: InstaGeo efficiently searches for and download Sentinel-2 and Landsat 8/9 multi-spectral earth observation images from the HLS data product.
-
Create Chips and Segmentation Maps: InstaGeo breaks down large satellite image tiles into smaller, manageable patches (referred to as "chips") suitable for deep learning model training. It also generate segmentation maps, which serve as targets for training, by categorizing each pixel in the chips.
-
Training Custom Models: Utilize the Prithvi geospatial foundational model as a backbone to develop custom models tailored for precise geospatial applications. These applications include, but are not limited to, flood mapping for emergency response planning, crop classification for agricultural management, and locust breeding ground prediction to address food security.
-
Inference on Large-scale Geospatial Data: Perform inference using the models that have been trained on 'chips' (typically measuring 224 x 224 pixels) on expansive HLS tiles, which measure 3660 x 3660 pixels.
- Operationalize Models: Once data has been created and model trained, deploy model for use using the Apps components. HLS tile predictions can be overlaid and visualized on interactive maps.
See InstGeo_Demo notebook for an end-to-end demo.
mkdir locust_breeding
gsutil -m cp -r gs://instageo/data/locust_breeding/records locust_breeding
Note: Ensure that you have up to 1.5TB free disk space
Create output directory for each split
mkdir locust_breeding/train locust_breeding/val locust_breeding/test
- Train Split
python -m "instageo.data.chip_creator" \
--dataframe_path="locust_breeding/records/train.csv" \
--output_directory="locust_breeding/train" \
--min_count=1 \
--chip_size=224 \
--no_data_value=-1 \
--temporal_tolerance=3 \
--temporal_step=30 \
--mask_cloud=False \
--num_steps=3
- Validation Split
python -m "instageo.data.chip_creator" \
--dataframe_path="locust_breeding/records/val.csv" \
--output_directory="locust_breeding/val" \
--min_count=1 \
--chip_size=224 \
--no_data_value=-1 \
--temporal_tolerance=3 \
--temporal_step=30 \
--mask_cloud=False \
--num_steps=3
- Test Split
python -m "instageo.data.chip_creator" \
--dataframe_path="locust_breeding/records/test.csv" \
--output_directory="locust_breeding/test" \
--min_count=1 \
--chip_size=224 \
--no_data_value=-1 \
--temporal_tolerance=3 \
--temporal_step=30 \
--mask_cloud=False \
--num_steps=3
Before launching training, modify the path to chips and segmentation maps in each split
for split in ["train", "val", "test"]:
root_dir = "locust_breeding"
chips = [
chip.replace("chip", f"{split}/chips/chip")
for chip in os.listdir(os.path.join(root_dir, f"{split}/chips"))
]
seg_maps = [
chip.replace("chip", f"{split}/seg_maps/seg_map") for chip in chips_orig
]
df = pd.DataFrame({"Input": chips, "Label": seg_maps})
df.to_csv(os.path.join(root_dir, f"{split}.csv"))
python -m instageo.model.run --config-name=locust \
root_dir='locust_breeding' \
train_filepath="locust_breeding/train.csv" \
valid_filepath="locust_breeding/val.csv"
python -m instageo.model.run --config-name=locust \
root_dir='locust_breeding' \
test_filepath="locust_breeding/test.csv" \
train.batch_size=16 \
checkpoint_path='instageo-data/outputs/2024-03-01/09-16-30/instageo_epoch-10-val_iou-0.70.ckpt' \
mode=eval
- Download HLS tiles
python -m "instageo.data.chip_creator" \
--dataframe_path="gs://instageo/utils/africa_prediction_template.csv" \
--output_directory="inference/20223-01" \
--min_count=1 \
--no_data_value=-1 \
--temporal_tolerance=3 \
--temporal_step=30 \
--num_steps=3 \
--download_only
- Inference
python -m instageo.model.run --config-name=locust \
root_dir='inference/20223-01' \
test_filepath='hls_dataset.json' \
train.batch_size=16 \
checkpoint_path='instageo-data/outputs/2024-03-01/09-16-30/instageo_epoch-10-val_iou-0.70.ckpt' \
mode=predict
- Run InstaGeo Serve
cd instageo/apps
streamlit run app.py
- Specify the directory containing the predictions.
We welcome contributions to InstaGeo. Please follow the contribution guidelines for submitting pull requests and reporting issues to help us improve the package.