🏠[Project page] 📄[Arxiv] 🔥[New Dataset]
This repository contains the code for the paper GRES: Generalized Referring Expression Segmentation.
The code is tested under CUDA 11.8, PyTorch 1.11.0, and Detectron2 0.6.
- Install detectron2 following the manual
- Run sh make.sh under gres_model/modeling/pixel_decoder/ops
- Install other required packages:
pip install -r requirements.txt
- Prepare the dataset following datasets/DATASET.md
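Once the steps above are done, it can help to compare the installed package versions against the tested ones. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the packages are registered under the distribution names torch and detectron2.

```python
from importlib import metadata

# Versions this README reports as tested.
TESTED = {"torch": "1.11.0", "detectron2": "0.6"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(found, tested):
    """Compare versions, ignoring a local build suffix such as '+cu118'."""
    if found is None:
        return False
    return found.split("+")[0] == tested


if __name__ == "__main__":
    for package, tested in TESTED.items():
        found = installed_version(package)
        status = "ok" if matches(found, tested) else "differs from tested version"
        print(f"{package}: installed={found}, tested={tested} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the tested one.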
To evaluate a trained model, run:
python train_net.py \
--config-file configs/referring_swin_base.yaml \
--num-gpus 8 --dist-url auto --eval-only \
MODEL.WEIGHTS [path_to_weights] \
OUTPUT_DIR [output_dir]
First, download the backbone weights (swin_base_patch4_window12_384_22k.pth) and convert them into detectron2 format (swin_base_patch4_window12_384_22k.pkl) using the script:
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl
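For readers curious about what the conversion step does, here is a rough sketch. It assumes the script follows the convention used in MaskFormer-style repositories: the official Swin checkpoint stores its weights under a "model" key, and detectron2 recognizes a pickled dict carrying "model", "__author__", and "matching_heuristics" entries. The function names are hypothetical, and the script shipped in tools/ remains the reference.

```python
import pickle


def wrap_for_detectron2(state_dict):
    """Wrap a plain state dict in the dict layout detectron2 expects for .pkl weights."""
    return {
        "model": state_dict,
        "__author__": "third_party",
        # Ask detectron2 to match parameter names heuristically when loading.
        "matching_heuristics": True,
    }


def convert(src_path, dst_path):
    """Read a .pth checkpoint and write the wrapped weights as a .pkl file."""
    import torch  # imported here so the wrapper above can be used without PyTorch

    checkpoint = torch.load(src_path, map_location="cpu")
    state_dict = checkpoint["model"]  # assumption: weights live under the "model" key
    with open(dst_path, "wb") as f:
        pickle.dump(wrap_for_detectron2(state_dict), f)
```

Under these assumptions, calling convert("swin_base_patch4_window12_384_22k.pth", "swin_base_patch4_window12_384_22k.pkl") would mirror the command above.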
Then start training:
python train_net.py \
--config-file configs/referring_swin_base.yaml \
--num-gpus 8 --dist-url auto \
MODEL.WEIGHTS [path_to_weights] \
OUTPUT_DIR [output_dir]
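The evaluation and training commands differ only in the --eval-only flag, so when launching runs from a script it may be convenient to assemble them in one place. The sketch below is an illustration only: build_command is a hypothetical helper, not something the repository provides, and the paths in the usage lines are placeholders.

```python
import shlex


def build_command(weights, output_dir, eval_only=False, num_gpus=8,
                  config_file="configs/referring_swin_base.yaml", overrides=()):
    """Assemble the train_net.py argument list used in this README."""
    cmd = [
        "python", "train_net.py",
        "--config-file", config_file,
        "--num-gpus", str(num_gpus),
        "--dist-url", "auto",
    ]
    if eval_only:
        cmd.append("--eval-only")
    # Config overrides are passed as alternating KEY VALUE pairs after the flags.
    cmd += ["MODEL.WEIGHTS", weights, "OUTPUT_DIR", output_dir]
    cmd += [str(item) for item in overrides]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations.
    print(shlex.join(build_command("weights.pkl", "output/train")))
    print(shlex.join(build_command("model_final.pth", "output/eval", eval_only=True)))
```

The resulting list can be passed to subprocess.run directly, which avoids shell quoting issues with paths that contain spaces.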
Append config overrides to the end of the command to customize options. For example:
SOLVER.IMS_PER_BATCH 48
SOLVER.BASE_LR 0.00001
For the full list of base configs, see configs/referring_R50.yaml and configs/Base-COCO-InstanceSegmentation.yaml
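To make the override mechanism concrete, the sketch below imitates how alternating KEY VALUE pairs update a nested config. In detectron2 this merging is done by the config object's merge_from_list method; the version here is a simplified stand-in that works on plain dicts, and the default values in the example are made up for illustration rather than taken from the repository's configs.

```python
import ast


def parse_value(text):
    """Interpret a command-line string as a Python literal, falling back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg, opts):
    """Apply alternating KEY VALUE pairs (dotted keys) to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return cfg


if __name__ == "__main__":
    # Made-up defaults, for illustration only.
    cfg = {"SOLVER": {"IMS_PER_BATCH": 16, "BASE_LR": 0.0001}}
    apply_overrides(cfg, ["SOLVER.IMS_PER_BATCH", "48", "SOLVER.BASE_LR", "0.00001"])
    print(cfg)
```

Rejecting unknown keys, as this sketch does, catches typos in option names before a long run starts.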
This project is based on refer, maskformer, and detectron2. Many thanks to the authors for their great work!
Please consider citing GRES if it helps your research.
@inproceedings{GRES,
title={{GRES}: Generalized Referring Expression Segmentation},
author={Liu, Chang and Ding, Henghui and Jiang, Xudong},
booktitle={CVPR},
year={2023}
}