Project Page | Paper | Data (Coming Soon)
⚔️ We are dedicated to enhancing and expanding the SynHOI dataset. We will release it soon, together with more powerful models for HICO-DET and V-COCO through SynHOI-Pretraining.
Install the dependencies.
```shell
pip install -r requirements.txt
```
Clone and build CLIP.
```shell
git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..
```
Compile the CUDA operators for deformable attention.
```shell
cd models/DiffHOI_L/ops
python setup.py build install
cd ../../..
```
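A quick way to confirm that the CLIP install and the operator build succeeded is to import both from Python. This is a sketch, not part of the repository: it assumes CLIP imports as `clip` and that the compiled extension is named `MultiScaleDeformableAttention`, its name in the Deformable DETR code base. `check_import` and `check_all_imports` are hypothetical helpers, and `PYTHON` selects the interpreter (default `python`).

```shell
#!/usr/bin/env bash
# Sanity check for the build steps (a sketch, not part of the repository).
# Assumptions: CLIP imports as `clip`, and the compiled deformable-attention
# extension is named `MultiScaleDeformableAttention` (its Deformable DETR name).

# Try `import <target>` with the chosen interpreter; return 1 if it fails.
check_import() {
  local target="$1"
  local python="${PYTHON:-python}"
  if "$python" -c "import ${target}" 2>/dev/null; then
    echo "ok: import ${target}"
  else
    echo "failed: import ${target}" >&2
    return 1
  fi
}

# Check both build steps; keep going after a failure so both are reported.
check_all_imports() {
  local status=0
  check_import clip || status=1
  # Import torch first so the extension can locate the libtorch shared libraries.
  check_import "torch, MultiScaleDeformableAttention" || status=1
  return "$status"
}
```

Running `check_all_imports` from the repository root should print two `ok:` lines; if only the second import fails, the operators may need to be rebuilt against the installed PyTorch.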
Download the Stable Diffusion checkpoint (v1-5 by default) and follow its instructions to install the required packages.
The HICO-DET dataset can be downloaded here. After the download finishes, unpack the tarball (`hico_20160224_det.tar.gz`) into the `data` directory.
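The unpacking step can be scripted as follows. It is a sketch that assumes the tarball sits in the current directory and unpacks to a top-level `hico_20160224_det/` folder; `unpack_hico` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the HICO-DET unpacking step (hypothetical helper, not from the repo).
# Assumes the tarball unpacks to a top-level `hico_20160224_det/` folder.

unpack_hico() {
  local tarball="${1:-hico_20160224_det.tar.gz}"
  local dest="${2:-data}"
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -x extract, -z gunzip, -f archive file, -C extract into the destination
  tar -xzf "$tarball" -C "$dest" || return 1
  if [ ! -d "$dest/hico_20160224_det" ]; then
    echo "expected $dest/hico_20160224_det after unpacking" >&2
    return 1
  fi
  echo "unpacked to $dest/hico_20160224_det"
}
```

For example: `unpack_hico hico_20160224_det.tar.gz data`.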
Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.
```
data
└─ hico_20160224_det
   |─ annotations
   |  |─ trainval_hico.json
   |  |─ test_hico.json
   |  └─ corre_hico.npy
   :
```
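Before training, it can help to check that the PPDM annotation files landed in this layout. The hypothetical `check_hico_annotations` helper below only tests that the three files exist; it does not validate their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the HICO-DET annotation files are in place.

check_hico_annotations() {
  local root="${1:-data/hico_20160224_det}"
  local missing=0
  local f
  for f in trainval_hico.json test_hico.json corre_hico.npy; do
    if [ ! -f "$root/annotations/$f" ]; then
      echo "missing: $root/annotations/$f" >&2
      missing=1
    fi
  done
  # 0 when all three files exist, 1 otherwise
  return "$missing"
}
```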
First, clone the V-COCO repository from here and follow its instructions to generate `instances_vcoco_all_2014.json`. Next, download the prior file `prior.pickle` from here. Place the files and create the directories as follows.
```
DiffHOI
 |─ data
 |   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :
```
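A similar check for V-COCO is sketched below. The hypothetical `prepare_vcoco_layout` helper verifies the inputs from this tree, assuming the image folders are `images/train2014` and `images/val2014` as shown, and creates the `annotations` directory that the conversion writes to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the V-COCO inputs and create the output directory.

prepare_vcoco_layout() {
  local root="${1:-data/v-coco}"
  local missing=0
  local f d
  for f in data/instances_vcoco_all_2014.json prior.pickle; do
    [ -f "$root/$f" ] || { echo "missing file: $root/$f" >&2; missing=1; }
  done
  for d in images/train2014 images/val2014; do
    [ -d "$root/$d" ] || { echo "missing directory: $root/$d" >&2; missing=1; }
  done
  # The conversion step writes its output here, so create it if needed.
  mkdir -p "$root/annotations"
  return "$missing"
}
```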
The annotation files have to be converted to the HOIA format, which can be done as follows.
```shell
PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations
```
Note that this conversion only works with Python 2, because `vsrl_utils.py` in the v-coco repository raises an error under Python 3.
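Because the conversion needs Python 2, one option is to call a Python 2 interpreter explicitly and stop early otherwise. The sketch assumes the interpreter is available as `python2` (override with `PYTHON2`); `run_vcoco_conversion` is a hypothetical wrapper around the conversion command and must be run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the V-COCO conversion only under Python 2.

run_vcoco_conversion() {
  local python2="${PYTHON2:-python2}"
  local major
  # Ask the interpreter for its major version; empty if it cannot be run.
  major="$("$python2" -c 'import sys; print(sys.version_info[0])' 2>/dev/null)"
  if [ "$major" != "2" ]; then
    echo "need a Python 2 interpreter, got '${python2}' (major version: ${major:-unknown})" >&2
    return 1
  fi
  # Same arguments as the conversion command in this README.
  PYTHONPATH=data/v-coco "$python2" convert_vcoco_annotations.py \
    --load_path data/v-coco/data \
    --prior_path data/v-coco/prior.pickle \
    --save_path data/v-coco/annotations
}
```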
The V-COCO annotations in the HOIA format (`corre_vcoco.npy`, `test_vcoco.json`, and `trainval_vcoco.json`) will be generated in the `annotations` directory.
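After the conversion, the outputs can be checked as sketched below. `check_vcoco_annotations` is hypothetical; it confirms that the three files exist and uses Python's standard `json.tool` module purely as a validator for the two JSON files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the HOIA-format V-COCO files after conversion.

check_vcoco_annotations() {
  local dir="${1:-data/v-coco/annotations}"
  local python="${PYTHON:-python}"
  local status=0
  local f
  for f in corre_vcoco.npy test_vcoco.json trainval_vcoco.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  # json.tool exits non-zero when a file does not parse as JSON.
  for f in test_vcoco.json trainval_vcoco.json; do
    if [ -f "$dir/$f" ] && ! "$python" -m json.tool "$dir/$f" >/dev/null 2>&1; then
      echo "not valid JSON: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```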
Download the pretrained DETR model for ResNet-50, put it in the `params` directory, and convert its parameters for HICO-DET and V-COCO.
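If you prefer the command line for the download, the sketch below fetches the checkpoint only when it is missing. It assumes the file comes from the official DETR model zoo at `https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth` and that `curl` is installed; `fetch_detr_r50` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download the DETR R50 checkpoint into params/ if absent.
# Assumption: the official DETR model-zoo URL below hosts this file.

fetch_detr_r50() {
  local dest="${1:-params}"
  local name="detr-r50-e632da11.pth"
  local url="https://dl.fbaipublicfiles.com/detr/${name}"
  mkdir -p "$dest"
  # -s: exists and is non-empty
  if [ -s "$dest/$name" ]; then
    echo "already present: $dest/$name"
    return 0
  fi
  # Download to a temporary name so an interrupted transfer never looks complete.
  curl -fL -o "$dest/$name.part" "$url" && mv "$dest/$name.part" "$dest/$name"
}
```

Once the checkpoint is in `params`, run the two conversion commands.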
```shell
python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-hico.pth \
        --num_queries 64

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-vcoco.pth \
        --dataset vcoco \
        --num_queries 64
```
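When the preparation is repeated, the two conversions can be wrapped so that existing outputs are skipped. `convert_detr_params` is a hypothetical helper that reuses exactly the flags of the two commands; set `PYTHON` to choose the interpreter, and run it from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run both DETR parameter conversions, skipping outputs
# that already exist and are non-empty.

convert_detr_params() {
  local src="${1:-params/detr-r50-e632da11.pth}"
  local dir="${2:-params}"
  local python="${PYTHON:-python}"
  local dataset out
  local -a extra
  for dataset in hico vcoco; do
    out="$dir/detr-r50-pre-2branch-${dataset}.pth"
    if [ -s "$out" ]; then
      echo "skip (exists): $out"
      continue
    fi
    extra=()
    # Only the V-COCO command passes --dataset; the HICO-DET command omits it.
    if [ "$dataset" = vcoco ]; then
      extra=(--dataset vcoco)
    fi
    "$python" ./tools/convert_parameters.py \
      --load_path "$src" \
      --save_path "$out" \
      "${extra[@]}" \
      --num_queries 64 || return 1
  done
}
```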
Download the pretrained Deformable DETR model for Swin-L and put it in the `params` directory.
Results on HICO-DET (mAP) under the Default (D) and Known Object (KO) settings:

Model | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO) | Download | Config
---|---|---|---|---|---|---|---|---
DiffHOI-S (R50) | 34.41 | 31.07 | 35.40 | 37.31 | 34.56 | 38.14 | model | config |
DiffHOI-L (Swin-L) | 40.63 | 38.10 | 41.38 | 43.14 | 40.24 | 44.01 | model | config |
After the preparation, start training with the following commands.

```shell
sh ./run/hico_s.sh
sh ./run/vcoco_s.sh
sh ./run/hico_s_zs_nf_uc.sh
```

Evaluate DiffHOI-S and DiffHOI-L on HICO-DET with:

```shell
sh ./run/hico_s_eval.sh
sh ./run/hico_l_eval.sh
```
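To keep a record of each run, a small wrapper can copy the console output of any of these scripts into a log file. `run_logged` and the `logs/` directory are hypothetical conveniences, not part of the repository; the wrapper returns the script's own exit code rather than that of `tee`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a training/evaluation script and tee its output to a log.

run_logged() {
  local script="$1"
  local log_dir="${2:-logs}"
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  mkdir -p "$log_dir"
  # Log name: script name without .sh plus a timestamp, e.g. hico_s_20240101_120000.log
  local log="$log_dir/$(basename "$script" .sh)_$(date +%Y%m%d_%H%M%S).log"
  # Merge stderr into stdout so the log captures both streams.
  sh "$script" 2>&1 | tee "$log"
  # Report the script's exit code (first stage of the pipeline), not tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, `run_logged ./run/hico_s.sh` trains as before and keeps a copy of the output under `logs/`.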
Please consider citing our paper if it helps your research.
```bibtex
@article{yang2023boosting,
  title={Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model},
  author={Yang, Jie and Li, Bingliang and Yang, Fengyu and Zeng, Ailing and Zhang, Lei and Zhang, Ruimao},
  journal={arXiv preprint arXiv:2305.12252},
  year={2023}
}
```
This repo is mainly based on GEN-VLKT (MIT License, Copyright (c) 2022 Yue Liao) and DINO (Apache 2.0 License, Copyright (c) 2022 IDEA-Research). We thank the authors for their well-organized code!