This is an official PyTorch Implementation of Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection.
This project runs on Linux (Ubuntu 22.04) with one GPU (~4 GB VRAM) and a large amount of system memory (~80 GB).
First, install the following packages:
- python 3.9.2
- PyTorch 1.10.0
- torchvision 0.11.1
- torchmetrics 0.9.3
- pandas
- munch
- h5py
- vit_pytorch
- omegaconf
Commands for preparing datasets can be found in `preprocess.sh`.
To speed up training, we save visual features in `.pkl` files. For example, the structure of `ImageNet_shot.pkl` is as follows:
```
{
  "tt0000000": {
    "0000": array(),
    "0001": array(),
    ...
  },
  "tt0000001": {
    ...
  },
  ...
}
```
Here, `tt0000000` is a video's ID. For each video, the key `0000` is a shot's ID. Each shot is encoded as a 2048-dim feature vector.
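Reading such a file is a plain pickle load; the snippet below round-trips a toy dictionary with the same layout (the dummy vectors are illustrative, while the real file stores pre-extracted ImageNet shot features):

```python
import io
import pickle

import numpy as np

# A tiny stand-in for ImageNet_shot.pkl with the structure shown above:
# video ID -> {shot ID -> 2048-dim feature vector}.
features = {
    "tt0000000": {
        "0000": np.zeros(2048, dtype=np.float32),
        "0001": np.ones(2048, dtype=np.float32),
    },
}

# Round-trip through pickle, exactly as the real .pkl file would be read.
buffer = io.BytesIO()
pickle.dump(features, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

vec = loaded["tt0000000"]["0001"]
print(vec.shape)  # (2048,)
```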
For convenience, labels are reformatted and saved into `.pkl` files as well:
- `shot_annotation.pkl` stores the indices of the first and last frame of each shot.
- `scene_annotation.pkl` stores the indices of the first and last shot of each scene.
- `label_dict.pkl` stores a list for each video, where each element indicates whether the corresponding shot is the first shot of a scene or not.
Code for generating the above files can be found in `preprocess.ipynb`.
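The relationship between `scene_annotation.pkl` and `label_dict.pkl` can be sketched as follows; the toy scene boundaries and key layout are assumptions, only the described per-shot "first shot of a scene" indicator is taken from this README:

```python
# scene_annotation: per video, a (first_shot, last_shot) index pair per scene.
# The video ID and boundaries below are illustrative.
scene_annotation = {"tt0000000": [(0, 3), (4, 7), (8, 9)]}

def scenes_to_labels(scenes):
    """Return a per-shot list: 1 if the shot starts a scene, else 0."""
    num_shots = scenes[-1][1] + 1
    labels = [0] * num_shots
    for first_shot, _last_shot in scenes:
        labels[first_shot] = 1
    return labels

label_dict = {vid: scenes_to_labels(s) for vid, s in scene_annotation.items()}
print(label_dict["tt0000000"])  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
```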
Commands for training and testing can be found in `runner.sh`. Some ablations can be found in `runner2.sh` and `runner3.sh`. Here we show some basic commands.
Pre-training:
```
python -m src.pretrain config/selfsup_best.yaml
```
Fine-tuning:
```
python -m src.finetune config/selfsup_best.yaml
```
Test:
```
python -m src.evaluate config/selfsup_best.yaml
```
- Code for plotting data points can be found in `show_log.ipynb`.
- Code for drawing heatmaps can be found in `visualize.ipynb`.
Configuration files `config/xxx.yaml` contain all the hyperparameters.
- `base` contains the basic configuration. Among them, `base.params.clip_len` indicates the number of shots in each clip, and `base.path` includes the file paths of the formatted datasets and labels.
- `model` is the basename of the model code file.
- `pretrain`, `finetune`, and `evaluate` correspond to the two training stages and the testing stage. `pretrain.params.label_percentage` specifies the percentage of data to use during pre-training. `finetune.aim_index` specifies the index of the film in OVSD/BBC held out for evaluation in the leave-one-out protocol. `finetune.load_path` specifies the path of the pre-trained model. `finetune.train` is the basename of the Dataset code file. `finetune.vid_list` indicates which subset of MovieNet to use. `evaluate.head` specifies the prediction head to use.
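As a rough sketch, the hierarchy described above might look like the following; only the key names mentioned in this README are taken from the text, and every concrete value below is an assumption for illustration:

```python
# Hypothetical sketch of the hierarchy in config/selfsup_best.yaml.
config = {
    "base": {
        "params": {"clip_len": 16},                      # shots per clip (value assumed)
        "path": {"features": "data/ImageNet_shot.pkl"},  # dataset/label paths (assumed)
    },
    "model": "tsm",                                      # model code basename (assumed)
    "pretrain": {"params": {"label_percentage": 100}},   # % of data for pre-training (assumed)
    "finetune": {
        "aim_index": 0,                      # held-out film index for leave-one-out (assumed)
        "load_path": "ckpt/pretrained.pt",   # pre-trained model path (assumed)
        "train": "montage_dataset",          # Dataset code basename (assumed)
        "vid_list": "movienet_subset",       # MovieNet subset (assumed)
    },
    "evaluate": {"head": "mlp"},             # prediction head (assumed)
}

print(config["base"]["params"]["clip_len"])  # 16
```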
```
@article{tan2024temporal,
  title={Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection},
  author={Tan, Jiawei and Yang, Pingan and Chen, Lu and Wang, Hongxing},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  year={2024},
  publisher={ACM New York, NY}
}
```