This is an official PyTorch Implementation of Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection.
This project runs on Linux (Ubuntu 22.04) with one GPU (~4 GB VRAM) and a large amount of system memory (~80 GB).
First, install the following packages:
- python 3.9.2
- PyTorch 1.10.0
- torchvision 0.11.1
- torchmetrics 0.9.3
- pandas
- munch
- h5py
- vit_pytorch
- omegaconf
Commands for preparing datasets can be found in `preprocess.sh`.
To speed up training, we save visual features in `.pkl` files. For example, the structure of `ImageNet_shot.pkl` is as follows:
```
{
  "tt0000000": {
    "0000": array(),
    "0001": array(),
    ...
  },
  "tt0000001": {
    ...
  },
  ...
}
```
Here, `tt0000000` is a video's ID. For each video, the key `0000` is a shot's ID. Each shot is encoded as a 2048-dim feature vector.
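Reading such a file is a plain pickle load; the snippet below round-trips a toy dictionary with the same layout (the dummy vectors are illustrative, while the real file stores pre-extracted ImageNet shot features):

```python
import io
import pickle

import numpy as np

# A tiny stand-in for ImageNet_shot.pkl with the structure shown above:
# video ID -> {shot ID -> 2048-dim feature vector}.
features = {
    "tt0000000": {
        "0000": np.zeros(2048, dtype=np.float32),
        "0001": np.ones(2048, dtype=np.float32),
    },
}

# Round-trip through pickle, exactly as the real .pkl file would be read.
buffer = io.BytesIO()
pickle.dump(features, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

vec = loaded["tt0000000"]["0001"]
print(vec.shape)  # (2048,)
```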
For convenience, labels are reformatted and saved into `.pkl` files as well:
- `shot_annotation.pkl` stores the indices of the first and last frame of each shot.
- `scene_annotation.pkl` stores the indices of the first and last shot of each scene.
- `label_dict.pkl` stores a list for each video, where each element indicates whether the corresponding shot is the first shot of a scene or not.
Code for generating the above files can be found in `preprocess.ipynb`.
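The relationship between `scene_annotation.pkl` and `label_dict.pkl` can be sketched as follows; the toy scene boundaries and key layout are assumptions, only the described per-shot "first shot of a scene" indicator is taken from this README:

```python
# scene_annotation: per video, a (first_shot, last_shot) index pair per scene.
# The video ID and boundaries below are illustrative.
scene_annotation = {"tt0000000": [(0, 3), (4, 7), (8, 9)]}

def scenes_to_labels(scenes):
    """Return a per-shot list: 1 if the shot starts a scene, else 0."""
    num_shots = scenes[-1][1] + 1
    labels = [0] * num_shots
    for first_shot, _last_shot in scenes:
        labels[first_shot] = 1
    return labels

label_dict = {vid: scenes_to_labels(s) for vid, s in scene_annotation.items()}
print(label_dict["tt0000000"])  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
```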
Commands for training and testing can be found in `runner.sh`. Some ablations can be found in `runner2.sh` and `runner3.sh`. Here we show some basic commands.
Pre-training:
```
python -m src.pretrain config/selfsup_best.yaml
```
Fine-tuning:
```
python -m src.finetune config/selfsup_best.yaml
```
Test:
```
python -m src.evaluate config/selfsup_best.yaml
```
- Code for plotting data points can be found in `show_log.ipynb`.
- Code for drawing heatmaps can be found in `visualize.ipynb`.
Configuration files `config/xxx.yaml` contain all the hyperparameters.
- `base` contains the basic configuration. Among them, `base.params.clip_len` indicates the number of shots in each clip, and `base.path` includes the file paths of the formatted datasets and labels.
- `model` is the basename of the model code file.
- `pretrain`, `finetune`, and `evaluate` correspond to the two training stages and the testing stage. `pretrain.params.label_percentage` specifies the percentage of data to use during pre-training. `finetune.aim_index` specifies the index of the film in OVSD/BBC held out for evaluation in the leave-one-out protocol. `finetune.load_path` specifies the path of the pre-trained model. `finetune.train` is the basename of the Dataset code file. `finetune.vid_list` indicates which subset of MovieNet to use. `evaluate.head` specifies the prediction head to use.
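As a rough sketch, the hierarchy described above might look like the following; only the key names mentioned in this README are taken from the text, and every concrete value below is an assumption for illustration:

```python
# Hypothetical sketch of the hierarchy in config/selfsup_best.yaml.
config = {
    "base": {
        "params": {"clip_len": 16},                      # shots per clip (value assumed)
        "path": {"features": "data/ImageNet_shot.pkl"},  # dataset/label paths (assumed)
    },
    "model": "tsm",                                      # model code basename (assumed)
    "pretrain": {"params": {"label_percentage": 100}},   # % of data for pre-training (assumed)
    "finetune": {
        "aim_index": 0,                      # held-out film index for leave-one-out (assumed)
        "load_path": "ckpt/pretrained.pt",   # pre-trained model path (assumed)
        "train": "montage_dataset",          # Dataset code basename (assumed)
        "vid_list": "movienet_subset",       # MovieNet subset (assumed)
    },
    "evaluate": {"head": "mlp"},             # prediction head (assumed)
}

print(config["base"]["params"]["clip_len"])  # 16
```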
```
@article{tan2024temporal,
  title={Temporal Scene Montage for Self-Supervised Video Scene Boundary Detection},
  author={Tan, Jiawei and Yang, Pingan and Chen, Lu and Wang, Hongxing},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  year={2024},
  publisher={ACM New York, NY}
}
```