This program helps you find the moments in a video where a phrase was said. It does this by parsing .vtt subtitle files (which can be generated by Whisper AI!) and capturing screenshots at the subtitle timestamps using OpenCV.
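The idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names `iter_cues`, `find_phrase`, and `save_frame` are hypothetical, and the parser only handles plain cue timing lines, ignoring styling and cue settings. `save_frame` assumes the `opencv-python` package is installed.

```python
import re

# Matches a WebVTT cue timing line such as "00:01.000 --> 00:04.000"
# or "00:00:01.000 --> 00:00:04.000" (the hours field is optional).
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (hours may be None) to milliseconds."""
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def iter_cues(vtt_text):
    """Yield (start_ms, end_ms, text) for each cue in a WebVTT document."""
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        match = CUE_TIMING.match(lines[i].strip())
        i += 1
        if not match:
            continue  # header, cue identifier, or blank line
        fields = match.groups()
        text_lines = []
        # The cue text runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            text_lines.append(lines[i].strip())
            i += 1
        yield _to_ms(*fields[:4]), _to_ms(*fields[4:]), " ".join(text_lines)


def find_phrase(vtt_text, phrase):
    """Return (start_ms, text) for every cue containing the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [
        (start, text)
        for start, _end, text in iter_cues(vtt_text)
        if needle in text.lower()
    ]


def save_frame(video_path, timestamp_ms, out_path):
    """Seek to timestamp_ms in the video and write that frame as an image."""
    import cv2  # imported here so the parsing helpers work without OpenCV

    capture = cv2.VideoCapture(video_path)
    try:
        capture.set(cv2.CAP_PROP_POS_MSEC, timestamp_ms)
        ok, frame = capture.read()
        return bool(ok) and cv2.imwrite(out_path, frame)
    finally:
        capture.release()
```

With helpers like these, each hit from `find_phrase` could be passed to `save_frame` to produce one screenshot per matching subtitle.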
Via Podman (recommended)
podman build -t subtitle_parser:latest .
Via Virtual Environment:
- Create and activate a Python virtual environment
pip install -r requirements.txt
Via Podman (recommended)
podman run --rm -it -v <path containing your video and subtitle files>:/data:z --name=sparser localhost/subtitle_parser:latest
CLI options can be appended directly after that command, for example -h for usage help.
Via Virtual Environment:
python3 ./subtitle_parser.py
Here is an example file layout (what Whisper AI typically outputs):
.
├── IMG_0733.MOV
├── IMG_0733.MOV.srt
├── IMG_0733.MOV.txt
├── IMG_0733.MOV.vtt
└── subtitle_parser.py
We care about the plain .MOV video file and the .vtt subtitle file.
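As a rough illustration of that layout, the sketch below pairs each .vtt file with the video it was generated from. It assumes the subtitle file is named by appending `.vtt` to the full video filename, as in the listing above; the helper name `find_video_subtitle_pairs` is hypothetical and not part of this project.

```python
from pathlib import Path


def find_video_subtitle_pairs(directory):
    """Return (video_path, vtt_path) pairs found in a directory.

    Assumes "<video filename>.vtt" naming, e.g. IMG_0733.MOV.vtt
    belongs to IMG_0733.MOV.
    """
    pairs = []
    for vtt_path in sorted(Path(directory).glob("*.vtt")):
        # with_suffix("") strips only the trailing ".vtt".
        video_path = vtt_path.with_suffix("")
        if video_path.is_file():
            pairs.append((video_path, vtt_path))
    return pairs
```

Subtitle files without a matching video in the same directory are skipped.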
With your file locations and names in mind, run the program with -h to see usage information.
This project came about because Willard wanted to document Potter's solar setup. They recorded a video, more than 40 minutes long, of Potter explaining the setup. However, Potter would often say "this" and point to a component without saying what the component was. That made the subtitles hard to use as documentation because of the ambiguous terms, not to mention having to scrub through the video to find the right instance of the word "this". Therefore, in the wilderness of the Adirondacks, this admittedly not super useful program was born.