Text2Video

Set up

Git clone repo

git clone git@github.com:sibozhang/Text2Video.git

Download and install the modified vid2vid repo.
Download the trained model

Create a 'checkpoints' folder inside the vid2vid folder and put the trained model in it.

VidTIMIT fadg0 (English, Female)

Dropbox: https://www.dropbox.com/sh/lk6et49v2uyfzjx/AADAFAp02_b3FQchaYxOZ0EMa?dl=0

Baidu Cloud link: https://pan.baidu.com/s/1SSkMKOK9LhClW2JvDCSiLg?pwd=bevj Extraction code: bevj

Xuesong (Chinese, Male)

Dropbox: https://www.dropbox.com/sh/qz3zoma5ac9mw5p/AAARiR8xKvATN4CBSyjWt_uOa?dl=0

Baidu Cloud link: https://pan.baidu.com/s/1DvuBbThYo4n5RIZsc-92rg?pwd=am7d Extraction code: am7d

Arrange the data and folders in the following layout

Text2Video
├── *phoneme_data
├── model
├── ...
vid2vid
├── ...
venv
├── vid2vid
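
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: `missing_paths` and `EXPECTED_DIRS` are hypothetical names, and the list of expected paths is an assumption read off the tree above plus the 'checkpoints' folder mentioned earlier. It also assumes the leading asterisk in `*phoneme_data` is part of the directory name.

```python
from pathlib import Path

# Assumed layout, relative to the directory that holds Text2Video,
# vid2vid and venv side by side. Adjust if your checkout differs.
EXPECTED_DIRS = [
    "Text2Video/*phoneme_data",
    "Text2Video/model",
    "vid2vid/checkpoints",
    "venv/vid2vid",
]


def missing_paths(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    # Path joining treats "*" as a literal character, not as a glob.
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from inside Text2Video, so the parent holds all three folders.
    for rel in missing_paths(".."):
        print("missing:", rel)
```

An empty result means every assumed folder was found; any printed path is a folder to create or download before testing.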

Setup env

sudo apt-get install sox libsox-fmt-mp3
pip install zhon
pip install moviepy
pip install ffmpeg
pip install dominate
pip install pydub

For Chinese, we use vosk to get the timestamp of each word. Please install vosk from https://alphacephei.com/vosk/install and unpack the model as 'model' in the current folder, or install with pip:

pip install vosk
pip install cn2an
pip install pypinyin
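
To illustrate how per-word timestamps can be obtained from vosk, here is a minimal sketch. It is not the repository's own `pinyin_timestamping.py`, whose details may differ; `word_timestamps` and `transcribe` are hypothetical helper names, and the input is assumed to be a mono 16-bit PCM WAV file with the model unpacked as 'model'.

```python
import json
import wave


def word_timestamps(result_json):
    """Extract (word, start, end) tuples from one vosk result string."""
    result = json.loads(result_json)
    # The "result" key is absent when nothing was recognised.
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


def transcribe(wav_path, model_dir="model"):
    """Run vosk over a WAV file and collect word timings (in seconds)."""
    # Imported lazily so the parsing helper works without vosk installed.
    from vosk import Model, KaldiRecognizer

    timings = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask vosk to include per-word start/end times
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                timings.extend(word_timestamps(rec.Result()))
        timings.extend(word_timestamps(rec.FinalResult()))
    return timings
```

Keeping the JSON parsing separate from the recognizer loop makes it possible to check the timestamp handling on a saved result string without loading a model.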

Testing

Activate the vid2vid virtual environment

source ../venv/vid2vid/bin/activate

Generate video with real audio in English

sh text2video_audio.sh $1 $2

Generate video with TTS audio in English

sh text2video_tts.sh $1 $2 $3

Generate video with TTS audio in Chinese

sh text2video_tts_chinese.sh $1 $2 $3

$1: "input text"
$2: person
$3: gender (f for female, m for male)
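
When the scripts are driven from Python rather than typed by hand, the three arguments can be assembled as a list so that spaces and punctuation in the input text need no shell quoting. This is a sketch under assumptions: `build_command` is a hypothetical helper that is not in the repository, and it only checks the gender flag described above.

```python
import shlex


def build_command(script, text, person, gender=None):
    """Build the argument list for one of the text2video shell scripts."""
    if gender is not None and gender not in ("f", "m"):
        raise ValueError("gender must be 'f' or 'm'")
    cmd = ["sh", script, text, person]
    if gender is not None:
        cmd.append(gender)
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "text2video_tts.sh",
        "She had your dark suit in greasy wash water all year.",
        "fadg0",
        "f",
    )
    # Print the quoted command; pass `cmd` to subprocess.run(cmd, check=True)
    # from inside the Text2Video folder to actually execute it.
    print(shlex.join(cmd))
```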

Example 1. Test VidTIMIT data with real audio.

sh text2video_audio.sh "She had your dark suit in greasy wash water all year." fadg0 f

Example 2. Test VidTIMIT data with TTS audio.

sh text2video_tts.sh "She had your dark suit in greasy wash water all year." fadg0 f

Example 3. Test with Chinese female TTS audio.

sh text2video_tts_chinese.sh "正在为您查询合肥的天气情况。今天是2020年2月24日，合肥市今天多云，最低温度9摄氏度，最高温度15摄氏度，微风。" henan f

Appendices

ARPABET

Acknowledgements

This code is based on the vid2vid framework.
