Merge branch 'main' of https://github.com/Meta-Portrait/MetaPortrait into main
Meta-Portrait · Apr 21, 2023 · c9bcf11
2 parents 4c259cd + 87b13b3

Showing 11 changed files with 1,206 additions and 24 deletions.
diff --git a/README.md b/README.md
@@ -14,9 +14,8 @@ By [Bowen Zhang](http://home.ustc.edu.cn/~zhangbowen)\*, [Chenyang Qi](https://c
 
 ## Todo
 
-- [x] Release the inference code of base model and temporal super-resolution mode
-  - [ ] Code refactoring and upte README for super-resolution model
-- [ ] Release the training code of base model
+- [x] Release the inference code of base model and temporal super-resolution model
+- [x] Release the training code of base model
 - [ ] Release the training code of super-resolution model
 
 ## Setup Environment
@@ -28,7 +27,9 @@ conda env create -f environment.yml
 conda activate meta_portrait_base
 ```
 
-## Inference Base Model
+## Base Model
+
+### Inference Base Model
 
 Download the [checkpoint of base model](https://drive.google.com/file/d/1Kmdv3w6N_we7W7lIt6LBzqRHwwy1dBxD/view?usp=share_link) and put it in `base_model/checkpoint`. We provide [preprocessed example data for inference](https://drive.google.com/file/d/166eNbabM6TeJVy7hxol2gL1kUGKHi3Do/view?usp=share_link); download the data, unzip it, and put it in `data`. The directory structure should look like this:
 
@@ -57,26 +58,35 @@ cd base_model
 python inference.py --save_dir /path/to/output --config config/meta_portrait_256_eval.yaml --ckpt checkpoint/ckpt_base.pth.tar
 ```
 
-## Citing MetaPortrait
+### Train Base Model from Scratch
 
+Train the warping network first using the following command:
+```bash
+cd base_model
+python main.py --config config/meta_portrait_256_pretrain_warp.yaml --fp16 --stage Warp --task Pretrain
 ```
-@misc{zhang2022metaportrait,
-      title={MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation}, 
-      author={Bowen Zhang and Chenyang Qi and Pan Zhang and Bo Zhang and HsiangTao Wu and Dong Chen and Qifeng Chen and Yong Wang and Fang Wen},
-      year={2022},
-      eprint={2212.08062},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV}
-}
+
+Then, modify the path of `warp_ckpt` in `config/meta_portrait_256_pretrain_full.yaml` and jointly train the warping network and refinement network using the following command:
+```bash
+python main.py --config config/meta_portrait_256_pretrain_full.yaml --fp16 --stage Full --task Pretrain
+```
+
+### Meta Training for Faster Personalization of Base Model
+
+Starting from the standard pretrained checkpoint, you can further optimize the model's personalized adaptation speed with meta-learning using the following command:
+```bash
+python main.py --config config/meta_portrait_256_pretrain_meta_train.yaml --fp16 --stage Full --task Meta --remove_sn --ckpt /path/to/standard_pretrain_ckpt
 ```
-## Inference Temporal Super-resolution Model
 
-###  Base Environment
+## Temporal Super-resolution Model
+
+### Base Environment
 
 - Python >= 3.7 (Recommend to use [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
 - [PyTorch >= 1.7](https://pytorch.org/)
 - System: Linux + NVIDIA GPU + [CUDA](https://developer.nvidia.com/cuda-downloads)
 - Set the root path to [sr_model](sr_model)
+
 ### Data and checkpoint
 
 Download the [dataset](
@@ -102,7 +112,6 @@ options
 
 ### Installation Bash command
 
-
 ```bash
 pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
 # Install a modified basicsr - https://github.com/xinntao/BasicSR
@@ -120,14 +129,28 @@ python setup.py develop
 ```
 
 ### Quick Inference
+
 ckpt for inference: pretrained_ckpt/temporal_gfpgan.pth
 
 Example code to conduct face temporal super-resolution:
 
 ```bash
-CUDA_VISIBLE_DEVICES=7 python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/test.py -opt options/test/same_id.yml --launcher pytorch
+python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/test.py -opt options/test/same_id.yml --launcher pytorch
+```
+You may adjust ```nproc_per_node``` to the number of GPUs on your machine.
+
+## Citing MetaPortrait
+
+```
+@misc{zhang2022metaportrait,
+      title={MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation}, 
+      author={Bowen Zhang and Chenyang Qi and Pan Zhang and Bo Zhang and HsiangTao Wu and Dong Chen and Qifeng Chen and Yong Wang and Fang Wen},
+      year={2022},
+      eprint={2212.08062},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
 ```
-You may adjust the ```CUDA_VISIBLE_DEVICES``` and ```nproc_per_node``` to the number of GPUs on your own machine.
 
 ## Acknowledgements
 

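The README changes above describe three training invocations of `main.py`: warp pretraining, full pretraining, and meta training. A minimal sketch of how those commands could be assembled programmatically follows. `STAGES` and `build_command` are hypothetical helpers, not part of the repository, and the flags are copied verbatim from the README text. Note that the README points the meta stage at `config/meta_portrait_256_pretrain_meta_train.yaml`, while the config added in this commit is named `meta_portrait_256_meta_train.yaml`. The sketch keeps the README's path, so adjust it if that file does not exist in your checkout.

```python
import shlex

# Flags copied from the README commands in this commit.
# STAGES and build_command are hypothetical helpers, not repository code.
STAGES = {
    "warp": ["--config", "config/meta_portrait_256_pretrain_warp.yaml",
             "--fp16", "--stage", "Warp", "--task", "Pretrain"],
    "full": ["--config", "config/meta_portrait_256_pretrain_full.yaml",
             "--fp16", "--stage", "Full", "--task", "Pretrain"],
    # Path kept exactly as written in the README (see the note above).
    "meta": ["--config", "config/meta_portrait_256_pretrain_meta_train.yaml",
             "--fp16", "--stage", "Full", "--task", "Meta", "--remove_sn"],
}


def build_command(stage, ckpt=None):
    """Return the argv list for one training stage, to be run from base_model/."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    cmd = ["python", "main.py", *STAGES[stage]]
    if stage == "meta":
        # The README passes the standard pretrained checkpoint via --ckpt.
        if ckpt is None:
            raise ValueError("the meta stage needs a pretrained checkpoint path")
        cmd += ["--ckpt", ckpt]
    return cmd


if __name__ == "__main__":
    for name in ("warp", "full"):
        print(shlex.join(build_command(name)))
    print(shlex.join(build_command("meta", "/path/to/standard_pretrain_ckpt")))
```

The returned lists can be passed to `subprocess.run(cmd, cwd="base_model", check=True)` to run the stages in order.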
diff --git a/base_model/config/meta_portrait_256_meta_train.yaml b/base_model/config/meta_portrait_256_meta_train.yaml
@@ -0,0 +1,97 @@
+general:
+  exp_name: meta_portrait_meta_train
+  random_seed: 365
+
+dataset:
+  frame_shape: [256, 256, 3]
+  eye_enhance: True
+  mouth_enhance: True
+  ldmkimg: True
+  ldmk_idx: [521, 505, 338, 398, 347, 35, 191, 30, 32, 207, 630, 629, 319, 4, 541, 61, 637, 660, 638, 587, 273, 590, 269, 432, 118,327, 12, 373, 58, 619, 466, 469, 464, 308, 152, 305, 150, 411, 635, 634, 564, 250, 443, 129, 364, 322, 49, 7, 361, 105, 434, 120, 500, 186, 575, 261, 636, 74]
+
+  train_data: [meta]
+  train_data_weight: [1]
+
+  meta:
+    root: ../data/   
+    crop_expand: 1.3
+    crop_offset_y: 0.2
+
+model:
+  arch: 'SPADEID'
+  # warp_ckpt: /mnt/blob/projects/IMmeeting/amlt-results/Meeting_exp_25/Orig_RegMotion_Ladder256_VoxCeleb2_Warp_SPADEInit_FeatureNorm_Bs48_Baseline_15eps_256/results/ckpt/spade/ckpt_15_2022-08-27-00-10-08.pth.tar
+  common:
+    num_channels: 3
+
+  kp_detector:
+    temperature: 0.1
+    block_expansion: 32
+    max_features: 1024
+    scale_factor: 0.25
+    num_blocks: 5
+  generator:
+    block_expansion: 64
+    max_features: 512
+    with_gaze_htmap: True
+    with_mouth_line: True
+    with_ldmk_line: True
+    use_IN: True
+    ladder: 
+      need_feat: False
+      use_mask: False
+      label_nc: 0
+      z_dim: 512
+    dense_motion_params:
+      label_nc: 0
+      ldmkimg: True
+      occlusion: True
+      block_expansion: 64
+      max_features: 1024
+      num_blocks: 5
+      dec_lease: 2
+      Lwarp: True
+      AdaINc: 512
+  discriminator:
+    scales: [1]
+    block_expansion: 32
+    max_features: 512
+    num_blocks: 4
+    use_kp: False
+
+train:
+  epochs: 5
+  batch_size: 0
+  dataset_repeats: 1
+
+  epoch_milestones: [2]
+  lr_generator: 2.0e-5
+  lr_discriminator: 2.0e-5
+  lr_kp_detector: 2.0e-5
+  warplr_tune: 0.1
+  outer_beta_1: 0.5
+  outer_beta_2: 0.999
+  inner_lr_generator: 2.0e-4
+  inner_lr_discriminator: 2.0e-4
+  inner_warplr_tune: 0.1
+  inner_beta_1: 0.5
+  inner_beta_2: 0.999
+
+  scales: [1, 0.5, 0.25, 0.125]
+
+  loss_weights:
+    generator_gan: 0
+    discriminator_gan: 1
+    feature_matching: [10, 10, 10, 10]
+    perceptual: [10, 10, 10, 10, 10]
+    id: 20
+    eye_enhance: 50
+    mouth_enhance: 50
+
+  tensorboard: True
+  event_save_path: ./results/events/
+  event_save_freq: 50
+
+  ckpt_save_path: ./results/ckpt/
+  ckpt_save_iter_freq: 500
+  ckpt_save_freq: 1
+  print_freq: 50
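The meta-training config differs from the pretraining configs in this commit mainly in its `train` section. The sketch below makes that difference explicit by comparing a hand-copied subset of the two `train` blocks. `diff_config` and the two dictionaries are hypothetical illustrations, and only a few keys are copied from the YAML files. In practice the full files would be loaded with a YAML parser.

```python
# Subset of the `train` section of meta_portrait_256_pretrain_full.yaml.
PRETRAIN_TRAIN = {
    "epochs": 60,
    "batch_size": 2,
    "epoch_milestones": [45],
    "lr_generator": 2.0e-4,
    "lr_discriminator": 2.0e-4,
    "loss_weights": {"generator_gan": 1, "discriminator_gan": 1},
}

# Subset of the `train` section of meta_portrait_256_meta_train.yaml.
META_TRAIN = {
    "epochs": 5,
    "batch_size": 0,
    "epoch_milestones": [2],
    "lr_generator": 2.0e-5,
    "lr_discriminator": 2.0e-5,
    "inner_lr_generator": 2.0e-4,
    "inner_lr_discriminator": 2.0e-4,
    "loss_weights": {"generator_gan": 0, "discriminator_gan": 1},
}


def diff_config(a, b, prefix=""):
    """Return {dotted_key: (a_value, b_value)} for every key whose value differs.

    Keys missing on one side are reported with None for that side.
    """
    changes = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as loss_weights.
            changes.update(diff_config(va, vb, path + "."))
        elif va != vb:
            changes[path] = (va, vb)
    return changes


if __name__ == "__main__":
    for key, (old, new) in diff_config(PRETRAIN_TRAIN, META_TRAIN).items():
        print(f"{key}: {old} -> {new}")
```

Read this way, the meta config lowers the outer learning rates by a factor of ten, adds separate inner-loop rates, shortens training to 5 epochs, and sets the `generator_gan` weight to 0.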
diff --git a/base_model/config/meta_portrait_256_pretrain_full.yaml b/base_model/config/meta_portrait_256_pretrain_full.yaml
@@ -0,0 +1,90 @@
+general:
+  exp_name: meta_portrait_base
+  random_seed: 365
+
+dataset:
+  frame_shape: [256, 256, 3]
+  eye_enhance: True
+  mouth_enhance: True
+  ldmkimg: True
+  ldmk_idx: [521, 505, 338, 398, 347, 35, 191, 30, 32, 207, 630, 629, 319, 4, 541, 61, 637, 660, 638, 587, 273, 590, 269, 432, 118,327, 12, 373, 58, 619, 466, 469, 464, 308, 152, 305, 150, 411, 635, 634, 564, 250, 443, 129, 364, 322, 49, 7, 361, 105, 434, 120, 500, 186, 575, 261, 636, 74]
+
+  train_data: [personalized]
+  train_data_weight: [1]
+
+  personalized:
+    root: ../data/   
+    crop_expand: 1.3
+    crop_offset_y: 0.2
+    static_bbox: True
+
+model:
+  arch: 'SPADEID'
+  warp_ckpt: /path/to/warp_ckpt
+  common:
+    num_channels: 3
+
+  kp_detector:
+    temperature: 0.1
+    block_expansion: 32
+    max_features: 1024
+    scale_factor: 0.25
+    num_blocks: 5
+  generator:
+    block_expansion: 64
+    max_features: 512
+    with_gaze_htmap: True
+    with_mouth_line: True
+    with_ldmk_line: True
+    use_IN: True
+    ladder: 
+      need_feat: False
+      use_mask: False
+      label_nc: 0
+      z_dim: 512
+    dense_motion_params:
+      label_nc: 0
+      ldmkimg: True
+      occlusion: True
+      block_expansion: 64
+      max_features: 1024
+      num_blocks: 5
+      dec_lease: 2
+      Lwarp: True
+      AdaINc: 512
+  discriminator:
+    scales: [1]
+    block_expansion: 32
+    max_features: 512
+    num_blocks: 4
+    use_kp: False
+
+train:
+  epochs: 60
+  batch_size: 2
+  dataset_repeats: 1
+
+  epoch_milestones: [45]
+  lr_generator: 2.0e-4
+  lr_discriminator: 2.0e-4
+  warplr_tune: 0.1
+
+  scales: [1, 0.5, 0.25, 0.125]
+
+  loss_weights:
+    generator_gan: 1
+    discriminator_gan: 1
+    feature_matching: [10, 10, 10, 10]
+    perceptual: [10, 10, 10, 10, 10]
+    id: 20
+    eye_enhance: 50
+    mouth_enhance: 50
+
+  tensorboard: True
+  event_save_path: ./results/events/
+  event_save_freq: 500
+
+  ckpt_save_path: ./results/ckpt/
+  ckpt_save_iter_freq: 5000
+  ckpt_save_freq: 1
+  print_freq: 1000
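This config ships with `warp_ckpt: /path/to/warp_ckpt`, which the README says must be edited before the full-stage run. A minimal sketch of a pre-flight check for that follows. `check_warp_ckpt` is a hypothetical helper, not repository code, and it assumes the parsed config is available as a nested dictionary.

```python
import os

# Placeholder value committed in meta_portrait_256_pretrain_full.yaml.
PLACEHOLDER = "/path/to/warp_ckpt"


def check_warp_ckpt(config, exists=os.path.isfile):
    """Return a list of problems with model.warp_ckpt (empty if none found).

    `config` is the parsed YAML as a nested dict. `exists` is injectable so
    the check can be exercised without touching the filesystem.
    """
    path = config.get("model", {}).get("warp_ckpt")
    if not path:
        return ["model.warp_ckpt is not set"]
    if path == PLACEHOLDER:
        return ["model.warp_ckpt still holds the placeholder path"]
    if not exists(path):
        return [f"model.warp_ckpt does not point to a file: {path}"]
    return []


if __name__ == "__main__":
    # With the config exactly as committed, the placeholder is reported.
    print(check_warp_ckpt({"model": {"warp_ckpt": PLACEHOLDER}}))
```

Running such a check before launching `--stage Full` fails early instead of partway through model setup.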
diff --git a/base_model/config/meta_portrait_256_pretrain_warp.yaml b/base_model/config/meta_portrait_256_pretrain_warp.yaml
@@ -0,0 +1,91 @@
+general:
+  exp_name: meta_portrait_base
+  random_seed: 365
+
+dataset:
+  frame_shape: [256, 256, 3]
+  eye_enhance: True
+  mouth_enhance: True
+  ldmkimg: True
+  ldmk_idx: [521, 505, 338, 398, 347, 35, 191, 30, 32, 207, 630, 629, 319, 4, 541, 61, 637, 660, 638, 587, 273, 590, 269, 432, 118,327, 12, 373, 58, 619, 466, 469, 464, 308, 152, 305, 150, 411, 635, 634, 564, 250, 443, 129, 364, 322, 49, 7, 361, 105, 434, 120, 500, 186, 575, 261, 636, 74]
+
+  train_data: [personalized]
+  train_data_weight: [1]
+
+  personalized:
+    root: ../data/   
+    crop_expand: 1.3
+    crop_offset_y: 0.2
+    static_bbox: True
+
+model:
+  arch: 'SPADEID'
+  common:
+    num_channels: 3
+
+  kp_detector:
+    temperature: 0.1
+    block_expansion: 32
+    max_features: 1024
+    scale_factor: 0.25
+    num_blocks: 5
+  generator:
+    block_expansion: 64
+    max_features: 512
+    with_gaze_htmap: True
+    with_mouth_line: True
+    with_ldmk_line: True
+    use_IN: True
+    ladder: 
+      need_feat: False
+      use_mask: False
+      label_nc: 0
+      z_dim: 512
+    dense_motion_params:
+      label_nc: 0
+      ldmkimg: True
+      occlusion: True
+      block_expansion: 64
+      max_features: 1024
+      num_blocks: 5
+      dec_lease: 2
+      Lwarp: True
+      AdaINc: 512
+  discriminator:
+    scales: [1]
+    block_expansion: 32
+    max_features: 512
+    num_blocks: 4
+    use_kp: False
+
+train:
+  epochs: 60
+  batch_size: 2
+  dataset_repeats: 1
+
+  epoch_milestones: [45]
+  lr_generator: 2.0e-4
+  lr_discriminator: 2.0e-4
+  warplr_tune: 0.1
+
+  scales: [1, 0.5, 0.25, 0.125]
+
+  loss_weights:
+    generator_gan: 1
+    discriminator_gan: 1
+    feature_matching: [10, 10, 10, 10]
+    perceptual: [10, 10, 10, 10, 10]
+    id: 20
+    eye_enhance: 50
+    mouth_enhance: 50
+
+  tensorboard: True
+  event_save_path: ./results/events/
+  event_save_freq: 500
+
+  ckpt_save_path: ./results/ckpt/
+  ckpt_save_iter_freq: 5000
+  ckpt_save_freq: 1
+  print_freq: 1000
+
+  eval_freq: 10000
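All three configs pair `frame_shape: [256, 256, 3]` with `scales: [1, 0.5, 0.25, 0.125]`. Assuming `scales` describes an image pyramid used for multi-scale losses (common in similar codebases, but not stated in this commit), the resulting resolutions can be worked out as follows. `pyramid_shapes` is a hypothetical helper for illustration only.

```python
def pyramid_shapes(frame_shape, scales):
    """Map each scale factor to the (height, width, channels) it would produce.

    Assumes each scale multiplies the spatial dimensions and leaves the
    channel count unchanged.
    """
    height, width, channels = frame_shape
    return {
        scale: (int(round(height * scale)), int(round(width * scale)), channels)
        for scale in scales
    }


if __name__ == "__main__":
    shapes = pyramid_shapes([256, 256, 3], [1, 0.5, 0.25, 0.125])
    for scale, shape in shapes.items():
        print(scale, shape)
```

Under that assumption, the four scales correspond to 256, 128, 64, and 32 pixel frames.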