From our CVPR-W 2021 paper titled: LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions whilst Leveraging Human Perceptual Judgements
Building facial analysis systems that generalize to extreme variations in lighting and facial expressions is a challenging problem that can potentially be alleviated using natural-looking synthetic data. To that end, we propose LEGAN, a novel synthesis framework that leverages perceptual quality judgments for jointly manipulating lighting and expressions in face images, without requiring paired training data. LEGAN disentangles the lighting and expression subspaces and performs transformations in the feature space before upscaling to the desired output image. The fidelity of the synthetic image is further refined by integrating a perceptual quality estimation model, trained with face images rendered using multiple synthesis methods and their crowd-sourced naturalness ratings, into the LEGAN framework as an auxiliary discriminator. Using objective metrics like FID and LPIPS, LEGAN is shown to generate higher-quality face images than popular GAN models like StarGAN and StarGAN-v2 for lighting and expression synthesis. We also conduct a perceptual study using images synthesized by LEGAN and other GAN models and show the correlation between our quality estimates and visual fidelity. Finally, we demonstrate the effectiveness of LEGAN as a training data augmenter for expression recognition and face verification tasks.
Synthetic face images generated by 5 different approaches and their human-annotated perceptual labels (i.e. the mean and standard deviation of each image's naturalness score). These same images were used to train the perceptual quality estimation model in LEGAN and can also be used to train similar quality models.
- StyleGAN: Shared by generated.photos (https://generated.photos/), who trained StyleGAN on an exclusive, consistent set of images to obtain high-quality results. The photos were acquired in their own studio under controlled conditions, and the model was trained and re-trained extensively to reach that quality.
- ProGAN: 10,000 synthetic face images of non-existent subjects, generated by training NVIDIA’s progressively growing GAN model on the CelebA-HQ dataset.
- FaceForensics++: We use a randomly sampled set of 1,000 frames from the dataset’s 1,000 video sequences, which have been manipulated with four automated face manipulation methods (https://github.com/ondyari/FaceForensics/).
- DeepFake: We use a set of frames sampled from 620 manipulated videos of 43 actors from https://www.kaggle.com/c/deepfake-detection-challenge.
- The Notre Dame Synthetic Face Dataset: We use a randomly sampled set of 163K face images, from the available 2M (https://cvrl.nd.edu/projects/data/), of synthetic subjects generated by wrapping 2D synthetic textures [10] onto ‘best-fitting’ 3D models. Since this dataset does not permit re-distribution in any form, we only share the image names from the original dataset here.
Note: After filtering based on gender, facial pose and coarse image quality, we use 37K face images from this set for our experiments. Only near-frontal face images (|yaw| < 15 degrees) are used for this study, as visually gauging the naturalness of non-frontal faces is a harder task.
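For anyone reproducing this filtering step, the sketch below mirrors the near-frontal check described above. It is a minimal illustration under stated assumptions: `estimate_yaw` is a hypothetical placeholder for whatever head-pose estimator is used, and the `*.png` glob is an assumed file layout; neither is part of this release.

```python
from pathlib import Path

YAW_LIMIT_DEG = 15.0  # keep only near-frontal faces, |yaw| < 15 degrees

def is_near_frontal(image_path: Path, estimate_yaw) -> bool:
    """Return True if the estimated head yaw is within the near-frontal limit."""
    yaw_deg = estimate_yaw(image_path)  # assumed to return yaw in degrees
    return abs(yaw_deg) < YAW_LIMIT_DEG

def filter_near_frontal(image_dir: str, estimate_yaw):
    """Yield image paths that pass the near-frontal check (assumed *.png layout)."""
    for path in sorted(Path(image_dir).glob("*.png")):
        if is_near_frontal(path, estimate_yaw):
            yield path
```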
Naturalness Labeling: The synthetic face images are annotated for naturalness by human labelers on Amazon Mechanical Turk (AMT), on a scale of 1 to 10. A rating of 1 suggests the image looks unnatural or fake, while a rating of 10 suggests hyper-realism. Any synthetic face with a rating >= 5 has successfully fooled the annotator. To account for the subjectivity of human perception, each synthetic image is rated by three different annotators. The mean (m) and standard deviation (s) of these three ratings are used as the perceptual labels for the associated image.
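If you plan to train a similar quality model, the snippet below shows how these perceptual labels can be derived from raw annotator ratings. It is a minimal sketch, not the released processing code: the `ratings` dictionary, its file names, and the choice of population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical example: three AMT naturalness ratings (1-10) per synthetic image.
ratings = {
    "stylegan_000123.png": [7, 8, 6],
    "progan_004511.png": [3, 4, 2],
}

# Perceptual labels as described above: mean (m) and standard deviation (s)
# of the three ratings for each image. pstdev (population std) is an assumption;
# a sample standard deviation could be used instead.
perceptual_labels = {
    name: {"m": mean(r), "s": pstdev(r)} for name, r in ratings.items()
}

for name, label in perceptual_labels.items():
    # A rating >= 5 means the image passed as natural; applied here to the mean
    # rating purely for illustration.
    fooled = label["m"] >= 5
    print(f"{name}: m={label['m']:.2f}, s={label['s']:.2f}, fooled={fooled}")
```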
The dataset (along with the full license) can be accessed here: https://download.affectiva.com/datasets/LEGAN_Perpetual/LEGAN_Perceptual_Dataset.tar.gz
For any questions, please email [email protected], [email protected], or [email protected].
Affectiva grants to the Licensee a non-exclusive, non-transferable, royalty-free, revocable license to use, reproduce, and redistribute the Dataset and create derivative works from the Dataset for academic and research purposes only and not for commercial purposes. Any redistribution must include this License and any applicable Third Party Terms. Affectiva retains ownership of all intellectual property rights in the Dataset (other than Third Party Materials which are owned by their respective owners). Licensee may not use the Dataset in any manner which would be unlawful, would violate the rights of third parties, or would otherwise create any criminal or civil liability. Licensee may not export the Dataset in violation of any applicable export control or anti-terrorism laws or regulations.
@InProceedings{Banerjee_2021_CVPR,
author = {Banerjee, Sandipan and Joshi, Ajjen and Mahajan, Prashant and Bhattacharya, Sneha and Kyal, Survi and Mishra, Taniya},
title = {LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions Whilst Leveraging Human Perceptual Judgements},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2021},
pages = {1531-1541}
}