This repo contains the code and information for the paper "Characteristics and prevalence of fake social media profiles with AI-generated faces".
We analyze fake Twitter accounts that use GAN-generated faces as their profile images.
The ganed package implements the GANEyeDistance metric proposed in our paper.
The metric can help detect GAN-generated profiles on Twitter.
The package is not on PyPI. To install it, run the following command from the root directory of this project:
pip install -e ./
This installs the package locally, in editable mode, into your current Python environment. It also installs the following dependencies:
face_recognition>=1.3.0
pillow
numpy
Note that we only tested the package under Python 3.
Using the package is straightforward:
import ganed
ganed_calc = ganed.GANEyeDistance()
# Assuming image_path is a path to an image on your disk
ganed_result = ganed_calc.calculate_distance(path_to_image=image_path)
# Assuming pil_image is a PIL.Image.Image instance
ganed_result = ganed_calc.calculate_distance(pil_image=pil_image)
Applying the package to an input image yields a GANEyeDistance value between 0 and 1.
A value close to 0 indicates that the eye locations of the input image are close to the expected locations of GAN-generated faces.
In our experiment, a threshold of 0.02 yields a recall of over 99.5% for GAN-generated faces. False positives are inevitable, however, so images labeled as positive need additional examination to determine their true nature.
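The thresholding step described above can be sketched as follows. This is a minimal illustration and not part of the ganed package: the helper name `is_likely_gan` is hypothetical, and treating a value exactly equal to the threshold as positive is an assumption. The input is a GANEyeDistance value that you have already computed.

```python
GAN_THRESHOLD = 0.02  # threshold reported in this README


def is_likely_gan(ganed_value: float, threshold: float = GAN_THRESHOLD) -> bool:
    """Flag a GANEyeDistance value as a candidate GAN-generated face.

    Hypothetical helper. The inclusive comparison (<=) is an assumption.
    """
    if not 0.0 <= ganed_value <= 1.0:
        raise ValueError(f"GANEyeDistance should be in [0, 1], got {ganed_value}")
    return ganed_value <= threshold


# Made-up values, for illustration only.
example_values = [0.005, 0.3, 0.019]
candidates = [value for value in example_values if is_likely_gan(value)]
print(f"{len(candidates)} of {len(example_values)} values flagged for manual review")
```

A flagged value is only a candidate; it still needs the additional examination mentioned above.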
We released the TwitterGAN dataset collected for our study.
The dataset contains 1,420 fake accounts with GAN-generated profiles.
We share their recent tweets and their profile images.
You can download the files from Zenodo.
TwitterGAN_tweets.ndjson.gz: User objects and recent tweets of the TwitterGAN accounts, collected using Twitter's V2 API. Each line is a JSON object containing the information for one account.
TwitterGAN_GPT_tweets.ndjson.gz: User objects and recent tweets of the chatgpt sub-dataset of TwitterGAN. Note that this dataset was collected by parsing Twitter's web pages, so its data structure differs from that of the API.
TwitterGAN_profiles.tar.gz: Profile images for the accounts.
TwitterGAN_id_label_mapping.csv: Mapping between user IDs, labels, and the file names of the profile images.
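Since the tweet files are gzipped, newline-delimited JSON, they can be streamed one account at a time with the standard library. The sketch below assumes nothing about the fields inside each object; the function name `iter_ndjson_gz` is hypothetical.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_ndjson_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzipped NDJSON file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)
```

For example, `next(iter_ndjson_gz("TwitterGAN_tweets.ndjson.gz"))` would return the record for the first account, assuming the file has been downloaded to the working directory. Streaming avoids loading the whole file into memory.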
We also released the basic information of accounts in RandomTwitter.
Their profile images are publicly accessible.
RandomTwitter_id_ganed.csv.gz: User IDs, profile image URLs, and the GANEyeDistance values for accounts in RandomTwitter.
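As a rough sketch of how this file might be screened, the snippet below counts rows whose GANEyeDistance value falls at or below a threshold. The column names are not documented here, so the caller must pass the name of the value column; the function name `count_flagged` and the inclusive comparison are assumptions, and rows with a missing or non-numeric value are skipped.

```python
import csv
import gzip
from typing import Tuple


def count_flagged(path: str, value_column: str, threshold: float = 0.02) -> Tuple[int, int]:
    """Return (flagged, total) over rows with a numeric value in value_column."""
    flagged = 0
    total = 0
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or value_column not in reader.fieldnames:
            raise KeyError(f"column {value_column!r} not found in {reader.fieldnames}")
        for row in reader:
            try:
                value = float(row[value_column])
            except (TypeError, ValueError):
                continue  # missing or non-numeric value
            total += 1
            if value <= threshold:
                flagged += 1
    return flagged, total
```

Inspect the header of the CSV first to find the actual column name. Because false positives are expected, the flagged count is a screening figure and not a count of confirmed GAN-generated profiles.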
@article{yang2024characteristics,
title={Characteristics and Prevalence of Fake Social Media Profiles with AI-generated Faces},
volume={2},
DOI={10.54501/jots.v2i4.197},
number={4},
journal={Journal of Online Trust and Safety},
author={Yang, Kai-Cheng and Singh, Danishjeet and Menczer, Filippo},
year={2024},
month={Sep.}
}