This is the official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX Security '21). See the paper, our website, and our dataset for details.
Existing reference-based phishing detectors:
- ❌ Lack of interpretability
- ❌ Lack of generalization performance in the wild
- ❌ Lack of a large-scale phishing benchmark dataset
The contributions of our paper:
- ✅ We propose Phishpedia, a phishing identification system with high identification accuracy and low runtime overhead that outperforms the relevant state-of-the-art identification approaches.
- ✅ Our system provides explainable annotations, which increase users' confidence in the model's predictions.
- ✅ We conducted a phishing discovery experiment on emerging domains fed from CertStream and discovered 1,704 real phishing webpages, of which 1,133 are zero-day.
Input: A URL and its screenshot
Output: Phish/Benign, Phishing target
Step 1: Run the deep object detection model to get the predicted logos and inputs (inputs are not used for later prediction, only for explanation).
Step 2: Run the deep Siamese model.
- If the Siamese model reports no target,
Return Benign, None
- Otherwise, the Siamese model reports a target,
Return Phish, Phishing target
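The two-step flow above can be sketched as follows. This is a minimal illustration that follows the steps exactly as stated here, not the repository's actual code: `classify_page`, `detect_logos`, and `match_brand` are hypothetical names, with the two callables standing in for the object detection model and the Siamese model.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # [x_min, y_min, x_max, y_max]


def classify_page(
    screenshot_path: str,
    detect_logos: Callable[[str], Sequence[Box]],
    match_brand: Callable[[str, Box], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None), following Steps 1 and 2."""
    # Step 1: the (hypothetical) detector proposes logo boxes on the screenshot.
    logo_boxes = detect_logos(screenshot_path)

    # Step 2: the (hypothetical) Siamese matcher tries to map each logo to a brand.
    for box in logo_boxes:
        target = match_brand(screenshot_path, box)
        if target is not None:
            return "Phish", target

    # No logo matched any brand in the target list.
    return "Benign", None
```

Passing the models in as callables keeps the decision logic testable without loading any weights.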
- src
- adv_attack: adversarial attacking scripts
- detectron2_pedia: training script for object detector
|_ output
|_ rcnn_2
|_ rcnn_bet365.pth
- siamese_pedia: inference script for siamese
|_ siamese_retrain: training script for siamese
|_ expand_targetlist
|_ 1&1 Ionos
|_ ...
|_ domain_map.pkl
|_ resnetv2_rgb_new.pth.tar
- main script for siamese
- evaluation script for general experiment
- tele: telegram scripts to vote for phishing
- config script for phish-discovery experiment
- main script for phish-discovery experiment
- CUDA 11
- Anaconda installed, please refer to the official installation guide:
- Create a local clone of Phishpedia
git clone
- Setup
cd Phishpedia/
chmod +x ./
If you encounter any problems downloading the models, you can download them manually from here and put them into the corresponding conda environment.
conda activate myenv
Run in Python to test a single website
from phishpedia.phishpedia_main import test
import matplotlib.pyplot as plt
from phishpedia.phishpedia_config import load_config
url = open("phishpedia/datasets/test_sites/").read().strip()
screenshot_path = "phishpedia/datasets/test_sites/"
phish_category, pred_target, plotvis, siamese_conf, pred_boxes = test(url=url, screenshot_path=screenshot_path,
print('Phishing (1) or Benign (0) ?', phish_category)
print('What is its targeted brand if it is a phishing ?', pred_target)
print('What is the siamese matching confidence ?', siamese_conf)
print('Where is the predicted logo (in [x_min, y_min, x_max, y_max])?', pred_boxes)
plt.imshow(plotvis[:, :, ::-1])  # reverse the channel order for display
plt.title("Predicted screenshot with annotations")
plt.show()
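If the outputs need to be logged or passed to another tool, they can be folded into a small report. The helper below is a sketch under assumptions: `summarize_result` is a hypothetical name, `phish_category` is taken to be 1 or 0 as in the print statement above, and `pred_boxes` is taken to be either `None` or a sequence of `[x_min, y_min, x_max, y_max]` boxes.

```python
from typing import Any, Dict, Optional, Sequence


def summarize_result(
    phish_category: int,
    pred_target: Optional[str],
    siamese_conf: Optional[float],
    pred_boxes: Optional[Sequence[Sequence[float]]],
) -> Dict[str, Any]:
    """Turn the raw outputs into a small, JSON-friendly report."""
    boxes = []
    # Compare against None explicitly so array-like inputs are also accepted.
    for box in (pred_boxes if pred_boxes is not None else []):
        x_min, y_min, x_max, y_max = (float(v) for v in box)
        boxes.append(
            {
                "x_min": x_min,
                "y_min": y_min,
                "width": x_max - x_min,
                "height": y_max - y_min,
            }
        )
    return {
        "verdict": "phishing" if phish_category == 1 else "benign",
        "target": pred_target,
        "confidence": None if siamese_conf is None else float(siamese_conf),
        "logo_boxes": boxes,
    }
```

Converting each box to an origin plus width and height is only a convenience for downstream drawing code; the original corner format carries the same information.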
Or run in bash
python --folder <folder you want to test e.g. phishpedia/datasets/test_sites> --results <where you want to save the results e.g. test.txt>
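For readers who prefer to drive a batch run from Python, the loop behind a `--folder` / `--results` style run might look like the sketch below. Everything here is an assumption rather than the repository's script: `run_folder` and `run_one` are hypothetical names, the per-site file names are passed in as parameters because they are not specified here, and appending tab-separated rows is just one reasonable output format.

```python
import csv
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_folder(
    folder: str,
    results: str,
    url_filename: str,
    screenshot_filename: str,
    run_one: Callable[[str, str], Tuple[int, Optional[str]]],
) -> int:
    """Process every site sub-folder and append one row per site to `results`."""
    processed = 0
    with open(results, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for site_dir in sorted(Path(folder).iterdir()):
            url_file = site_dir / url_filename
            shot_file = site_dir / screenshot_filename
            # Skip entries that are not complete site folders.
            if not (site_dir.is_dir() and url_file.is_file() and shot_file.is_file()):
                continue
            url = url_file.read_text(encoding="utf-8").strip()
            # run_one is a hypothetical wrapper returning (phish_category, pred_target).
            phish_category, pred_target = run_one(url, str(shot_file))
            writer.writerow([site_dir.name, url, phish_category, pred_target])
            processed += 1
    return processed
```

The function returns the number of sites processed, which makes it easy to spot folders that were skipped because a file was missing.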
- In our paper, we also implement several phishing detection and identification baselines; see here.
- The logo target list described in our paper includes 181 brands. We have further expanded the target list to 277 brands in this code repository.
- For the phish discovery experiment, we obtain the feed from Certstream phish_catcher and lower the score threshold to 40 to process more suspicious websites. Readers can refer to their repo for details.
- We use Scrapy for website crawling. Repo here.
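The score threshold of 40 mentioned in the notes above amounts to a simple filter over scored domains. The sketch below is illustrative only: `suspicious_domains` is a hypothetical helper, the `(domain, score)` pair format is assumed, and treating a score equal to the threshold as suspicious is also an assumption.

```python
from typing import Iterable, Iterator, Tuple

SCORE_THRESHOLD = 40  # value stated in the notes above


def suspicious_domains(
    scored: Iterable[Tuple[str, float]],
    threshold: float = SCORE_THRESHOLD,
) -> Iterator[str]:
    """Yield each domain once if its score reaches the threshold."""
    seen = set()
    for domain, score in scored:
        # Assumption: a score equal to the threshold counts as suspicious.
        if score >= threshold and domain not in seen:
            seen.add(domain)
            yield domain
```

Lowering the threshold lets more domains through to the crawler, which raises the processing load in exchange for missing fewer candidates.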
If you find our work useful in your research, please consider citing our paper:
title={Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
author={Lin, Yun and Liu, Ruofan and Divakaran, Dinil Mon and Ng, Jun Yang and Chan, Qing Zhou and Lu, Yiwen and Si, Yuxuan and Zhang, Fan and Dong, Jin Song},
booktitle={30th {USENIX} Security Symposium ({USENIX} Security 21)},
If you have any issues running our code, you can raise an issue or send an email to [email protected], [email protected], and [email protected]