The goal of this project is to build a model that can identify forged copies of known signatures. The model will be built and trained using two different approaches: a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN). In both cases the dataset is the CEDAR signature dataset, which contains 2640 signatures from 55 people: 1320 genuine (24 per person) and 1320 forged (again, 24 per person). Two signatures are never exactly identical, even when both are genuine and come from the same person, so pre-processing is necessary: each signature image goes through steps such as noise removal, resizing, and gray-scaling to make the comparison easier. For the SVM approach, features are then extracted "manually" and used to train the model, while for the CNN the feature extraction happens "inside" the network. Finally, both approaches are tested and their results are compared.
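The pre-processing steps mentioned above (gray-scaling, resizing, noise removal) could look roughly like the sketch below. It uses only NumPy, and it is not the code from the notebooks: the function names, the 155x220 target size, and the threshold-based noise removal are assumptions chosen for illustration.

```python
import numpy as np

def to_grayscale(image):
    """Convert an H x W x 3 RGB array to H x W using the usual luma weights."""
    if image.ndim == 2:  # already single-channel
        return image.astype(np.float32)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image[..., :3].astype(np.float32) @ weights

def resize_nearest(image, height, width):
    """Nearest-neighbour resize of a 2-D array to (height, width)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]

def remove_noise(image, threshold=128):
    """Crude noise removal (assumption): push light pixels to pure white."""
    cleaned = image.copy()
    cleaned[cleaned >= threshold] = 255.0
    return cleaned

def preprocess(image, size=(155, 220)):
    """Gray-scale, resize, clean, then scale pixel values to [0, 1]."""
    gray = to_grayscale(image)
    resized = resize_nearest(gray, *size)
    return remove_noise(resized) / 255.0

# Synthetic "signature": a black bar on a white background.
demo = np.full((100, 200, 3), 255, dtype=np.uint8)
demo[40:60, 50:150] = 0
processed = preprocess(demo)
```

A library such as OpenCV or scikit-image would normally provide better resizing and denoising; the hand-written versions here only keep the example dependency-free.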
The data used can be downloaded from the following links:
CEDAR dataset: the original image dataset, used to generate image pairs;
Image pairs: image pairs and labels, split into X (pairs) and y (labels) and stored as .npy files;
HOG Distances: HOG distances and labels, computed from the image pair sets, split into train and test sets, and stored as .npy files. Use these to run the SVM locally on machines with limited resources, since loading the pairs and computing the distances from scratch can take a lot of RAM and time.
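To give an idea of what a per-pair distance feature stored in a .npy file might look like, here is a minimal sketch. It does not reproduce the notebooks: the descriptor is a simplified, single-cell gradient-orientation histogram standing in for a full HOG, the `(N, 2, H, W)` pair layout is an assumption, and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np

def orientation_histogram(image, bins=9):
    """Simplified HOG-like descriptor: one global histogram of unsigned
    gradient orientations, weighted by gradient magnitude and L2-normalised."""
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def pair_distances(pairs):
    """Map pairs of shape (N, 2, H, W) (assumed layout) to an (N, 1) array
    of Euclidean distances between the two descriptors of each pair."""
    distances = [
        np.linalg.norm(orientation_histogram(a) - orientation_histogram(b))
        for a, b in pairs
    ]
    return np.asarray(distances, dtype=np.float64).reshape(-1, 1)

# Toy data: pair 0 compares an image with itself,
# pair 1 compares a horizontal ramp with a vertical ramp.
ramp = np.tile(np.arange(16.0), (16, 1))
pairs = np.stack([np.stack([ramp, ramp]), np.stack([ramp, ramp.T])])
distances = pair_distances(pairs)

# Round-trip through a .npy file (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hog_distances_demo.npy")
    np.save(path, distances)
    loaded = np.load(path)
```

Precomputing one small distance array per pair is what makes the SVM cheap to run locally: the classifier then only needs the distances and labels, not the full image pairs.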
All the models and notebooks in this repository were developed on Kaggle; links below:
make_pairs: utility notebook that creates image pairs from the original CEDAR dataset. The results are saved as X.npy (pairs) and y.npy (labels);
SVM model: the SVM notebook. Takes as input the image pairs created with the make_pairs notebook and (optionally) saves the SVM models;
Siamese Network: the Siamese Network notebook. Takes as input the image pairs created with the make_pairs notebook and (optionally) saves the model.
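As an illustration of how labelled pairs can be built for one signer, the sketch below pairs every two genuine signatures (label 1) and every genuine signature with every forgery (label 0). This pairing scheme, the label convention, and the function name `make_signer_pairs` are assumptions; the make_pairs notebook may sample or label pairs differently.

```python
from itertools import combinations, product

def make_signer_pairs(genuine, forged):
    """Return (pairs, labels) for a single signer.

    Label 1: both items are genuine signatures of the signer.
    Label 0: one genuine signature paired with one forgery.
    """
    positive = list(combinations(genuine, 2))
    negative = list(product(genuine, forged))
    pairs = positive + negative
    labels = [1] * len(positive) + [0] * len(negative)
    return pairs, labels

# Placeholder identifiers instead of real images: 24 genuine and 24 forged
# signatures per signer, as in CEDAR.
genuine = [f"genuine_{i}" for i in range(24)]
forged = [f"forged_{i}" for i in range(24)]
pairs, labels = make_signer_pairs(genuine, forged)
```

Under this scheme each signer yields 276 genuine-genuine pairs and 576 genuine-forged pairs, so a notebook following it would likely subsample to balance the classes and limit memory use.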
Pre-trained models can be found in this Google Drive directory.