Project created as part of an application for a JetBrains internship.
Reproduction of the "Learning to Format Coq Code Using Language Models" paper.
Date of creation: October 2024
Clone the repository:
git clone https://github.com/AStroCvijo/detection_of_formatting_style.git
Download the math-comp dataset, extract it, and place the folder in the detection_of_formatting_style\data directory.
Navigate to the project directory:
cd detection_of_formatting_style
Set up the environment:
source ./setup.sh
Train the model using the default settings:
python main.py
Command-line arguments:
-m or --model
Followed by the model you want to use: LSTM, transformer, or n_gram
-hs or --hidden_dim
Size of the model's hidden layer
-ed or --embedding_dim
Size of the embedding space
-nl or --num_layers
Number of layers in the model
-bi or --bidirectional
Whether the LSTM is bidirectional
-nh or --number_heads
Number of attention heads in the Transformer model
-e or --epochs
Number of epochs in training
-lr or --learning_rate
Learning rate in training
-sl or --sequence_length
Length of the sequences extracted from the data
-bs or --batch_size
Batch size
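The options listed above could be declared with Python's standard `argparse` module roughly as in the sketch below. This is an illustration only, not the contents of utils/argparser.py: every default value is a placeholder, `--bidirectional` is assumed to be a boolean switch, and `--train` (which appears in the example command but is not described in the list) is also assumed to be a boolean switch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the documented flags.

    All defaults are hypothetical placeholders, not the project's real values.
    """
    parser = argparse.ArgumentParser(description="Train a formatting-style model")
    # Assumption: --train is a simple on/off switch.
    parser.add_argument("--train", action="store_true")
    parser.add_argument("-m", "--model",
                        choices=["LSTM", "transformer", "n_gram"], default="LSTM")
    parser.add_argument("-hs", "--hidden_dim", type=int, default=128)
    parser.add_argument("-ed", "--embedding_dim", type=int, default=64)
    parser.add_argument("-nl", "--num_layers", type=int, default=2)
    # Assumption: passing -bi enables a bidirectional LSTM.
    parser.add_argument("-bi", "--bidirectional", action="store_true")
    parser.add_argument("-nh", "--number_heads", type=int, default=4)
    parser.add_argument("-e", "--epochs", type=int, default=10)
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-3)
    parser.add_argument("-sl", "--sequence_length", type=int, default=6)
    parser.add_argument("-bs", "--batch_size", type=int, default=32)
    return parser
```

Calling `build_parser().parse_args()` inside main.py would then expose each option as an attribute, for example `args.hidden_dim` or `args.learning_rate`.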
Example of training a transformer with custom settings:
python main.py --train --model transformer --epochs 15 --learning_rate 0.00002 --sequence_length 6
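To give a sense of what `--sequence_length 6` might control, the sketch below cuts a token stream into fixed-length windows, each paired with the token that follows it. This is one common way to prepare data for next-token models and is only an assumption about how data/data_functions.py works; the function name `make_sequences` and the sample tokens are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sequences(tokens: Sequence[str],
                   sequence_length: int) -> List[Tuple[List[str], str]]:
    """Pair each window of `sequence_length` tokens with the next token."""
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")
    pairs = []
    # Stop early enough that every window still has a following token.
    for start in range(len(tokens) - sequence_length):
        window = list(tokens[start:start + sequence_length])
        target = tokens[start + sequence_length]
        pairs.append((window, target))
    return pairs


# Hypothetical token stream from a Coq file, with whitespace kept as tokens.
sample = ["Lemma", " ", "foo", ":", " ", "True", ".", "\n"]
pairs = make_sequences(sample, sequence_length=6)
```

With eight tokens and a window of six, this yields two training pairs; a longer window gives the model more context per example but fewer examples per file.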
Project structure:
detection_of_formatting_style
├── data
│ ├── data_functions.py # Contains functions for data preprocessing, loading, and transformation
│ └── math-comp # The dataset folder
├── model
│ ├── transformer.py # Transformer model implementation
│ ├── LSTM.py # LSTM model implementation
│ ├── LSTMFS.py # LSTM model implementation from scratch
│ └── n_gram.py # n_gram model implementation
├── pretrained_models # Directory for saving and loading pre-trained models
├── train
│ ├── eval.py # Script to evaluate model performance
│ └── train.py # Script to train and save models
├── utils
│ ├── argparser.py # Contains argument parsing logic for CLI inputs
│ └── seed.py # Contains functions for seed setting
├── README.md # README file
├── requirements.txt # List of required dependencies
├── notebook.ipynb # Notebook to test models
├── setup.sh # Bash script to set up the environment
└── main.py # Main script to run the project
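As an illustration of what a seed-setting helper such as the one in utils/seed.py might look like, the sketch below seeds Python's `random` module and, when they are installed, NumPy and PyTorch. The function name `set_seed`, the default seed, and the use of PyTorch are assumptions; the real module may differ.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed the random number generators that are available."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency; assumed training framework
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `set_seed` once at the start of main.py, before the data is shuffled and the model is initialised, helps make training runs repeatable.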