Detection of formatting style from user's code using ML

Project made as part of an application for a JetBrains internship.
Reproduction of the "Learning to Format Coq Code Using Language Models" paper.

Date of creation: October, 2024

Quickstart

Clone the repository:

git clone https://github.com/AStroCvijo/detection_of_formatting_style.git

Download the math-comp dataset, extract it, and place the extracted folder in the detection_of_formatting_style/data directory.
Navigate to the project directory:
```
cd detection_of_formatting_style
```
Set up the environment:
```
source ./setup.sh
```
Train the model using the default settings:
```
python main.py
```

Arguments guide

Model arguments

-m or --model Model to use: LSTM, transformer, or n_gram
-hs or --hidden_dim Size of the model's hidden layer
-ed or --embedding_dim Size of the embedding space
-nl or --num_layers Number of layers in the model

LSTM specific arguments

-bi or --bidirectional Whether the LSTM is bidirectional

Transformer specific arguments

-nh or --number_heads Number of attention heads in the Transformer model

Training arguments

-e or --epochs Number of epochs in training
-lr or --learning_rate Learning rate in training

Data arguments

-sl or --sequence_length Length of the sequences extracted from the data
-bs or --batch_size Batch size
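
To illustrate what --sequence_length controls, here is a minimal sketch of cutting a token stream into fixed-length samples with a sliding window. This is an assumption about the general technique, not the code in data_functions.py: the function name make_sequences, the token strings, and the choice to predict the token that follows each window are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sequences(
    tokens: Sequence[str], sequence_length: int
) -> List[Tuple[List[str], str]]:
    """Slide a window over `tokens`.

    Each sample pairs a context of `sequence_length` tokens with the
    token that immediately follows it (hypothetical training target).
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be positive")
    samples = []
    # The last usable window must leave one token to serve as the target.
    for start in range(len(tokens) - sequence_length):
        context = list(tokens[start:start + sequence_length])
        target = tokens[start + sequence_length]
        samples.append((context, target))
    return samples


# Hypothetical tokenization of a Coq line, with explicit whitespace tokens.
tokens = ["Lemma", "<space>", "foo", "<space>", ":", "<space>", "True", ".", "<newline>"]
samples = make_sequences(tokens, sequence_length=6)
for context, target in samples:
    print(context, "->", target)
```

With nine tokens and a sequence length of 6, the window fits three times, so the sketch yields three samples. A larger sequence length gives the model more context per sample but produces fewer samples from the same file.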

How to Use

Training Example:

python main.py --train --model transformer --epochs 15 --learning_rate 0.00002 --sequence_length 6
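
For readers who want to see how such a command line could be parsed, the sketch below defines the flags from the arguments guide with Python's standard argparse module. It is not a copy of utils/argparser.py: the function name build_parser and every default value are assumptions, and treating --train and --bidirectional as boolean switches is a guess based on how they are described above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags listed in the arguments guide (sketch)."""
    parser = argparse.ArgumentParser(description="Formatting style detection (sketch)")

    # --train appears in the training example; assumed to be a boolean switch.
    parser.add_argument("--train", action="store_true")

    # Model arguments (default values are hypothetical).
    parser.add_argument("-m", "--model", choices=["LSTM", "transformer", "n_gram"], default="LSTM")
    parser.add_argument("-hs", "--hidden_dim", type=int, default=128)
    parser.add_argument("-ed", "--embedding_dim", type=int, default=64)
    parser.add_argument("-nl", "--num_layers", type=int, default=2)

    # LSTM specific: assumed to be a switch that takes no value.
    parser.add_argument("-bi", "--bidirectional", action="store_true")

    # Transformer specific.
    parser.add_argument("-nh", "--number_heads", type=int, default=4)

    # Training arguments.
    parser.add_argument("-e", "--epochs", type=int, default=10)
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-3)

    # Data arguments.
    parser.add_argument("-sl", "--sequence_length", type=int, default=6)
    parser.add_argument("-bs", "--batch_size", type=int, default=32)
    return parser


# Parse the flags from the training example instead of reading sys.argv.
example = "--train --model transformer --epochs 15 --learning_rate 0.00002 --sequence_length 6"
args = build_parser().parse_args(example.split())
print(args.model, args.epochs, args.learning_rate, args.sequence_length)
```

Flags that are not passed, such as --batch_size here, fall back to their defaults, which is why the training example only needs to list the settings it changes.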

Folder Tree

detection_of_formatting_style
├── data
│   ├── data_functions.py     # Contains functions for data preprocessing, loading, and transformation
│   └── math-comp             # The dataset folder
├── model
│   ├── transformer.py        # Transformer model implementation
│   ├── LSTM.py               # LSTM model implementation
│   ├── LSTMFS.py             # LSTM model implementation from scratch
│   └── n_gram.py             # n_gram model implementation
├── pretrained_models         # Directory for saving and loading pre-trained models
├── train
│   ├── eval.py               # Script to evaluate model performance
│   └── train.py              # Script to train and save models
├── utils
│   ├── argparser.py          # Contains argument parsing logic for CLI inputs
│   └── seed.py               # Contains functions for seed setting
├── README.md                 # README file
├── requirements.txt          # List of required dependencies
├── notebook.ipynb            # Notebook to test models
├── setup.sh                  # Bash script to set up the environment
└── main.py                   # Main script to run the project
