The code is based on PyTorch and HuggingFace Transformers.
```bash
pip install -r requirements.txt
```

```bash
cd scripts
bash train.sh
```
Arguments explanation:

- `--dataset`: the name of the dataset, used only for notation
- `--data_dir`: the path to the saved dataset folder, containing `train.jsonl`, `test.jsonl`, `valid.jsonl`
- `--seq_len`: the max length of the sequence $z$ ($x \oplus y$)
- `--resume_checkpoint`: if not none, restore this checkpoint and continue training
- `--vocab`: the tokenizer is initialized from BERT, or loads your own preprocessed vocab dictionary (e.g. built with BPE)
- `--learned_mean_embed`: whether to use the learned soft absorbing state
- `--denoise`: whether to add discrete noise
- `--use_fp16`: whether to use mixed-precision training
- `--denoise_rate`: the denoise rate, 0.5 by default; no effect in this version
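To make the `--data_dir` layout concrete, here is a minimal sketch that builds a toy dataset folder with the three expected files. The `src`/`trg` field names are an assumption for illustration; check the repo's data-loading code for the actual schema.

```python
import json
import os
import tempfile

# Toy examples; the "src"/"trg" keys are an assumed schema, not from this README.
pairs = [
    {"src": "what is the capital of france ?", "trg": "the capital of france is paris ."},
    {"src": "hello there", "trg": "hi , how can i help ?"},
]

# --data_dir points at a folder containing train.jsonl, valid.jsonl, test.jsonl,
# one JSON object per line.
data_dir = tempfile.mkdtemp()
for split in ("train", "valid", "test"):
    with open(os.path.join(data_dir, f"{split}.jsonl"), "w", encoding="utf-8") as f:
        for ex in pairs:
            f.write(json.dumps(ex) + "\n")

# Sanity-check: every line parses back into a source/target pair.
with open(os.path.join(data_dir, "train.jsonl"), encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))
```

You would then pass `--data_dir` pointing at this folder when launching training.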
We provide our trained model weights on Google Drive.
Perform the full 2000-step diffusion process. This achieves higher performance compared with speed-up decoding.
```bash
cd scripts
bash run_decode.sh
```
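The cost of full decoding comes from calling the denoiser once per timestep. As a rough illustration only (not DiffuSeq's actual sampler), a full-length reverse loop with a stand-in denoiser looks like this; the linear beta schedule is an assumption:

```python
import random

T = 2000  # full number of diffusion steps, as in the decoding above
# Linear noise schedule (an illustrative assumption, not the repo's schedule).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

def denoise_step(x, t):
    # Stand-in for the trained network: each call would be one forward pass.
    return x * (1.0 - betas[t]) ** 0.5

x = random.gauss(0.0, 1.0)    # start from pure noise
calls = 0
for t in reversed(range(T)):  # t = 1999, 1998, ..., 0
    x = denoise_step(x, t)
    calls += 1
print(calls)
```

One network call per step means 2000 forward passes per sample, which is why full decoding is slow but tends to be more accurate than shortened schedules.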
We customize the implementation of DPM-Solver++ for DiffuSeq to accelerate its sampling.
```bash
cd scripts
bash run_decode_solver.sh
```
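The core idea behind fast samplers such as DPM-Solver++ is to visit only a small subset of the 2000 training timesteps, taking larger solver steps between them. The subset-selection rule below is an illustrative sketch, not the repo's actual schedule:

```python
def respace(num_train_steps: int, num_sample_steps: int) -> list:
    """Pick a roughly evenly spaced, descending subset of timesteps.

    Illustrative only: real solvers choose nodes based on the noise
    schedule, not plain integer strides.
    """
    stride = num_train_steps / num_sample_steps
    return sorted({round(i * stride) for i in range(num_sample_steps)}, reverse=True)

schedule = respace(2000, 20)  # 20 denoiser calls instead of 2000
print(len(schedule), schedule[0], schedule[-1])
```

With 20 solver steps the sampler makes 100x fewer network calls than the full 2000-step loop, which is the source of the speed-up, at some cost in quality.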