Skip to content

Gp2 Developability Modeling, Hackel Lab Univ. of Minnesota

License

Notifications You must be signed in to change notification settings

HackelLab-UMN/DevRep

Repository files navigation

DevRep

Gp2 Developability Modeling, Lead Author: Alexander Golinski

Contact Information [email protected]

Hackel & Martiniani Labs University of Minnesota

This project contains Python3 scripts used to predict the yield of Gp2 paratope variants utilizing high-throughput developability assays.

There are brief examples of how to use the code for the most predictive models for each aim of the project in the main directory under main*example.py. To run, unzip datasets located at ./datasets/ and ./datasets/predicted/. These scripts should run without without any other modifications. Required packages are included in the conda_package_list.txt.

The additional code for aim one of the project, determining the most predictive HT assays, can be found in ./main_paper_one/

The additional code for aim two of the project, creating a sequence-based model to predict yield via transfer learning of DevRep, can be found in ./main_paper_two/

Files in ./main_paper_*/ folders need to me moved to the main directory to run.

For non-top performing models, saved hyperparameter trials and model stats can be found within the zipped folder in the repective folders.

File descriptions: model_module.py - base model class that defines how to cross-validate, test, and evaluate model performances. submodels_module.py - subclasses that modify the model inputs/outputs and datasets for model evaluation. model_architectures.py - describes the hyperparameters and construction of the possible model architectures. plot_model.py - helper class to plot the predicted results from cv and testing. load_format_data.py - helper functions to format the data from the pickeled DataFrames to useful inputs for model evaluations.

Folder descriptions: /datasets/ - location of saved sequences' yields and assay scores. Due to GitHub size limits, you will have to unzip the datasets and the example predicted datasets Datasets are a pickeled DataFrame, which can be opened via panda.read_pickle() CSV examples of the smaller datasets are also included.

/trials/ - location of hyperparameter trials during cross-validation saved as pickeled hyperopt files. /model_stats/ - location of the best cv- and test- performance of the models /models/ - location of saved models as either pickeled scikit-learn models or tensorflow2 weights /plotpairs/ - location of saved pairs of (predicted value, true value, strain or assay id) /figures/ - location of the saved predicted figures for cv and testing

About

Gp2 Developability Modeling, Hackel Lab Univ. of Minnesota

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published