Question Regarding Data Preprocessing for train.lmdb and valid.lmdb from MOAD in docking_v2 Directory #259

Open
iceissey opened this issue Aug 15, 2024 · 1 comment

Comments

@iceissey

iceissey commented Aug 15, 2024

I would like to know how the train.lmdb and valid.lmdb files in the docking_v2/protein_ligand_binding_pose_prediction_v2 directory were generated from the MOAD dataset. I checked the code but found only the inference-time preprocessing, which generates conformations for the ligand molecules. Could you please clarify whether the training-time preprocessing is the same as the inference-time preprocessing?

@iceissey iceissey changed the title Question Regarding Data Preprocessing for train.lmdb and test.lmdb from MOAD in docking_v2 Directory Question Regarding Data Preprocessing for train.lmdb and valid.lmdb from MOAD in docking_v2 Directory Aug 15, 2024
@ZhouGengmo
Collaborator

The training-time data preprocessing is the same as the inference-time preprocessing, except that during training the conformation clustering is fixed, with M = 100 and N = 10.

Additionally, the original training data contains some duplicate entries, and we do not apply any special handling to them. We will release the processed LMDB files later.
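The reply above does not spell out what M and N denote. Assuming they mean that M = 100 candidate conformations are generated per ligand and reduced to N = 10 representatives by clustering, the selection step could look roughly like the pure-Python k-means sketch below. `cluster_conformations` and its parameters are hypothetical names, not the project's actual API. Conformer generation (for example with RDKit) and conformer alignment, which a real pipeline would likely perform first, are left out, and the input is assumed to be a list of per-atom (x, y, z) tuples.

```python
import random
from typing import List, Sequence, Tuple

# One conformation = list of per-atom (x, y, z) coordinates (assumed input format).
Conformation = List[Tuple[float, float, float]]


def _flatten(conf: Conformation) -> List[float]:
    """Turn [(x, y, z), ...] into a flat feature vector [x0, y0, z0, x1, ...]."""
    return [c for atom in conf for c in atom]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flat coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_conformations(
    confs: List[Conformation],
    n_clusters: int = 10,
    n_iter: int = 50,
    seed: int = 42,
) -> List[Conformation]:
    """Reduce M = len(confs) conformations to at most n_clusters representatives.

    Hypothetical helper: Lloyd's k-means on flattened coordinates, then the
    member conformation nearest to each centroid is kept.
    """
    if n_clusters >= len(confs):
        return list(confs)
    points = [_flatten(c) for c in confs]
    rng = random.Random(seed)  # fixed seed -> the same selection on every run
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), n_clusters)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: attach every conformation to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: _sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    # Keep one real conformation per non-empty cluster: the member closest to its centroid.
    reps = []
    for k in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if members:
            best = min(members, key=lambda i: _sq_dist(points[i], centroids[k]))
            reps.append(confs[best])
    return reps
```

For example, passing 100 conformations with `n_clusters=10` returns up to 10 of them. Returning actual members rather than centroids keeps every output a physically generated conformation, and a fixed seed makes the selection reproducible; a cluster that ends up empty contributes no representative, so fewer than 10 is possible.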
