pocket representations have different dimensions for pair_repr #234

Open
yk-sun opened this issue Jun 20, 2024 · 9 comments

@yk-sun

yk-sun commented Jun 20, 2024

Hello,

I tried to generate pocket representations for my own dataset using the code provided in the demo notebook. The pair representations I got have different dimensions for different molecules, e.g. [n,n,64], where n differs from molecule to molecule.

On the other hand, when I rerun the demo case, I get the same dimensions for all pair representations.

Could you please point out which steps I might have missed?

Thanks!

@ZhouGengmo
Collaborator

Is it this pocket repr demo?
If it's convenient, could you provide your code modifications and example files?

@yk-sun
Author

yk-sun commented Jun 27, 2024

Hello @ZhouGengmo

Thanks a lot for the reply.

Everything is fine when I run the unimol_pocket_repr_demo notebook with your provided data.

However, when I run the same code with my own input, I encounter the dimension problem. I suspected it was due to differing pocket sizes, but your inputs also have different pocket sizes, so might there be some pre-processing steps that I missed?

My input PDBs are 4jym and 5dj5, with their bound ligands removed.

The pocket json is:
{"4jym":["A193","A194","A134","A218","A139","A142","A246","A26","A219","A124","A157","A95"],"5dj5":["A136","A141","A144","A148","A155","A27","A28","A159","A162","A191","A194","A195","A219","A220","A96","A97","A98","A247","A126"]}

The output dimensions I got for 'pair_repr' are: (111, 111, 64) and (172, 172, 64)
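
As a quick sanity check on the numbers above, the following minimal sketch parses the pocket JSON and compares the residue count of each pocket with the reported n. It assumes each label is a single-character chain ID followed by a residue number (e.g. "A193"); the helper `split_residue_label` is hypothetical and not part of Uni-Mol. It only illustrates that the two pockets differ in size, which is consistent with n differing.

```python
import json

# Pocket definition copied from the comment above.
pocket_json = (
    '{"4jym":["A193","A194","A134","A218","A139","A142","A246","A26",'
    '"A219","A124","A157","A95"],'
    '"5dj5":["A136","A141","A144","A148","A155","A27","A28","A159","A162",'
    '"A191","A194","A195","A219","A220","A96","A97","A98","A247","A126"]}'
)


def split_residue_label(label):
    """Split a label such as 'A193' into (chain, residue_number).

    Assumption: one-character chain ID followed by an integer residue number.
    """
    return label[0], int(label[1:])


pockets = json.loads(pocket_json)
residue_counts = {pdb_id: len(labels) for pdb_id, labels in pockets.items()}

# First dimension of the pair_repr shapes reported in the comment above.
reported_n = {"4jym": 111, "5dj5": 172}

for pdb_id, n_res in residue_counts.items():
    n = reported_n[pdb_id]
    print(f"{pdb_id}: {n_res} residues, n = {n}, ratio {n / n_res:.2f}")
```

The larger pocket (5dj5) has both more residues and a larger n, so the differing shapes track the pocket definitions rather than a missed pre-processing step.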

@ZhouGengmo
Collaborator

However, when I run the same code with my own input, I encounter the dimension problem. I suspected it was due to differing pocket sizes, but your inputs also have different pocket sizes,

It is normal for the dimensions to differ. The dimensions of the representation depend on the pocket size. This is also reflected in the example data CASF2016, where not all entries have the same dimensions. For instance:

  • PDB ID (in CASF2016): 3nq9, pair_repr_shape: (242, 242, 64)
  • PDB ID (in CASF2016): 5aba, pair_repr_shape: (206, 206, 64)
  • PDB ID (in CASF2016): 3g31, pair_repr_shape: (160, 160, 64)
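
To illustrate why a pair representation grows with the pocket, here is a toy sketch that builds one feature vector per pair of positions from random coordinates. This is not Uni-Mol's code: `toy_pair_features`, the Gaussian expansion, and its centers are assumptions chosen only to reproduce the (n, n, 64) shape pattern listed above.

```python
import numpy as np


def toy_pair_features(coords, n_channels=64):
    """Toy stand-in for a pair representation: one vector per pair of positions."""
    # Pairwise Euclidean distances, shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Expand each distance over n_channels Gaussian bumps (arbitrary centers).
    centers = np.linspace(0.0, 20.0, n_channels)
    return np.exp(-((dist[..., None] - centers) ** 2))


rng = np.random.default_rng(0)
for n in (242, 206, 160):  # sizes taken from the CASF2016 examples above
    coords = rng.normal(size=(n, 3))
    print(n, toy_pair_features(coords).shape)
```

Any per-pair feature has this property: the last axis is fixed, while the first two axes follow the number of positions in the input.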

It is also recommended to use unimol_tools, which is more user-friendly.

@yk-sun
Author

yk-sun commented Jul 2, 2024

Thank you @ZhouGengmo,

How would you recommend handling these representations of different dimensions for comparison?

@ZhouGengmo
Collaborator

I recommend using the CLS representation to represent the entire pocket. The CLS representations of different pockets have the same dimensions, i.e., mol_repr_cls here.
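
Because the CLS vectors share one fixed length, two pockets can be compared directly with a standard vector similarity. The sketch below uses cosine similarity as one possible choice; the thread does not prescribe a metric. The random vectors are placeholders for two pockets' mol_repr_cls outputs, and the length 512 is the size reported for mol_repr later in this thread.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors of equal length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Placeholder data standing in for the mol_repr_cls of two different pockets.
cls_a = rng.normal(size=512)
cls_b = rng.normal(size=512)

print(cosine_similarity(cls_a, cls_b))
```

The same function does not apply to pair_repr arrays of shapes such as (111, 111, 64) and (172, 172, 64), since they cannot be aligned element by element.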

@yk-sun
Author

yk-sun commented Jul 3, 2024

Hi @ZhouGengmo,

There doesn't seem to be a pocket representation implementation yet in unimol_tools?

In the meantime, if I continue to use the notebook implementation mentioned above, would 'mol_repr', which provides a (512,)-dimensional output for all pockets, be the same as "mol_repr_cls"?

If this is the case, does that mean the figure below should be annotated with "molecular representation" (mol_repr or mol_repr_cls) instead of "atom representation" for pockets?

If not, could you please elaborate on the different representation outputs for pockets?

image

@ZhouGengmo
Collaborator

In the meantime, if I continue to use the notebook implementation mentioned above, would 'mol_repr', which provides a (512,)-dimensional output for all pockets, be the same as "mol_repr_cls"?

Yes, in this demo, mol_repr and mol_repr_cls are the same.

If this is the case, does that mean the figure below should be annotated with "molecular representation" (mol_repr or mol_repr_cls) instead of "atom representation" for pockets?

This figure is ok. CLS is a special token added before all atoms and is used to represent the whole molecule/pocket. The atom-level representation of Uni-Mol is not included in this demo.

Do you want to use the atom representation? I will add it to this demo ASAP.
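
The distinction between the CLS vector and atom-level vectors can be sketched as a simple slice of a per-token encoder output. This is an illustration under stated assumptions, not the demo's implementation: `split_cls_and_atoms` is a hypothetical helper, index 0 is taken to be CLS as described above, and the `has_trailing_token` flag covers the assumed case of a closing special token after the atoms.

```python
import numpy as np


def split_cls_and_atoms(encoder_out, has_trailing_token=True):
    """Split a per-token output of shape (n_tokens, dim).

    Assumptions: row 0 is the CLS token; if has_trailing_token is True,
    the last row is a closing special token and is dropped from the atoms.
    """
    cls_repr = encoder_out[0]
    end = -1 if has_trailing_token else None
    atom_repr = encoder_out[1:end]
    return cls_repr, atom_repr


# Fake encoder output: CLS + 5 atoms + one trailing token, 512 features each.
encoder_out = np.arange(7 * 512, dtype=float).reshape(7, 512)
cls_repr, atom_repr = split_cls_and_atoms(encoder_out)
print(cls_repr.shape, atom_repr.shape)
```

The CLS slice has a fixed length regardless of pocket size, while the atom-level slice has one row per atom and therefore varies between pockets.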

@yk-sun
Author

yk-sun commented Jul 6, 2024

Adding the atom representation would be great. Thank you!

@ZhouGengmo
Collaborator

Added in PR #247. You can pull the latest code.
