
Which site does the generated image correspond to and how to solve the repeated experiments? #2

Closed
prsigma opened this issue Feb 8, 2025 · 8 comments


prsigma commented Feb 8, 2025

In Cell Painting, each well is imaged at multiple sites. Are all site images used for training, or are only specific sites selected? If all sites are used, how can I determine which site's image to compare with the generated image for a compound's perturbation? Additionally, a compound may be tested multiple times. For repeated experiments of the same drug, are all replicates used for training, or is only one selected?

@prsigma prsigma changed the title Which site does the generated image correspond to and how to solve ther repeated experiments? Which site does the generated image correspond to and how to solve the repeated experiments? Feb 8, 2025
znavidi (Member) commented Feb 11, 2025

We did not filter images based on their well or site information. Instead, we applied illumination correction to account for potential variations arising from such factors. Image selection was based on the specific perturbations associated with them, as detailed in the manuscript. For each perturbation, all corresponding perturbed images were included in the training process, addressing your second question.
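As a rough illustration (my own sketch, not the MorphoDiff preprocessing code), retrospective illumination correction commonly divides each image by a smoothed estimate of the plate's illumination field:

```python
import numpy as np

def box_blur(img, k):
    """Separable box blur (numpy-only stand-in for a Gaussian filter), k odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    # Blur along rows, then along columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred)
    return blurred

def illumination_correct(images, k=31):
    """Retrospective illumination correction for a stack of single-channel
    images of shape (N, H, W): divide each image by a heavily smoothed
    average image (the estimated illumination field)."""
    illum = box_blur(images.mean(axis=0), k)
    illum = illum / illum.mean()          # preserve the overall intensity scale
    illum = np.clip(illum, 1e-6, None)    # guard against division by zero
    return images / illum
```

Dividing by a normalized, smoothed mean image removes the shared spatial intensity bias while leaving per-image structure intact.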

prsigma (Author) commented Feb 11, 2025

> We did not filter images based on their well or site information. Instead, we applied illumination correction to account for potential variations arising from such factors. Image selection was based on the specific perturbations associated with them, as detailed in the manuscript. For each perturbation, all corresponding perturbed images were included in the training process, addressing your second question.

Thank you for your explanation. Could you remind me where the code for illumination correction is? Is illumination correction applied to all images? My previous question mentioned that a well has multiple imaging sites. Suppose a well is imaged at two sites: when I test, which site's real image is the generated image compared against, the first or the second?

znavidi (Member) commented Feb 11, 2025

Illumination correction was applied to all images. Different approaches were taken for different datasets, based on the information each provides. For example, for RxRx1 images, after conversion to RGB format, the images were normalized channel-wise (as performed in train.py) before being fed into the model. An example of how to convert the 6 channels into RGB format is provided in https://colab.research.google.com/github/recursionpharma/rxrx1-utils/blob/trunk/notebooks/visualization.ipynb#scrollTo=p37r8ulsB1t4, and a sample script for our project is provided in https://github.com/bowang-lab/MorphoDiff/blob/main/morphodiff/preprocessing/rxrx1_convert_to_RGB.py.
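Conceptually, the conversion blends each fluorescent channel into RGB with a fixed color, and channel-wise standardization follows. A minimal sketch (the channel palette below is a placeholder, not the palette used by rxrx1-utils or rxrx1_convert_to_RGB.py):

```python
import numpy as np

# Placeholder per-channel RGB colors; the actual palette used by
# rxrx1-utils / rxrx1_convert_to_RGB.py may differ.
CHANNEL_COLORS = np.array([
    [0.0, 0.0, 1.0],  # channel 1 -> blue
    [0.0, 1.0, 0.0],  # channel 2 -> green
    [1.0, 0.0, 0.0],  # channel 3 -> red
    [1.0, 1.0, 0.0],  # channel 4 -> yellow
    [1.0, 0.0, 1.0],  # channel 5 -> magenta
    [0.0, 1.0, 1.0],  # channel 6 -> cyan
])

def six_channel_to_rgb(img6):
    """(H, W, 6) fluorescent stack in [0, 1] -> (H, W, 3) RGB in [0, 1].
    Each channel contributes its color; contributions are summed and clipped."""
    rgb = np.tensordot(img6, CHANNEL_COLORS, axes=([2], [0]))
    return np.clip(rgb, 0.0, 1.0)

def normalize_channelwise(img):
    """Channel-wise standardization (the kind of normalization a training
    script would apply before feeding images to the model)."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std
```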

Regarding your last question, could you please clarify what you mean by “site” (in RxRx1, for example)?

prsigma (Author) commented Feb 12, 2025

[Image]
The original paper mentions imaging different sites, and in fact, the provided dataset also contains information on sites.

[Image]

Here, s1 and s2 represent site 1 and site 2, respectively. So, if you are training with all the images, how do you know which site the generated images belong to? No offense, but it seems you may not be very familiar with the dataset.

znavidi (Member) commented Feb 12, 2025

Thanks for the clarification. The initial question wasn’t clear, as "site" can be interpreted differently depending on the dataset and context. As mentioned above, all images, including the different sites associated with each well and perturbation, were included during training. The MorphoDiff pipeline takes the perturbation embedding as a conditional signal, and no site information is incorporated into the model, which means generated images are associated with treatments only (and not with sites).

For validation, different approaches can be taken. In our study, we compared the cohort of generated images for each perturbation against all real images for that perturbation (including all sites). The validation did not compare individual real and generated images one-to-one; instead, it assessed the distribution of real images versus generated images. Hope this clarifies further.
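The distribution-level comparison can be made concrete with the Fréchet distance underlying FID. A minimal numpy sketch, assuming image features (e.g., Inception embeddings) have already been extracted for both cohorts:

```python
import numpy as np

def psd_sqrtm(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(x, y):
    """Fréchet distance between Gaussian fits of two feature cohorts
    (rows = samples, columns = feature dimensions); this is the quantity
    FID computes on Inception features."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cx = np.cov(x, rowvar=False)
    cy = np.cov(y, rowvar=False)
    # Symmetric form of (cx @ cy)^(1/2), numerically safer for PSD inputs.
    sx = psd_sqrtm(cx)
    covmean = psd_sqrtm(sx @ cy @ sx)
    return float(((mu_x - mu_y) ** 2).sum() + np.trace(cx + cy - 2.0 * covmean))
```

Because the distance is computed between fitted distributions, the cohorts are compared as a whole rather than image-by-image.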

Please let me know if there are any other questions.

prsigma (Author) commented Feb 14, 2025

> Thanks for the clarification. The initial question wasn’t clear, as "site" can be interpreted differently depending on the dataset and context. As mentioned above, all images—including different sites associated with each (well and) perturbation—were included during training. MorphoDiff pipeline takes perturbation embedding as a conditional signal, and no site information is incorporated into the model, which means generated images will be associated to treatments only (and not site).
>
> For validation, different approaches can be taken. In our study, we compared the cohort of generated images for each perturbation against all real images for that perturbation (including all sites). The validation did not compare individual real and generated images one-to-one; instead, it assessed the distribution of real images versus generated images. Hope this clarifies further.
>
> Please let me know if there are any other questions.

Thanks for the detailed explanation. So, if the validation is for Drug A, and I generated 100 images using MorphoDiff, and the real images were captured at two sites, then the validation would compare the distribution of the 100 generated images with the two real images. Is that correct? Also, are the FID and KID metrics computed in the same manner?

znavidi (Member) commented Feb 15, 2025

You are correct about comparing the generated images with all real images from all sites for that specific treatment. In our validation, we also augmented the real images (using random flipping and rotation) so that the generated and real cohorts for each treatment contained the same number of images (100 in your example), and then performed image-distance validation (including the FID and KID metrics).
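A minimal sketch of that cohort-balancing step (an assumed helper, not the repository's actual validation code), using random flips and 90° rotations:

```python
import numpy as np

def augment_to_match(real_imgs, target_n, seed=0):
    """Upsample a small real cohort to target_n images with random flips
    and 90-degree rotations, so the real and generated cohorts are the
    same size before computing FID/KID."""
    rng = np.random.default_rng(seed)
    out = [np.asarray(img) for img in real_imgs]
    while len(out) < target_n:
        # Pick a random real image and apply random flips/rotations.
        img = np.asarray(real_imgs[rng.integers(len(real_imgs))])
        if rng.random() < 0.5:
            img = np.fliplr(img)
        if rng.random() < 0.5:
            img = np.flipud(img)
        img = np.rot90(img, k=int(rng.integers(4)))
        out.append(img)
    return out[:target_n]
```

In the two-site example above, the two real images would be augmented up to 100 before the distributions are compared.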

prsigma (Author) commented Feb 15, 2025

> You are correct about comparing the generated images with all real images from all sites for that specific treatment. In our validation, we also augmented real images (using random flipping and rotation) to have the same number of images in generated cohort and real cohort for each treatment (100 in your example), and then performed image distance validation (including FID and KID metrics).

I get it, thanks for your kind explanation!

@prsigma prsigma closed this as completed Feb 15, 2025