Which site does the generated image correspond to, and how are repeated experiments handled? #2
We did not filter images based on their well or site information. Instead, we applied illumination correction to account for potential variations arising from such factors. Images were selected based on their associated perturbations, as detailed in the manuscript. For each selected perturbation, all corresponding perturbed images were included in training, so repeated experiments of the same compound were all used. This also answers your second question.
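As a minimal sketch of this selection rule, assume a hypothetical per-image metadata record with `perturbation`, `plate`, `well`, and `site` fields (the names `ImageRecord` and `select_training_images` are illustrative, not MorphoDiff's actual schema or code). Images are kept by perturbation alone, so every well, site, and replicate of a selected perturbation survives:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical metadata for one field-of-view image; field names are illustrative.
    path: str
    perturbation: str
    plate: str
    well: str
    site: int


def select_training_images(records, selected_perturbations):
    """Keep every image whose perturbation was selected.

    No filtering on plate, well, or site is applied, so all sites and all
    replicate wells/plates of a selected perturbation are retained.
    """
    selected = set(selected_perturbations)
    return [r for r in records if r.perturbation in selected]


records = [
    ImageRecord("p1_A01_s1.png", "drug_A", "p1", "A01", 1),
    ImageRecord("p1_A01_s2.png", "drug_A", "p1", "A01", 2),
    ImageRecord("p2_B03_s1.png", "drug_A", "p2", "B03", 1),  # replicate on another plate
    ImageRecord("p1_C05_s1.png", "drug_B", "p1", "C05", 1),
]
train = select_training_images(records, ["drug_A"])
print([r.path for r in train])
# ['p1_A01_s1.png', 'p1_A01_s2.png', 'p2_B03_s1.png']
```

Both sites of well A01 and the replicate well on plate p2 are kept; only the unselected perturbation is dropped.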
Thank you for your explanation. Could you point me to where the code for illumination correction is? Is illumination correction applied to all images? My previous question mentioned that each well is imaged at multiple sites. Suppose a well has images from two sites. At test time, which site is the generated image compared to, the first site or the second?
Illumination correction was applied to all images. Different illumination correction approaches were taken for different datasets, depending on the information provided with each. For example, RxRx1 images were first converted to RGB format and then normalized channel-wise, as performed in train.py, before being fed into the model. An example of how to convert 6 channels into RGB format is provided in https://colab.research.google.com/github/recursionpharma/rxrx1-utils/blob/trunk/notebooks/visualization.ipynb#scrollTo=p37r8ulsB1t4. A sample script for our project is also provided in https://github.com/bowang-lab/MorphoDiff/blob/main/morphodiff/preprocessing/rxrx1_convert_to_RGB.py. Regarding your last question, could you please clarify what you mean by “site” (in RxRx1, for example)?
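To illustrate what channel-wise normalization means, the sketch below standardizes each RGB channel with its own mean and standard deviation computed over a set of images. This is an assumption for illustration only: the statistics actually used in train.py may differ (for example, fixed constants rather than dataset-derived values), and the helpers `channel_stats` and `normalize_channels` are hypothetical.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std over a stack of images shaped (N, H, W, C)."""
    stack = np.asarray(images, dtype=np.float64)
    mean = stack.mean(axis=(0, 1, 2))
    std = stack.std(axis=(0, 1, 2))
    return mean, std


def normalize_channels(image, mean, std, eps=1e-8):
    """Standardize each channel independently: (x - mean_c) / std_c."""
    image = np.asarray(image, dtype=np.float64)
    # mean and std have shape (C,), so they broadcast over the H and W axes.
    return (image - mean) / (std + eps)


rng = np.random.default_rng(0)
# Four fake 8x8 RGB images with deliberately different intensity ranges per channel.
images = rng.uniform(0, 1, size=(4, 8, 8, 3)) * np.array([255.0, 100.0, 10.0])
mean, std = channel_stats(images)
normalized = np.stack([normalize_channels(img, mean, std) for img in images])
print(normalized.mean(axis=(0, 1, 2)), normalized.std(axis=(0, 1, 2)))
```

After normalization each channel has roughly zero mean and unit variance, so a bright channel no longer dominates a dim one.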
Thanks for the clarification. The initial question wasn’t clear, as "site" can be interpreted differently depending on the dataset and context. As mentioned above, all images, including the different sites associated with each well and perturbation, were included during training. The MorphoDiff pipeline takes a perturbation embedding as its conditional signal, and no site information is incorporated into the model, so generated images are associated with treatments only (not with sites). Different approaches can be taken for validation. In our study, we compared the cohort of generated images for each perturbation against all real images for that perturbation (including all sites). The validation did not compare individual real and generated images one-to-one; instead, it compared the distribution of real images with the distribution of generated images. Hope this clarifies further. Please let me know if you have any other questions.
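The cohort-level comparison can be sketched as follows, assuming hypothetical `(perturbation, site, path)` tuples for real images and a dict of generated image paths per perturbation (the function `build_cohorts` is illustrative, not part of MorphoDiff). Real images are grouped by perturbation with the site ignored, so each generated cohort is paired with the pooled real images from every site:

```python
from collections import defaultdict


def build_cohorts(real_images, generated_images):
    """Pair generated and real cohorts per perturbation.

    real_images: iterable of (perturbation, site, path) tuples.
    generated_images: dict mapping perturbation -> list of generated image paths.
    The site is deliberately ignored when grouping real images.
    """
    real_by_pert = defaultdict(list)
    for perturbation, _site, path in real_images:
        real_by_pert[perturbation].append(path)

    cohorts = {}
    for perturbation, generated in generated_images.items():
        cohorts[perturbation] = {
            "real": real_by_pert.get(perturbation, []),
            "generated": list(generated),
        }
    return cohorts


real = [
    ("drug_A", 1, "A_site1.png"),
    ("drug_A", 2, "A_site2.png"),
    ("drug_B", 1, "B_site1.png"),
]
generated = {"drug_A": [f"gen_A_{i}.png" for i in range(100)]}
cohorts = build_cohorts(real, generated)
print(len(cohorts["drug_A"]["real"]), len(cohorts["drug_A"]["generated"]))
# 2 100
```

A distribution-level distance is then computed once per perturbation on each real/generated pair, rather than image by image.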
Thanks for the detailed explanation. So, if the validation is for Drug A, I generate 100 images using MorphoDiff, and the real images were captured at two sites, then validation would compare the distribution of the 100 generated images with that of the two real images. Is that correct? Also, are the FID and KID metrics calculated in the same manner?
You are correct: the generated images are compared with all real images from all sites for that specific treatment. In our validation, we also augmented the real images (using random flipping and rotation) so that the real and generated cohorts contained the same number of images for each treatment (100 in your example), and then computed image distance metrics, including FID and KID.
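A minimal sketch of this procedure, under several assumptions: images are NumPy arrays, augmentation draws from the eight flip/90-degree-rotation variants (the exact augmentations and seeds used in the study are not specified here), and FID/KID operate on feature vectors that would normally come from a pretrained Inception network. The feature extractor is omitted and random features stand in for it; `augment_to_size`, `fid`, and `kid` are hypothetical helpers. KID uses the standard cubic polynomial kernel with the unbiased MMD² estimator; reference implementations average it over random subsets, while a single full-set estimate is shown here for brevity.

```python
import numpy as np


def augment_to_size(images, target, rng):
    """Pad a real cohort to `target` images with random flips and 90-degree rotations.

    Originals are kept; extra images are augmented copies of random originals.
    """
    originals = list(images)
    out = list(originals)
    while len(out) < target:
        img = originals[rng.integers(len(originals))]
        if rng.random() < 0.5:
            img = np.fliplr(img)
        img = np.rot90(img, k=int(rng.integers(4)))  # rotates over the H, W axes
        out.append(img)
    return out


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    mat = (mat + mat.T) / 2.0  # remove tiny numerical asymmetry
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def fid(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sqrt_a = _sqrtm_psd(cov_a)
    # Tr((C_a C_b)^{1/2}) equals Tr((C_a^{1/2} C_b C_a^{1/2})^{1/2}), a PSD form.
    cross = _sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a)
                 + np.trace(cov_b) - 2.0 * np.trace(cross))


def kid(feats_a, feats_b):
    """Unbiased MMD^2 with the kernel k(x, y) = (x.y / d + 1)^3."""
    d = feats_a.shape[1]
    k_aa = (feats_a @ feats_a.T / d + 1.0) ** 3
    k_bb = (feats_b @ feats_b.T / d + 1.0) ** 3
    k_ab = (feats_a @ feats_b.T / d + 1.0) ** 3
    m, n = len(feats_a), len(feats_b)
    term_aa = (k_aa.sum() - np.trace(k_aa)) / (m * (m - 1))  # exclude self-pairs
    term_bb = (k_bb.sum() - np.trace(k_bb)) / (n * (n - 1))
    term_ab = k_ab.sum() / (m * n)
    return float(term_aa + term_bb - 2.0 * term_ab)


rng = np.random.default_rng(0)
real_images = [rng.random((16, 16, 3)) for _ in range(2)]  # e.g. one image per site
real_cohort = augment_to_size(real_images, 100, rng)
print(len(real_cohort))  # 100

# Stand-in features; in practice these come from a pretrained Inception network.
real_feats = rng.normal(size=(100, 8))
gen_feats = rng.normal(loc=0.5, size=(100, 8))
print(fid(real_feats, real_feats))  # approximately 0 for identical sets
print(fid(real_feats, gen_feats), kid(real_feats, gen_feats))
```

Augmenting only the real side keeps the two cohorts the same size, which matters because both metrics are estimated from samples and their values depend on cohort size.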
I get it, thanks for your kind explanation!
In Cell Painting, each well is imaged at multiple sites. Are all site images used for training, or are only specific sites selected? If all sites are used for training, how can I determine which site’s image to compare with the generated image for a compound’s perturbation? Additionally, a compound may be tested multiple times. For repeated experiments of the same drug, are all of them used for training, or is only one selected?