
Which site does the generated image correspond to and how to solve the repeated experiments? #2

Closed
prsigma opened this issue Feb 8, 2025 · 8 comments


prsigma commented Feb 8, 2025

In Cell Painting, each well is imaged at multiple sites. Are all site images used for training, or are only specific sites selected? If all sites are used, how can I determine which site's image to compare with the generated image for a compound's perturbation? Additionally, a compound may be tested multiple times. For repeated experiments of the same drug, are all replicates used for training, or is only one selected?

@prsigma prsigma changed the title Which site does the generated image correspond to and how to solve ther repeated experiments? Which site does the generated image correspond to and how to solve the repeated experiments? Feb 8, 2025
znavidi (Member) commented Feb 11, 2025

We did not filter images based on their well or site information. Instead, we applied illumination correction to account for potential variations arising from such factors. Image selection was based on the specific perturbations associated with them, as detailed in the manuscript. For each perturbation, all corresponding perturbed images were included in the training process, addressing your second question.
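As a rough illustration (my own sketch, not the MorphoDiff preprocessing code), retrospective illumination correction commonly divides each image by a smoothed estimate of the plate's illumination field:

```python
import numpy as np

def box_blur(img, k):
    """Separable box blur (numpy-only stand-in for a Gaussian filter), k odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    # Blur along rows, then along columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred)
    return blurred

def illumination_correct(images, k=31):
    """Retrospective illumination correction for a stack of single-channel
    images of shape (N, H, W): divide each image by a heavily smoothed
    average image (the estimated illumination field)."""
    illum = box_blur(images.mean(axis=0), k)
    illum = illum / illum.mean()          # preserve the overall intensity scale
    illum = np.clip(illum, 1e-6, None)    # guard against division by zero
    return images / illum
```

Dividing by a normalized, smoothed mean image removes the shared spatial intensity bias while leaving per-image structure intact.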

prsigma (Author) commented Feb 11, 2025

> We did not filter images based on their well or site information. Instead, we applied illumination correction to account for potential variations arising from such factors. Image selection was based on the specific perturbations associated with them, as detailed in the manuscript. For each perturbation, all corresponding perturbed images were included in the training process, addressing your second question.

Thank you for your explanation. Could you remind me where the code for illumination correction is? Is illumination correction applied to all images? My previous question mentioned that a well has multiple imaging sites. Suppose a well is imaged at two sites: when I test, which site's real image is the generated image compared against, the first or the second?

znavidi (Member) commented Feb 11, 2025

Illumination correction was applied to all images. Different approaches were taken for different datasets, based on the information each provides. For example, for RxRx1 images, after conversion to RGB format, the images were normalized channel-wise (as performed in train.py) before being fed into the model. An example of how to convert the 6 channels into RGB format is provided in https://colab.research.google.com/github/recursionpharma/rxrx1-utils/blob/trunk/notebooks/visualization.ipynb#scrollTo=p37r8ulsB1t4, and a sample script for our project is provided in https://github.com/bowang-lab/MorphoDiff/blob/main/morphodiff/preprocessing/rxrx1_convert_to_RGB.py.
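Conceptually, the conversion blends each fluorescent channel into RGB with a fixed color, and channel-wise standardization follows. A minimal sketch (the channel palette below is a placeholder, not the palette used by rxrx1-utils or rxrx1_convert_to_RGB.py):

```python
import numpy as np

# Placeholder per-channel RGB colors; the actual palette used by
# rxrx1-utils / rxrx1_convert_to_RGB.py may differ.
CHANNEL_COLORS = np.array([
    [0.0, 0.0, 1.0],  # channel 1 -> blue
    [0.0, 1.0, 0.0],  # channel 2 -> green
    [1.0, 0.0, 0.0],  # channel 3 -> red
    [1.0, 1.0, 0.0],  # channel 4 -> yellow
    [1.0, 0.0, 1.0],  # channel 5 -> magenta
    [0.0, 1.0, 1.0],  # channel 6 -> cyan
])

def six_channel_to_rgb(img6):
    """(H, W, 6) fluorescent stack in [0, 1] -> (H, W, 3) RGB in [0, 1].
    Each channel contributes its color; contributions are summed and clipped."""
    rgb = np.tensordot(img6, CHANNEL_COLORS, axes=([2], [0]))
    return np.clip(rgb, 0.0, 1.0)

def normalize_channelwise(img):
    """Channel-wise standardization (the kind of normalization a training
    script would apply before feeding images to the model)."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std
```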

Regarding your last question, could you please clarify what you mean by “site” (in RxRx1, for example)?

prsigma (Author) commented Feb 12, 2025

[Image]
The original paper mentions imaging different sites, and in fact, the provided dataset also contains information on sites.

[Image]

Here, s1 and s2 represent site 1 and site 2, respectively. So, if you are training with all the images, how do you know which site the generated images belong to? No offense, but it seems you may not be very familiar with the dataset.

znavidi (Member) commented Feb 12, 2025

Thanks for the clarification. The initial question wasn’t clear, as "site" can be interpreted differently depending on the dataset and context. As mentioned above, all images, including the different sites associated with each well and perturbation, were included during training. The MorphoDiff pipeline takes the perturbation embedding as a conditional signal, and no site information is incorporated into the model, which means generated images are associated with treatments only (and not with sites).

For validation, different approaches can be taken. In our study, we compared the cohort of generated images for each perturbation against all real images for that perturbation (including all sites). The validation did not compare individual real and generated images one-to-one; instead, it assessed the distribution of real images versus generated images. Hope this clarifies further.
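The distribution-level comparison can be made concrete with the Fréchet distance underlying FID. A minimal numpy sketch, assuming image features (e.g., Inception embeddings) have already been extracted for both cohorts:

```python
import numpy as np

def psd_sqrtm(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(x, y):
    """Fréchet distance between Gaussian fits of two feature cohorts
    (rows = samples, columns = feature dimensions); this is the quantity
    FID computes on Inception features."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cx = np.cov(x, rowvar=False)
    cy = np.cov(y, rowvar=False)
    # Symmetric form of (cx @ cy)^(1/2), numerically safer for PSD inputs.
    sx = psd_sqrtm(cx)
    covmean = psd_sqrtm(sx @ cy @ sx)
    return float(((mu_x - mu_y) ** 2).sum() + np.trace(cx + cy - 2.0 * covmean))
```

Because the distance is computed between fitted distributions, the cohorts are compared as a whole rather than image-by-image.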

Please let me know if there are any other questions.

prsigma (Author) commented Feb 14, 2025

> Thanks for the clarification. The initial question wasn’t clear, as "site" can be interpreted differently depending on the dataset and context. As mentioned above, all images—including different sites associated with each (well and) perturbation—were included during training. MorphoDiff pipeline takes perturbation embedding as a conditional signal, and no site information is incorporated into the model, which means generated images will be associated to treatments only (and not site).
>
> For validation, different approaches can be taken. In our study, we compared the cohort of generated images for each perturbation against all real images for that perturbation (including all sites). The validation did not compare individual real and generated images one-to-one; instead, it assessed the distribution of real images versus generated images. Hope this clarifies further.
>
> Please let me know if there are any other questions.

Thanks for the detailed explanation. So, if the validation is for Drug A, and I generated 100 images using MorphoDiff, and the real images were captured at two sites, then the validation would compare the distribution of the 100 generated images with the two real images. Is that correct? Also, are the FID and KID metrics computed in the same manner?

znavidi (Member) commented Feb 15, 2025

You are correct about comparing the generated images with all real images from all sites for that specific treatment. In our validation, we also augmented the real images (using random flipping and rotation) so that the generated and real cohorts for each treatment contained the same number of images (100 in your example), and then performed image-distance validation (including the FID and KID metrics).
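A minimal sketch of that cohort-balancing step (an assumed helper, not the repository's actual validation code), using random flips and 90° rotations:

```python
import numpy as np

def augment_to_match(real_imgs, target_n, seed=0):
    """Upsample a small real cohort to target_n images with random flips
    and 90-degree rotations, so the real and generated cohorts are the
    same size before computing FID/KID."""
    rng = np.random.default_rng(seed)
    out = [np.asarray(img) for img in real_imgs]
    while len(out) < target_n:
        # Pick a random real image and apply random flips/rotations.
        img = np.asarray(real_imgs[rng.integers(len(real_imgs))])
        if rng.random() < 0.5:
            img = np.fliplr(img)
        if rng.random() < 0.5:
            img = np.flipud(img)
        img = np.rot90(img, k=int(rng.integers(4)))
        out.append(img)
    return out[:target_n]
```

In the two-site example above, the two real images would be augmented up to 100 before the distributions are compared.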

prsigma (Author) commented Feb 15, 2025

> You are correct about comparing the generated images with all real images from all sites for that specific treatment. In our validation, we also augmented real images (using random flipping and rotation) to have the same number of images in generated cohort and real cohort for each treatment (100 in your example), and then performed image distance validation (including FID and KID metrics).

I get it, thanks for your kind explanation!

@prsigma prsigma closed this as completed Feb 15, 2025