03. Model for Stain2 #5
The Stain2 experiment (https://github.com/jump-cellpainting/pilot-analysis/issues/15) contains 14 batches, of which only 1 will not be used to train the model: BR00112200 (Confocal), which contains fewer features than the other batches because it is missing the RNA channel. All other batches will be used to train or validate the model. See the overview below:

Beautiful colours here!
Note that the Percent Strong shown here is calculated with an additional sphering operation.

The Percent Strong/Replicating with feature-selected features - no sphering
The Percent Strong/Replicating with the 1324 features used by the model - I will use this as the reference benchmark (BM)
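For reference, a minimal sketch of how a Percent Replicating style metric can be computed. The `Metadata_` column prefix, the `Metadata_broad_sample` replicate column, and the 95th-percentile threshold are assumptions on my part, following common practice; the exact script used here may differ:

```python
import numpy as np
import pandas as pd

def percent_replicating(profiles: pd.DataFrame,
                        group_col: str = "Metadata_broad_sample",  # assumed replicate column
                        n_null: int = 1000, percentile: float = 95.0,
                        seed: int = 0) -> float:
    """Fraction of replicate groups whose median pairwise correlation
    exceeds the given percentile of a null of non-replicate correlations."""
    rng = np.random.default_rng(seed)
    feature_cols = [c for c in profiles.columns if not c.startswith("Metadata_")]
    X = profiles[feature_cols].to_numpy()
    groups = profiles[group_col].to_numpy()

    corr = np.corrcoef(X)  # well-by-well Pearson correlation matrix

    # Median replicate correlation per compound
    rep_scores = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        if len(idx) < 2:
            continue
        sub = corr[np.ix_(idx, idx)]
        rep_scores.append(np.median(sub[np.triu_indices_from(sub, k=1)]))

    # Null distribution: correlations of randomly drawn non-replicate well pairs
    null = []
    while len(null) < n_null:
        i, j = rng.integers(0, len(groups), size=2)
        if i != j and groups[i] != groups[j]:
            null.append(corr[i, j])
    threshold = np.percentile(null, percentile)

    return float(np.mean(np.array(rep_scores) > threshold))
```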
Experiment 1
The first model is trained on BR00112197 binned, BR00112199 multiplane, and BR00112203 MitoCompare. These are the most distinct batches that could have been chosen; the features of all other batches have more similar distributions. The training and validation loss curves indicate slow but steady learning, and the model has not converged after 50 epochs. The Percent Replicating (PR) will be calculated for each batch as a whole, without the negative controls. The training data consists of 80% of each batch, meaning that the model has not seen the remaining 20% during training. The model will also be tested on a completely unseen batch.

Main takeaways
Conclusion
The model shows promise in learning a general aggregation method that is applicable to unseen data, as long as the features remain constant. However, something unexpected is going on for the BR00112199 MultiPlane and BR00112197 binned batches. I will investigate whether these results are due to chance or whether something else is going on.
While trying to find the cause of the possible issue described in #5 (comment), I found that the model creates a feature space that puts profiles from the same batch closer together than the mean aggregation method does. Whether this is a good thing or not is not obvious to me. Note that BR00113818 is not in the training set of the MLP.
Experiment 1 (continued)
As the model improved the PS over the baseline on all of the previous plates, I will now test the model on 5 more plates from the Stain2 dataset: BR00113818_Redone, BR00113819_Redone, BR00113820_Redone, BR00113821_Redone, and BR00112197_repeat. The PR/PS is reported below. I also plotted the number of cells per well per plate in histograms.

Main takeaways
The model performs similarly to or better than the average aggregation method for 3 out of 5 plates. For the remaining two, however, it significantly underperformed. I expected this to be due to the average number of cells present in the plates. Looking at the histograms of these two plates (BR00113820_Redone and BR00113821_Redone), we can see that this might indeed be the cause, as these two plates have a different distribution of cells per well and fewer cells overall.

Later addition: As discussed with @shntnu, I calculated the PC1 loadings per plate and the correlation between these loadings (a sketch of this computation follows below). See the plot below. It shows that especially BR00112203 (training), BR00113819, BR00113820, and BR00113821 do not correlate well with the other plates in terms of PC1 loadings, i.e. other features are more important for describing the profiles of these plates. Note also that BR00112203 and BR00112199 are 2 of the 3 training plates, while they correlate especially poorly with the two poorly performing plates. Especially because the PR of BR00112203 (training) is the highest while its PC1 loadings correlate relatively weakly with those of all other plates, the model can be expected to perform worse on all other plates.

Conclusion: the plates used during training probably influence the model to pay more attention to a specific set of features, which are not as relevant for the poorly performing plates.

Are you ready for this?
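A minimal sketch of the PC1 loadings comparison described above. The plate dictionary and the `Metadata_` prefix convention are my assumptions; the features must be identical and identically ordered across plates:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def pc1_loading_correlations(plates: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Fit a PCA per plate on the feature columns, then compute the
    Pearson correlation between the PC1 loading vectors of all plate pairs."""
    loadings = {}
    for name, df in plates.items():
        feats = [c for c in df.columns if not c.startswith("Metadata_")]
        pca = PCA(n_components=1).fit(df[feats].to_numpy())
        loadings[name] = pca.components_[0]  # PC1 loading vector (one weight per feature)
    names = list(loadings)
    mat = np.corrcoef(np.stack([loadings[n] for n in names]))
    return pd.DataFrame(mat, index=names, columns=names)
```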
@EchteRobert Quick question - did you recompute the baseline with the 1324 features? Also, the cell count histograms surprised me. Given that the only difference between the plates is the dye concentration, I did not expect to see such a huge difference in the number of cells between plates.
I did not, @niranjchandrasekaran. Good point. I will recalculate the baseline with the 1324 features. Yes, it also surprised me a bit, although I cannot explain why this would be the case. In fact, these two plates contained the first well I have encountered that did not contain any cells at all.
On checking the table in #5 (comment), I just realized that the two plates
Experiment (intermediate)
The previous results showed a high non-replicate correlation. Although the replicate correlation was even higher, we would rather see a lower non-replicate correlation, which would represent a cleaner profile, i.e. a sharper contrast between replicates and non-replicates.

Main takeaways
The increased batch size in combination with the RobustMAD normalization shows that the model has an extremely hard time learning. Upon inspecting the gradients of the model, I saw that they vanished instantly within the first epochs. Returning to the original normalization removed this effect and allowed for better training.
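For context, a sketch of the RobustMAD operation as I understand it (pycytominer provides an equivalent `RobustMAD` transform; the epsilon guard against zero-MAD features is an assumption):

```python
import numpy as np
from scipy.stats import median_abs_deviation

def robust_mad_normalize(X: np.ndarray, eps: float = 1e-18) -> np.ndarray:
    """RobustMAD: subtract the per-feature median and divide by the median
    absolute deviation, scaled to be consistent with a normal distribution."""
    median = np.median(X, axis=0)
    mad = median_abs_deviation(X, axis=0, scale="normal")
    return (X - median) / (mad + eps)
```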
Experiment 2
As RobustMAD did not do what was expected and the non-replicate correlation did not decrease either, likely because the model was not learning at all, I trained another model with the previous normalization and a lower batch size (80 instead of 128 in the previous post). I also moved to 'cleaner' data (all 'green' plates as indicated in the table in #5 (comment)), which may cause the model to perform worse on the 'non-green' plates.

Main takeaways
The model is able to push the non-replicate correlation down somewhat, but this comes at the cost of overfitting: the model achieves it on the training plates, but not on the validation plates. I expect that more data will be needed to achieve the best of both worlds.
Experiment 3
In #5 (comment) I showed that the model learns to amplify the plate-specific signal in the cell profiles. To counteract that, a model is trained which also tries to learn across-plate replicates (a sketch of the pairing follows below). Additionally, one possible reason why the non-replicate correlation has been so high so far may be that the model learns to fully separate the plates in the latent space. By doing that, the model automatically pushes all same-plate profiles together, and non-replicate profile correlation becomes higher in general. Perhaps including across-plate replicates will reduce this effect by fully utilizing the latent loss space.

Main takeaways
The non-replicate correlation does indeed appear to decrease somewhat, as expected, at least for the training plates. However, the model is overfitting very clearly, and the overall performance relative to the previous model is much lower. Decreasing the batch size and increasing the number of plates used for training does not solve this problem. I expect that the model is memorizing specific compounds, but not an aggregation method.

UMAP patterns here!
UMAP BM same plates as in #5 (comment)
UMAP BM training plates
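A sketch of how across-plate replicate pairs could be constructed for such a loss. The column names are assumptions; the actual pairing logic in the training code may differ:

```python
import itertools
import pandas as pd

def across_plate_pairs(profiles: pd.DataFrame,
                       compound_col: str = "Metadata_broad_sample",  # assumed
                       plate_col: str = "Metadata_Plate"):           # assumed
    """Yield index pairs of wells that contain the same compound on
    different plates, to be used as positive pairs in the loss."""
    for _, group in profiles.groupby(compound_col):
        for i, j in itertools.combinations(group.index, 2):
            if group.loc[i, plate_col] != group.loc[j, plate_col]:
                yield i, j
```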
@EchteRobert Awesome! What you essentially did here was measure the distribution similarity between all pairs of plates. The first PC is a quick way to do that. Comparing the PC1 loadings of two multivariate distributions is a shortcut for comparing the covariance matrices of the two distributions. If the distributions are truly multivariate Gaussian (good luck with that, haha!), then it's actually a very good approximation (to the extent that PC1 explains a large fraction of the variance). If you really want to go down this rabbit hole (
Experiment 3V2
Learning from previous experiments, I used the following experiment setup:
Below I will show:
Main takeaways
PR but in a new latent loss space!
A new metric approaches!
5 plates are used to train the model (as shown in the 'Plate' column). During training, 80% of the compounds are used to train the model and 20% of the compounds (the same ones for each plate) are used as a hold-out or validation set (a sketch of this split follows below).
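A minimal sketch of this compound-level split; the column name and helper are hypothetical:

```python
import numpy as np
import pandas as pd

def split_by_compound(profiles: pd.DataFrame,
                      compound_col: str = "Metadata_broad_sample",  # assumed
                      frac: float = 0.8, seed: int = 0):
    """Hold out the same 20% of compounds on every plate, so the
    validation set measures generalization to unseen compounds."""
    rng = np.random.default_rng(seed)
    compounds = profiles[compound_col].unique()
    rng.shuffle(compounds)
    n_train = int(frac * len(compounds))
    train_cpds = set(compounds[:n_train])
    mask = profiles[compound_col].isin(train_cpds)
    return profiles[mask], profiles[~mask]
```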
mAP BR00112201
Plate: BR00112201
Training samples mean AP: 0.259931
Validation samples mean AP: 0.222843
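For clarity, a simplified sketch of how a replicate-retrieval mAP of this kind can be computed. Ranking by Pearson correlation is an assumption; the exact evaluation script may differ:

```python
import numpy as np

def mean_average_precision(X: np.ndarray, labels: np.ndarray) -> float:
    """For every profile, rank all other profiles by Pearson correlation
    and compute the average precision of retrieving its replicates
    (profiles with the same label); return the mean over all queries."""
    sim = np.corrcoef(X)
    aps = []
    for i in range(len(labels)):
        order = np.argsort(-sim[i])
        order = order[order != i]  # drop the query itself
        rel = (labels[order] == labels[i]).astype(float)
        if rel.sum() == 0:
            continue
        precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append((precision_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))
```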
Next up: an overview of all the PRs based on training/validation plates and training/validation compounds, like for the mAP.
Experiments
The model shown in previous comments is overfitting the training dataset. This means it does not beat the baseline in mean average precision when comparing the profiles it creates for validation (hold-out) compounds, validation (hold-out) plates, or both.
Main takeaways
I will not show the results, as there are too many different experiments, but will instead outline the most important findings.
Next up
A possible improvement would be to reduce the data augmentation a bit: only creating super wells 50% of the time, with the other 50% sampled from a single well. Additionally, super wells would be created by aggregating only 2 of the 4 available wells (chosen at random). A sketch of this sampling scheme is shown below.
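A minimal sketch of that sampling scheme; all names are my own and this is an illustration, not the training code:

```python
import numpy as np

def sample_training_well(cells_per_well: list[np.ndarray],
                         rng: np.random.Generator,
                         p_super: float = 0.5, n_merge: int = 2) -> np.ndarray:
    """With probability p_super, build a super well by pooling the cells
    of n_merge randomly chosen replicate wells; otherwise sample a single
    well. cells_per_well holds the single-cell feature matrices of the
    4 replicate wells of one compound (hypothetical structure)."""
    if rng.random() < p_super:
        idx = rng.choice(len(cells_per_well), size=n_merge, replace=False)
        return np.concatenate([cells_per_well[i] for i in idx], axis=0)
    return cells_per_well[rng.integers(len(cells_per_well))]
```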
Experiment
Results of the 'Next up' experiment described here: #5 (comment)

Main takeaways
Next up
EXCITING!
Results in bold are the highest score.
👀 🎊
Experiment
Building upon the setup of the previous experiment, I now train and evaluate a model on across-plate compound replicates. The training set consists of the same 3 plates: BR00112201, BR00112198, and BR00112204. The validation set contains only BR00112202, BR00112197standard, BR00113818, BR00113819, BR00112197repeat, and BR00112197binned. Note that I am only selecting plates that are close to the training set here, because I am considering across-plate correlations and the other 4 outlier plates rely on different features. I group the outlier plates into a separate validation set and compute the results for this set for completeness' sake, but I do not think this last set is useful for analysis due to its different feature importances. I compute the baseline mAP (and PR) using the mean aggregation method for these two sets with across-plate replicates of compounds, and do the same using the model aggregation method.

Main takeaways
Next up
CrissCross mAP 🔀
Across-plate compound correlations
Within-plate compound correlations
Experiment
To see if my hypothesis* is true, I trained a model on 2 of the outlier plates (BR00113819 and BR00113821). I then calculated the same performance metrics as before. The model was trained without creating pairs across plates, only within each plate.

*Training on plates which are similar according to the PC1 loadings plot will lead to poor performance of the model on plates which are dissimilar to the training plates.

Main takeaways
Next up
Time to evaluate on Stain3.

TableTime!
Evaluation
As an additional evaluation at the compound level, I compared the mAP between the model and the benchmark for the 'within cluster plates' (see the PC1 loadings plot for the cluster) to see if there are specific compounds which consistently perform worse or better with the model than with the benchmark.
Evaluation Stain3-optimized model
After tuning a bunch of hyperparameters using Stain3 plates, I trained a model on Stain2 plates using the same hyperparameters and training methods to see if this new setup is compatible across plates. I changed the data used to calculate the validation loss, so that selecting the model with the best validation loss will actually yield the best performance on the validation compounds. See #6 (comment) for the discovery of this validation loss issue and #6 (comment) for the hyperparameter experiment details.

Main takeaways
Results
mAP table with last epoch model here!
mAP table with best validation loss model here!
Numbers in bold are better than the last epoch model. Numbers in italics are worse.
It is now clear that this feature aggregation model will only serve a certain feature set (i.e. a certain line of datasets), and is not designed to aggregate arbitrary feature sets (it is only invariant to the number of cells per well). I will start by creating a model that is able to beat the 'mean aggregation' baselines of the Stain2 batches, then move forward to Stain3 and Stain4, and finally use Stain5 as a final test set.
Because of that, it would be ideal if all features were the same across the Stain datasets. This is (somewhat) the case across Stain2, Stain3, and Stain4. However, Stain5 has a slightly different CellProfiler pipeline, resulting in a different and larger feature set. During preprocessing I found that the pipeline from raw single-cell features to data that can be fed directly to the model is quite slow. This is especially the case when all features are used (4295 for Stain2-4 and 5794 for Stain5). Model inference and training also become increasingly slow as the number of features increases. From the initial experiments on CPJUMP1 we saw that not all features are needed to create a better profile than the baseline (#1). This is why I have chosen to select only the features common across Stain2-5 (a sketch of this selection follows below). This has the advantage of speed, both in preprocessing and inference, and of compatibility, as no separate model will have to be trained to use Stain5 as the test set.
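A minimal sketch of the feature intersection step; the `Metadata_` prefix convention is an assumption:

```python
from functools import reduce
import pandas as pd

def common_features(dfs: list[pd.DataFrame]) -> list[str]:
    """Intersect the non-metadata feature columns of all Stain datasets,
    keeping only features measured in every one of them."""
    feature_sets = [
        {c for c in df.columns if not c.startswith("Metadata_")} for df in dfs
    ]
    return sorted(reduce(set.intersection, feature_sets))
```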
Assuming that the features are consistent within each of Stain2, Stain3, Stain4, and Stain5, there are 1324 features which are measured in all of them. The features are well distributed in terms of category: Cells: 441 features, Cytoplasm: 433 features, and Nuclei: 450 features. 1124 of them are decently uncorrelated (absolute Pearson correlation < 0.5) [tested on one plate]. From here on, these are the features that will be used to train the model.
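A sketch of one way to perform such a correlation-based selection, as a greedy filter; the actual procedure used to arrive at the 1124 features may differ:

```python
import pandas as pd

def uncorrelated_features(df: pd.DataFrame, threshold: float = 0.5) -> list[str]:
    """Greedily keep features whose absolute Pearson correlation with all
    previously kept features stays below the threshold."""
    feats = [c for c in df.columns if not c.startswith("Metadata_")]
    corr = df[feats].corr().abs().to_numpy()
    kept = []  # indices of features kept so far
    for i in range(len(feats)):
        if all(corr[i, j] < threshold for j in kept):
            kept.append(i)
    return [feats[i] for i in kept]
```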