Tuesday, May 2, 2017

Resolution of pairs of overlapping chromosomes by semantic segmentation: results from cooperation.

Recently, Rohit Ghosh and Lily Hu posted reports showing that a neural network called U-Net can be trained to resolve pairs of overlapping chromosomes.

The story of the dataset:

A first version of the dataset was proposed on Kaggle.
Around that time, François Chollet, the developer of Keras, opened the ai.on project, a place to propose problems to be tackled with artificial intelligence approaches.
The "chromosomes problem" was retained and a community of interested people started to think about the problem (there's also a Google group).

First results

Rohit Ghosh tried merging the two categories of pixels belonging to non-overlapping chromosomes into a single class (green); this results in 97% accuracy in the predicted mask! However, this can lead to combinatorial issues when trying to resolve the overlap.

Rohit Ghosh's results with the overlapping pixels as one class and the single chromosomes merged into a single class of pixels.
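To make the class merging concrete, here is a minimal sketch of my own (not Rohit Ghosh's code). It assumes the masks are encoded as 0 = background, 1 and 2 = the two single chromosomes, 3 = overlap; that encoding and the function names are assumptions chosen for illustration.

```python
import numpy as np

def merge_single_chromosome_labels(mask):
    """Collapse the two single-chromosome labels into one class.

    Assumed input encoding: 0 = background, 1 and 2 = single chromosomes,
    3 = overlap. Output encoding: 0 = background, 1 = any single
    chromosome, 2 = overlap.
    """
    mask = np.asarray(mask)
    merged = np.zeros_like(mask)
    merged[(mask == 1) | (mask == 2)] = 1
    merged[mask == 3] = 2
    return merged

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    return float(np.mean(np.asarray(pred) == np.asarray(target)))
```

Note that pixel accuracy counts background pixels too, so it tends to look flattering when the background dominates the image.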

Rohit Ghosh also succeeded in training a U-Net to predict the three labels:

Rohit Ghosh's results on training U-Net to predict the three labels.

According to Rohit Ghosh, when the three labels are predicted, the prediction can reach an accuracy above 80%.

Lily Hu's results

Lily Hu succeeded in training a modified version of U-Net (it again!); here's what she got:

Examples of Lily Hu's results (https://blog.insightdatascience.com/separating-overlapping-chromosomes-with-deep-learning-based-image-segmentation-22f97afd3283)

Here the single chromosomes are represented in red/green and the overlapping domain in blue. All the examples represent partial overlaps (the cytogeneticist's nightmare). The second prediction is a little bit weird, but the remaining ones are really good.
As mentioned in her post, Lily Hu had to pre-process the labels to remove spurious pixels:

I am totally guilty: those pixels originate from the fact that the masks were generated prior to chromosome rotation, as was explained:

Example of chromosome resolution (with removal of spurious pixels)

In the new version of the dataset, the labels are clean.
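For the older labels, one possible way to remove spurious pixels is to delete very small connected components of a given label. The sketch below is only my guess at such a clean-up, not Lily Hu's actual pre-processing; the function name, the 4-connectivity and the `min_size` threshold are all assumptions. It is written with plain NumPy for self-containment (scipy.ndimage.label would be the usual tool for the component search).

```python
from collections import deque

import numpy as np

def remove_small_components(mask, label, min_size):
    """Reset to background (0) every 4-connected component of `label`
    that contains fewer than `min_size` pixels. Returns a new array."""
    mask = np.array(mask)  # work on a copy
    rows, cols = mask.shape
    visited = np.zeros(mask.shape, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] != label or visited[r, c]:
                continue
            # Breadth-first flood fill collecting one connected component.
            component = [(r, c)]
            visited[r, c] = True
            queue = deque(component)
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not visited[ny, nx]
                            and mask[ny, nx] == label):
                        visited[ny, nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    mask[y, x] = 0
    return mask
```

Resetting the pixels to background is a simplification: a stray pixel sitting inside another chromosome would be better reassigned to the label of its neighbours.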

The training curves are interesting:

The IoU (intersection over union) index for overlapping pixels is above 90%! Wow!
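As a reminder of what this index measures, here is a minimal sketch of a per-class IoU: the number of pixels carrying the label in both the prediction and the ground truth, divided by the number of pixels carrying it in either. The function name and the choice of label 3 for the overlap are assumptions for illustration; this is not the metric code used in Lily Hu's post.

```python
import numpy as np

def class_iou(pred, target, label):
    """Intersection over union of the pixels equal to `label`
    in the predicted mask and in the ground-truth mask."""
    pred_mask = np.asarray(pred) == label
    target_mask = np.asarray(target) == label
    union = np.logical_or(pred_mask, target_mask).sum()
    if union == 0:
        # The label is absent from both masks: IoU is undefined.
        return float("nan")
    intersection = np.logical_and(pred_mask, target_mask).sum()
    return float(intersection) / float(union)

# Toy example, with 3 assumed to be the overlap label.
target = np.array([[0, 3, 3],
                   [0, 3, 0]])
pred = np.array([[0, 3, 3],
                 [0, 0, 3]])
overlap_iou = class_iou(pred, target, label=3)  # 2 shared pixels / 4 in the union
```

Unlike pixel accuracy, this score is not inflated by the background, which is why it is a more demanding measure for the small overlapping domain.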

I suspect that with the new-generation dataset (the clean one), which contains many more images at full or low resolution, the training could be improved.

And now what?

Rohit Ghosh and Lily Hu each proposed an implementation of U-Net and showed that the approach makes sense.

Allow me to quote Lily Hu's conclusions from her post and to propose solutions:

"Future Efforts

I am excited by these results but also looking forward to what the open source community could do to further these efforts. Here are a few thoughts on this, based on my experience:
  • The data set can be supplemented with images of single chromosomes and more than two overlapping chromosomes."
Fifteen metaphases are available in the raw images directory, from which single chromosomes can be extracted.
Regarding overlaps involving more than two chromosomes, there's a combinatorial explosion issue.
  • "Data augmentation can also include transformations such as rotations, reflections, and stretching."
This may not be necessary, since the new dataset contains more examples of overlaps. Additional metaphases could also be used, thus introducing variations due to the variability in chromosome morphology.
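Should augmentation turn out to be useful anyway, the important point is that the image and its label mask must receive exactly the same geometric transformation. Here is a minimal sketch under that assumption, limited to 90° rotations and left-right reflections (stretching and arbitrary angles would need interpolation and are left out); the function name is hypothetical.

```python
import numpy as np

def rotation_reflection_pairs(image, mask):
    """Yield the 8 variants (4 rotations by 90 degrees, each with and
    without a left-right flip) of an (image, mask) pair, applying the
    same transformation to both arrays."""
    for flip in (False, True):
        img = np.fliplr(image) if flip else image
        msk = np.fliplr(mask) if flip else mask
        for k in range(4):
            yield np.rot90(img, k), np.rot90(msk, k)
```

Because these transformations only move pixels around, the label values in the mask are preserved and no spurious pixels are created.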


Datasets + notebooks generator
Semantic segmentation with UNet on GitHub
Rohit Ghosh's results
Lily Hu's results