Tuesday, May 2, 2017

Resolution of pairs of overlapping chromosomes by semantic segmentation: results from cooperation.

Recently, Rohit Ghosh and Lily Hu posted reports showing that a neural network called U-Net can be trained to resolve pairs of overlapping chromosomes.

The story of the dataset:

A first version of the dataset was proposed on Kaggle.
Around that time, François Chollet, the developer of Keras, opened the AI.ON project, a place to propose problems to be tackled with artificial intelligence approaches.
The "chromosomes problem" was accepted, and a community of interested people started to work on it (there is also a Google group).

First results

Rohit Ghosh tried merging the two categories of pixels belonging to non-overlapping chromosomes into a single class (green), which gives 97% accuracy in the predicted mask! However, this can lead to combinatorial issues when trying to resolve the overlap.

Rohit Ghosh's results with overlapping pixels as one class and the single chromosomes merged into a single class of pixels.
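To make the effect of merging concrete, here is a minimal sketch in plain Python. It assumes a label convention that is not stated above (0 = background, 1 and 2 = the two single chromosomes, 3 = overlap), and the function names are only illustrative; this is not Rohit Ghosh's code. It shows one reading of the trade-off: a prediction that swaps the two chromosomes scores poorly with separate classes but perfectly once they are merged, and the question of which pixels belong to which chromosome is left open.

```python
# Assumed label convention (not given in the post):
# 0 = background, 1 = first chromosome, 2 = second chromosome, 3 = overlap.
BACKGROUND, CHROM_A, CHROM_B, OVERLAP = 0, 1, 2, 3


def merge_single_classes(label):
    """Map both single-chromosome classes to one class; keep the others."""
    return [[CHROM_A if px == CHROM_B else px for px in row] for row in label]


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(
        p == t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    return correct / total


truth = [
    [0, 1, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 2],
]
# A prediction with the right shapes but the two chromosomes swapped.
pred = [
    [0, 2, 2, 0],
    [0, 2, 3, 1],
    [0, 0, 1, 1],
]

print(pixel_accuracy(pred, truth))
print(pixel_accuracy(merge_single_classes(pred), merge_single_classes(truth)))
```

With real arrays the same logic would normally be vectorized (for instance with NumPy), but the idea is unchanged.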

Rohit Ghosh also succeeded in training a U-Net to predict the three labels:

Rohit Ghosh's results on training U-Net to predict the three labels.

According to Rohit Ghosh, when the three labels are predicted, the prediction can reach an accuracy above 80%.

Lily Hu's results

Lily Hu succeeded in training a modified version of U-Net (it again!). Here is what she got:

Examples of Lily Hu's results (https://blog.insightdatascience.com/separating-overlapping-chromosomes-with-deep-learning-based-image-segmentation-22f97afd3283)

Here the single chromosomes are represented in red/green and the overlapping domain in blue. All the examples show partial overlaps (the cytogeneticist's nightmare). The second prediction is a little bit weird, but the others are really good.
As mentioned in her post, Lily Hu had to pre-process the labels to remove spurious pixels:

I am totally guilty: those pixels originate from the fact that the masks were generated before the chromosomes were rotated, as was explained:

Example of chromosome resolution (with removal of spurious pixels)

In the new version of the dataset, the labels are clean.

The training curves are interesting:

The IoU (intersection over union) index for overlapping pixels is above 90%! Wow!
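For readers who have not met the IoU index, here is a minimal sketch of how it can be computed for one class, in plain Python. The class value 3 for overlap pixels and the function name are assumptions made for the illustration; Lily Hu's actual evaluation code may differ.

```python
def class_iou(pred, truth, cls):
    """Intersection over union of the pixels labelled `cls` in two label maps."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred = p == cls
            in_truth = t == cls
            intersection += in_pred and in_truth  # pixel in both masks
            union += in_pred or in_truth          # pixel in at least one mask
    # The index is undefined when the class is absent from both maps.
    return intersection / union if union else float("nan")


OVERLAP = 3  # assumed label value for overlap pixels

truth = [
    [0, 1, 1, 0],
    [0, 3, 3, 2],
    [0, 3, 3, 2],
]
pred = [
    [0, 1, 3, 0],
    [0, 3, 3, 2],
    [0, 1, 3, 2],
]

print(class_iou(pred, truth, OVERLAP))  # 3 shared pixels out of 5 in the union
```

Unlike pixel accuracy, this index is not inflated by the large background, which is why it is a stricter measure for the small overlap domain.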

I suspect that with the new generation of the dataset (the clean one), which contains many more images at full or low resolution, the training could be improved.

And now what?

Rohit Ghosh and Lily Hu each proposed an implementation of U-Net and showed that the approach makes sense.

Let me quote Lily Hu's conclusions from her post and propose solutions:

"Future Efforts

I am excited by these results but also looking forward to what the open source community could do to further these efforts. Here are a few thoughts on this, based on my experience:
  • The data set can be supplemented with images of single chromosomes and more than two overlapping chromosomes."
15 metaphases are available in the raw images directory, from which single chromosomes can be extracted.
Regarding overlaps of more than two chromosomes, there is a combinatorial explosion issue.
  • Data augmentation can also include transformations such as rotations, reflections, and stretching.
It may not be necessary: the new dataset contains more examples of overlaps. Additional metaphases could also be used, thus introducing variation due to the variability in chromosome morphology.
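If augmentation turns out to be useful after all, the important point is that the greyscale image and its label must go through exactly the same transformation. Here is a minimal sketch in plain Python, limited to 90° rotations and reflections (the function names are mine, and stretching is not covered):

```python
def rot90(grid):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_lr(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def augment_pair(image, label):
    """Yield the 8 rotation/reflection variants of an (image, label) pair.

    The same transformation is applied to both so that they stay aligned.
    """
    for flipped in (False, True):
        img = flip_lr(image) if flipped else image
        lab = flip_lr(label) if flipped else label
        for _ in range(4):
            yield img, lab
            img, lab = rot90(img), rot90(lab)


image = [[10, 20], [30, 40]]
label = [[0, 1], [3, 2]]
variants = list(augment_pair(image, label))
print(len(variants))
```

Rotations by arbitrary angles would need interpolation; for the label image, nearest-neighbour interpolation would be the natural choice so that no new label values appear.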


    Datasets + notebooks generator

     Semantic segmentation with UNet on github

    Rohit Ghosh's results
    Lily Hu's results

    Wednesday, April 12, 2017

    Modelling of overlapping pairs of chromosomes: a dataset containing more than 200 000 pairs of images in PNG format.

    Two datasets modelling pairs of overlapping chromosomes are available on GitHub:
    Images are available at full resolution but also at a lower resolution to reduce the memory footprint. To reduce the footprint further, the two components (Cy3, DAPI) of the original image were combined into one greyscale image.
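The two memory-saving steps can be sketched as follows, in plain Python. This is only an illustration under assumptions: the exact way the DAPI and Cy3 components were combined is not described here, so a simple per-pixel mean is used, and the lower resolution is obtained by averaging square blocks. Both function names are hypothetical.

```python
def combine_channels(dapi, cy3):
    """Combine two 8-bit channels into one greyscale image.

    Assumption: a per-pixel mean; the real dataset may use another formula.
    """
    return [
        [(d + c) // 2 for d, c in zip(dapi_row, cy3_row)]
        for dapi_row, cy3_row in zip(dapi, cy3)
    ]


def downsample(image, factor):
    """Reduce the resolution by averaging factor x factor blocks."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


dapi = [[100, 100, 0, 0], [100, 100, 0, 0]]
cy3 = [[0, 200, 0, 0], [0, 0, 0, 40]]
grey = combine_channels(dapi, cy3)
small = downsample(grey, 2)
print(grey, small)
```

Block averaging suits the grey image only; a label image should be reduced by picking one pixel per block (or a majority vote) so that class values are not blended.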

    The chromosomes were chosen to represent the different chromosomal morphologies. The training and the validation datasets were generated with different chromosomes belonging to the same metaphase (slide jp21, metaphase 3) from normal human lymphocytes counterstained with DAPI and hybridized with a telomeric Cy3-labelled PNA probe.

    Low resolution images dataset: 

    Training dataset

    The training dataset was generated from six chromosomes:

    The dataset consists of 100 085 pairs of greyscale/groundtruth images of shape 80x82.

    Simulation of overlapping pairs of chromosomes. Sample of low resolution image dataset.

    Groundtruth labels of the training dataset

    Validation dataset:

    The validation dataset was generated with other chromosomes from the same metaphase:

    The validation dataset contains 111 123 pairs (grey + groundtruth label) of images (8 bits, PNG).

    Full resolution dataset:

    • The training dataset contains 97 976 pairs of 201x211 images.
    • The validation dataset contains 50 543 pairs of grey / groundtruth images.

    Saturday, April 1, 2017

    100 overlapping, full resolution, chromosomes isolated from 14 metaphases of human lymphocytes

    Metaphase chromosomes were hybridized with a PNA telomeric probe (CCCTAA-Cy3) and an additional oligo DNA probe (Cy5) for testing purposes.

    100 overlapping chromosomes can be found in that dataset (jpp21). The dataset consists of 12-bit raw greyscale images available on the DeepFISH repository. The images of overlapping chromosomes were isolated and converted into 8-bit RGB images:

    Sample of 100 overlapping chromosomes (full resolution)
    The overlapping images are also available as a combination of the DAPI and Cy3 components:

    Combined DAPI+Cy3 images (inverted greyscale)

    Monday, November 21, 2016

    Generation of 82146 chromosomal overlaps from pairs of chromosomes

    The following Jupyter notebook was published on GitHub. The aim is to generate a large dataset of overlapping chromosomes (greyscale image + ground truth label image) to train a neural network to perform semantic segmentation on such images.
    To obtain a large number of images, the resolution was decreased by a factor of 16. A first attempt proposed in the AI.ON project seems to do a very good job. The results were obtained from a dataset of 13434 pairs of images with a Python implementation of U-Net.
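The core idea of generating one (greyscale image, ground truth label) pair can be sketched as follows, in plain Python. This is a simplified illustration and not the notebook's code: it assumes the two chromosomes have already been rotated and translated onto canvases of the same size with a background of 0, that their intensities simply add up (clipped to 8 bits) where they overlap, and that the labels are 1 and 2 for the single chromosomes and 3 for the overlap.

```python
def overlap_pair(chrom_a, chrom_b):
    """Build a (grey, label) pair from two same-sized grey images.

    Assumed labels: 0 = background, 1 = only A, 2 = only B, 3 = overlap.
    """
    grey, label = [], []
    for row_a, row_b in zip(chrom_a, chrom_b):
        grey_row, label_row = [], []
        for a, b in zip(row_a, row_b):
            # Assumption: intensities add up, clipped to the 8-bit range.
            grey_row.append(min(255, a + b))
            # 1 for A, 2 for B; the sum is 3 where both are present.
            label_row.append((1 if a else 0) + (2 if b else 0))
        grey.append(grey_row)
        label.append(label_row)
    return grey, label


chrom_a = [
    [0, 50, 50, 0],
    [0, 50, 50, 0],
    [0, 0, 0, 0],
]
chrom_b = [
    [0, 0, 0, 0],
    [0, 0, 70, 70],
    [0, 0, 70, 70],
]
grey, label = overlap_pair(chrom_a, chrom_b)
print(label)
```

Repeating this over many relative rotations and translations of a few chromosomes is what makes it possible to reach tens of thousands of pairs.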

    Monday, October 31, 2016

    DeepFISH: a GitHub repository with datasets, notebooks and raw images

    For the opening of the topic about chromosome segmentation on AI.ON, a GitHub repository, DeepFISH (sorry for the name), was created.

    Today, this repo contains:
    • datasets: intended to train some kind of convolutional neural network to perform semantic segmentation and resolve overlapping chromosomes. The two datasets model the overlap of two chromosomes.
    • notebooks: for generating the datasets and for loading and displaying them. The Jupyter notebooks are written in Python 3. Initially, the notebook used to generate the datasets was written in Python 2.
    • raw images: additional raw images of metaphase chromosomes stained with DAPI and labelled with a telomeric probe (PNA-Cy3).

    Wednesday, October 19, 2016

    chromosome segmentation problem on AI.ON

    The problem of chromosome segmentation is presented on AI.ON.
    Some data for possibly training an ANN are available on Kaggle, along with the way to produce them.

    Wednesday, July 20, 2016

    An example of modelling of overlapping chromosomes

    Example of generation of overlapping chromosomes. Left: greyscale image (DAPI + telomeres). Right: the red label maps to pixels belonging to the overlapping domain; the blue and yellow labels map to pixels belonging to single chromosomes.

    This image is used for illustration on Kaggle.