Thursday, October 12, 2017

Predictions from OverlapSegmentationNet after 14 epochs

OverlapSegmentationNet is a UNet model implemented with Keras. The network was trained for 14 epochs (~8 hours on a GT740M GPU). This is not enough to make good predictions, but it is enough to play with the model and understand what a prediction from a single image looks like.


The dataset used for training was the first one proposed on Kaggle. It consists of 13434 pairs of greyscale/ground-truth images of size 88x88. Each greyscale image contains a pair of overlapping chromosomes. The overlaps were synthetically generated from images of single chromosomes:
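How the overlaps were generated is not detailed in this post. As an illustration only, here is a minimal sketch of how two single-chromosome images could be superimposed to produce a greyscale image and a ground-truth label image. Several details are assumptions:

- the function name;
- the use of np.maximum to combine intensities;
- the label codes (0 background, 1 chromosome 1, 2 chromosome 2, 3 overlap), which follow the order of the prediction components described below.

```python
import numpy as np

def synthesize_overlap(grey_a, grey_b, mask_a, mask_b):
    """Hypothetical helper: superimpose two single-chromosome images.

    grey_a, grey_b: uint8 greyscale images of identical shape, background = 0.
    mask_a, mask_b: boolean masks of each chromosome.
    Returns the combined image and a label image:
    0 = background, 1 = chromosome 1 only, 2 = chromosome 2 only, 3 = overlap.
    """
    # Keep the brighter pixel where both chromosomes are present (assumed rule).
    grey = np.maximum(grey_a, grey_b)

    labels = np.zeros(grey.shape, dtype=np.uint8)
    labels[mask_a & ~mask_b] = 1
    labels[mask_b & ~mask_a] = 2
    labels[mask_a & mask_b] = 3
    return grey, labels

# Toy example: two crossing bars on an 88x88 canvas.
a = np.zeros((88, 88), dtype=np.uint8)
a[40:48, 10:78] = 200          # horizontal "chromosome"
b = np.zeros((88, 88), dtype=np.uint8)
b[10:78, 40:48] = 150          # vertical "chromosome"
grey, labels = synthesize_overlap(a, b, a > 0, b > 0)
counts = np.bincount(labels.ravel(), minlength=4)   # pixels per label
```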

Making a prediction from a single greyscale image:

The aim is to predict the segmentation labels with a Keras model (OverlapSegmentationNet). The 88x88 greyscale image must first be converted into a 4D numpy array of shape (1, 88, 88, 1).
A prediction is then a 4D array of shape (1, 88, 88, 4), since there is one component for each label (background, chromosome 1, chromosome 2, overlap). A prediction looks like:
where each component is a floating-point image. With another chromosome, we have:
A prediction takes ~40 ms for one low-resolution image.
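The array manipulations described above can be sketched with numpy. This is a minimal sketch that assumes a trained Keras model object called model (a hypothetical name) and no intensity scaling. If the images were rescaled during training (for instance divided by 255), the same scaling has to be applied before predicting.

```python
import numpy as np

LABELS = ("background", "chromosome 1", "chromosome 2", "overlap")

def to_batch(grey):
    """Turn one 88x88 greyscale image into a (1, 88, 88, 1) float array."""
    img = np.asarray(grey, dtype=np.float32)
    if img.shape != (88, 88):
        raise ValueError("expected an 88x88 image, got %s" % (img.shape,))
    return img.reshape(1, 88, 88, 1)  # (batch, height, width, channels)

def split_components(prediction):
    """Split a (1, 88, 88, 4) prediction into one 88x88 float image per label."""
    return {name: prediction[0, :, :, k] for k, name in enumerate(LABELS)}

# With the real network (hypothetical variable names):
# prediction = model.predict(to_batch(grey))   # shape (1, 88, 88, 4)
# Here a random array stands in for the network output:
prediction = np.random.rand(1, 88, 88, 4).astype(np.float32)
components = split_components(prediction)
```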

Prediction made on an image that does not belong to the training dataset:

A prediction was made on an image belonging to a validation dataset:
To be used for prediction, the image was downsampled and cropped to fit 88x88 (and so was its corresponding ground-truth segmentation):
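The exact resampling used is not given. The minimal sketch below assumes an integer downsampling factor and a centred crop (or zero padding) to 88x88. The same operations are applied to the greyscale image and to its ground-truth labels so that the two stay aligned. Nearest-neighbour subsampling (plain slicing) is used because averaging would create meaningless intermediate label values.

```python
import numpy as np

def downsample(img, factor):
    """Nearest-neighbour subsampling by an integer factor (label values stay valid)."""
    return img[::factor, ::factor]

def fit_to(img, size=88, fill=0):
    """Centre-crop, or zero-pad, a 2D image to size x size."""
    h, w = img.shape
    out = np.full((size, size), fill, dtype=img.dtype)
    sy, sx = max((h - size) // 2, 0), max((w - size) // 2, 0)   # crop offsets
    dy, dx = max((size - h) // 2, 0), max((size - w) // 2, 0)   # padding offsets
    ch, cw = min(h, size), min(w, size)
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out

# Example with a full-resolution sized pair (201x211), downsampled by 2 (assumed factor):
grey = np.random.randint(0, 256, (201, 211), dtype=np.uint8)
truth = np.random.randint(0, 4, (201, 211), dtype=np.uint8)
grey_88 = fit_to(downsample(grey, 2))
truth_88 = fit_to(downsample(truth, 2))
```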
Clearly, with so few epochs, the prediction is poor:
The one-hot encoded prediction can be thresholded, converted back into a multilabel segmentation and compared to the ground-truth segmentation:
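One way to do this thresholding and back-conversion is sketched below. It assumes the prediction components are in the order (background, chromosome 1, chromosome 2, overlap) and uses a 0.5 threshold; both the threshold and the function names are illustrative.

- Components below the threshold are discarded.
- Each pixel takes the label of its strongest remaining component.
- Pixels where no component passes the threshold become background.

A confusion matrix then shows which labels are mixed up.

```python
import numpy as np

def onehot_to_labels(prediction, threshold=0.5):
    """(1, H, W, 4) or (H, W, 4) soft prediction -> (H, W) label image with values 0..3."""
    pred = prediction[0] if prediction.ndim == 4 else prediction
    kept = np.where(pred >= threshold, pred, 0.0)
    # argmax returns index 0 (background) when every component was zeroed out.
    return np.argmax(kept, axis=-1).astype(np.uint8)

def compare(pred_labels, true_labels, n_labels=4):
    """Pixel accuracy and confusion matrix (rows: ground truth, columns: prediction)."""
    accuracy = float(np.mean(pred_labels == true_labels))
    codes = n_labels * true_labels.astype(np.int64) + pred_labels.astype(np.int64)
    confusion = np.bincount(codes.ravel(), minlength=n_labels ** 2).reshape(n_labels, n_labels)
    return accuracy, confusion
```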

The notebook is here:

More details are available in the notebook:


Friday, August 25, 2017

Training OverlapSegmentationNet with GPU acceleration on an Ubuntu-powered laptop.


Harnessing the GPU 1: NVIDIA driver

It is assumed that a fresh Ubuntu 16.04 is set up.
Install the NVIDIA proprietary driver from the system settings menu:
The NVIDIA files installed, as listed in the Synaptic package manager, are:

NVIDIA acceleration can be switched on graphically after installing nvidia-settings:

Check the driver installation from a terminal by starting glxgears:

When NVIDIA mode is selected, glxgears performance should increase.
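To compare the two modes with numbers rather than by eye, the FPS lines that glxgears prints every few seconds (in the form "N frames in T seconds = X FPS") can be averaged. This small sketch assumes the terminal output has been pasted into a string; the function and variable names are illustrative.

```python
import re

# glxgears prints lines such as "<frames> frames in <t> seconds = <fps> FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")

def mean_fps(glxgears_output):
    """Average FPS over all report lines found in the captured glxgears output."""
    values = [float(m.group(3)) for m in FPS_LINE.finditer(glxgears_output)]
    if not values:
        raise ValueError("no FPS line found")
    return sum(values) / len(values)

# Usage: run glxgears for ~20 s in each mode, paste its output, then compare:
# print(mean_fps(output_before), mean_fps(output_nvidia))
```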

Harnessing the GPU 2: install CUDA 8 and cuDNN 5.1

CUDA 8 was installed from a .deb file available from NVIDIA's site:

Check the CUDA installation with the ./deviceQuery sample (see the end of the post).
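The deviceQuery check can be scripted. deviceQuery is one of the CUDA samples. Once compiled with make in its sample directory, it prints a report that ends with a "Result = PASS" line when a CUDA device is found. This is a minimal sketch; the executable path is an assumption to adapt.

```python
import subprocess

def run_device_query(path="./deviceQuery"):
    """Run the compiled deviceQuery CUDA sample and return its text output."""
    completed = subprocess.run([path], stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT, universal_newlines=True)
    return completed.stdout

def cuda_ok(report):
    """True when the deviceQuery report contains the success line."""
    return "Result = PASS" in report

# Usage, from the directory where deviceQuery was built:
# print(cuda_ok(run_device_query()))
```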

The cuDNN 5.1 library was installed by copying its files to the right places (installing cuDNN 6 from the .deb archive wasn't a good idea).


Install TensorFlow and Keras

There are different ways to install TensorFlow. Here, both TensorFlow with GPU support and Keras were installed in a virtual environment with pip.

TensorFlow 1.2.1 and Keras 2.0.6 were installed.

h5py was also installed with pip.
Finally, don't forget to run from a terminal:

$ source .bashrc
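Once the virtual environment is active, one quick way to confirm which versions were actually installed is to import the packages and read their __version__ attribute; TensorFlow, Keras and h5py all expose one. This is a small sketch, and the helper name is illustrative.

```python
import importlib

def installed_versions(names=("tensorflow", "keras", "h5py")):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions

# In the environment described above, tensorflow should report 1.2.1 and keras 2.0.6:
# print(installed_versions())
```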

Installation of OverlapSegmentationNet:

OverlapSegmentationNet is an implementation of U-Net by Dr. Hu.

The image-segmentation-chromosomes directory contains:
$ ls
code  images


The first version of the low resolution dataset was used. Download it and move it into the code directory:
$ ls
Explore-data.ipynb                    __pycache__
OverlapSegmentationNet.pyc            utilities.pyc
preprocessing-jp.ipynb                xdata_88x88.npy                 ydata_88x88_0123_onehot.npy

In a terminal, run as follows:
The code will generate the two files highlighted in green above.
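Once the data files are in the code directory, a quick sanity check can be run before training. This minimal sketch makes the following assumptions:

- xdata_88x88.npy holds the greyscale images and ydata_88x88_0123_onehot.npy holds the matching one-hot labels;
- the expected shapes are (N, 88, 88, 1) and (N, 88, 88, 4), like a prediction;
- the exact layout follows the preprocessing code, which is not reproduced here.

```python
import numpy as np

def is_onehot(y):
    """True if every pixel has exactly one active label component."""
    return bool(np.all((y == 0) | (y == 1)) and np.all(y.sum(axis=-1) == 1))

def load_dataset(x_path="xdata_88x88.npy", y_path="ydata_88x88_0123_onehot.npy"):
    """Load greyscale images and one-hot labels and check that they line up."""
    x = np.load(x_path)
    y = np.load(y_path)
    if x.shape[0] != y.shape[0]:
        raise ValueError("%d images but %d label maps" % (x.shape[0], y.shape[0]))
    if x.shape[1:3] != y.shape[1:3]:
        raise ValueError("image size %s differs from label size %s"
                         % (x.shape[1:3], y.shape[1:3]))
    return x, y

# Usage from the code directory (expected: N images of 88x88, 4 label components):
# x, y = load_dataset()
# print(x.shape, y.shape, is_onehot(y))
```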


Switch to GPU mode with (then log out and log back in):

Activate the proper virtual environment. Mine was called tfgpu, so on my computer it is:
$ source VirtualEnv/tfgpu/bin/activate
Edit the code to set the desired number of epochs.
From a terminal, run the training script. For example, on my computer it is:

(tfgpu) jeanpat@WA50SHQ:~/image_segmentation_chromosomes/code$ python
If the GPU is properly used by TensorFlow, the terminal should print something like:

The GPU is mentioned in the log, and the CPU usage is no longer stuck at 100%.
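Besides reading the log, TensorFlow can be asked directly which devices it sees. device_lib.list_local_devices() lives in tensorflow.python.client, a long-standing but non-public TF 1.x module. Each entry it returns has name and device_type fields. A minimal sketch:

```python
def gpu_names(devices):
    """Names of the GPU entries in a list of TensorFlow device descriptions."""
    return [d.name for d in devices if d.device_type == "GPU"]

def visible_gpus():
    """Ask TensorFlow which GPUs it can use; returns [] if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return gpu_names(device_lib.list_local_devices())

# Usage inside the tfgpu environment: an empty list means TensorFlow runs on the CPU only.
# print(visible_gpus())
```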

On a modest NVIDIA GT740M GPU, 14 epochs took 8 to 9 hours.

Saturday, July 1, 2017

Counting impacts on a target (traces of a heavy-ion beam) with the help of k-means clustering.

When studying the effect of ionizing radiation (heavy ions with high LET) on cells, the fluence (particles/cm²) of the particle beam must be known. Detectors in the GANIL (Caen, France) particle accelerator can estimate the fluence. However, when irradiating cells, radiobiologists (CEA CIRIL) need to localize the ion impacts and to count the number of impacts per cell.
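The link between these quantities is simple arithmetic. Fluence is the number of impacts per unit area, so counting impacts in an imaged field of known area gives an estimate of it. Multiplying the fluence by the area of a cell (or nucleus) then gives the mean number of impacts per cell. The sketch uses made-up numbers. The Poisson model for the number of hits per cell is a common modelling assumption, not something measured here.

```python
import math

CM2_PER_UM2 = 1e-8  # 1 µm² = 1e-8 cm²

def fluence_from_count(n_impacts, field_area_um2):
    """Fluence in particles/cm² estimated from impacts counted over an imaged field."""
    return n_impacts / (field_area_um2 * CM2_PER_UM2)

def mean_hits(fluence_per_cm2, cell_area_um2):
    """Mean number of impacts expected in a cell (or nucleus) of the given area."""
    return fluence_per_cm2 * cell_area_um2 * CM2_PER_UM2

def prob_hits(k, mean):
    """P(exactly k impacts in a cell) under a Poisson model (an assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Made-up example: 50 impacts counted in a 100 µm x 100 µm field, nucleus of 100 µm².
F = fluence_from_count(50, 100 * 100)    # about 5e5 particles/cm²
lam = mean_hits(F, 100)                  # about 0.5 impact per nucleus
p_unhit = prob_hits(0, lam)              # about 0.61 of nuclei receive no impact
```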

The TRACES application (scripts for the Aphelion application) was developed. It could record and save images from a camera, and segment and count traces:
The application was designed to analyse images such as the following:

raw image of traces

Counting impacts in a Python notebook:

Unlike the TRACES application, the following Jupyter notebook can't capture images (it should be possible to do so in a Python script with the cv2 module), but it can perform some automatic classification.
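Counting traces amounts to thresholding the image and labelling connected bright regions. The area of each region is also a useful feature for distinguishing single from multiple impacts. This self-contained sketch assumes 4-connectivity and a fixed threshold. In practice, scipy.ndimage.label performs the labelling step directly.

```python
from collections import deque

import numpy as np

def label_traces(image, threshold):
    """Threshold an image and label 4-connected bright regions.

    Returns (labels, n): labels is 0 on background and 1..n on the traces.
    """
    mask = np.asarray(image) > threshold
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for y0, x0 in zip(*np.nonzero(mask)):
        if labels[y0, x0]:
            continue                      # pixel already belongs to a trace
        n += 1
        labels[y0, x0] = n
        queue = deque([(y0, x0)])         # breadth-first flood fill
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = n
                    queue.append((ny, nx))
    return labels, n

def trace_areas(labels, n):
    """Pixel area of each labelled trace (element 0 is trace 1)."""
    return np.bincount(labels.ravel(), minlength=n + 1)[1:]
```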
Distinguishing traces resulting from one or several impacts is a classification problem, which can be explored with an unsupervised classifier such as k-means clustering:
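Here is a minimal sketch of the idea. It assumes that trace area is the feature and that traces fall into two groups: single impacts and merged multiple impacts. The areas below are synthetic, not measured. The code is a plain Lloyd's k-means in numpy with a deterministic initialisation; sklearn.cluster.KMeans would be the usual library choice.

```python
import numpy as np

def kmeans(features, k, n_iter=100):
    """Lloyd's k-means with centroids initialised at evenly spaced ranks."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    order = np.argsort(X[:, 0])
    centroids = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)               # nearest centroid per trace
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign, centroids

# Synthetic trace areas in pixels: 30 single impacts, 10 merged double impacts.
rng = np.random.default_rng(0)
areas = np.concatenate([rng.normal(20, 2, 30), rng.normal(40, 2, 10)])
assign, centroids = kmeans(areas, k=2)
multi = np.argmax(centroids[:, 0])                  # cluster with the larger traces
n_multi = int(np.sum(assign == multi))
estimated_impacts = (len(areas) - n_multi) + 2 * n_multi   # if merged traces are pairs
```

The cluster with the larger centroid gathers the candidate multiple-impact traces. Counting each of them as two impacts is itself an assumption; larger clusters could be split further with k greater than 2.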

Tuesday, May 2, 2017

Resolution of pairs of overlapping chromosomes by semantic segmentation: results from cooperation.

Recently, Rohit Ghosh and Lily Hu posted reports showing that a neural network called U-Net can be trained to resolve pairs of overlapping chromosomes.

The story of the dataset:

A first version of the dataset was proposed on Kaggle:
At that time, François Chollet, the developer of Keras, opened the ai.on project, a place to propose problems to be tackled with artificial intelligence approaches.
The "chromosomes problem" was selected, and a community of interested people started working on it (there is also a Google group).

First results

Rohit Ghosh tried merging the two categories of pixels belonging to non-overlapping chromosomes into a single class (green); this resulted in 97% accuracy on the predicted mask! However, this can lead to combinatorial issues when trying to resolve the overlap.

Rohit Ghosh's results considering overlapping pixels and merging single chromosomes into a single class of pixels.
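Merging the two single-chromosome classes is a simple relabelling of the ground truth. The sketch assumes the label codes 0 background, 1 chromosome 1, 2 chromosome 2, 3 overlap. It also shows where the combinatorial problem comes from. Once the classes are merged, the labels no longer say which single-chromosome pixels belong to which chromosome, so that information has to be recovered afterwards.

```python
import numpy as np

# Lookup table, indexed by the assumed 4-class code:
# 0 background -> 0, 1 chromosome 1 -> 1, 2 chromosome 2 -> 1, 3 overlap -> 2
MERGE_SINGLES = np.array([0, 1, 1, 2], dtype=np.uint8)

def merge_single_chromosomes(labels):
    """Map a 4-class label image onto 3 classes (background, single, overlap)."""
    return MERGE_SINGLES[labels]
```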

Rohit Ghosh also succeeded in training a U-Net to predict the three labels:

Rohit Ghosh's results on training U-Net to predict the three labels.

According to Rohit Ghosh, when the three labels are predicted, the prediction can reach an accuracy above 80%.

Lily Hu's results

Lily Hu succeeded in training a modified version of U-Net (again!); here's what she got:

Examples of Lily Hu's results

Here the single chromosomes are represented in red/green and the overlapping domain in blue. All the examples represent partial overlaps (the cytogeneticist's nightmare). The second prediction is a little weird, but the remaining ones are really good.
As mentioned in her post, Lily Hu had to pre-process the labels to remove spurious pixels:

I am totally guilty: those pixels originate from the fact that the masks were generated before the chromosomes were rotated, as explained here:

Example of chromosome resolution (after removal of spurious pixels)
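Lily Hu's exact pre-processing is not reproduced here. As a hypothetical sketch, spurious pixels of the kind described (isolated label pixels left after rotating the chromosomes) could be removed like this: any labelled pixel with no 4-neighbour sharing its label is reset to background. Resetting to background rather than to the neighbouring label is also a choice made for the sketch.

```python
import numpy as np

def remove_isolated_pixels(labels):
    """Set to background (0) every labelled pixel with no 4-neighbour of the same label."""
    padded = np.pad(labels, 1, mode="constant", constant_values=0)
    h, w = padded.shape
    centre = padded[1:-1, 1:-1]
    same = np.zeros(labels.shape, dtype=bool)
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        # Shifted view: neighbour[y, x] is the pixel next to labels[y, x] in direction (dy, dx).
        neighbour = padded[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        same |= neighbour == centre
    cleaned = labels.copy()
    cleaned[(labels != 0) & ~same] = 0
    return cleaned
```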

In the new version of the dataset, the labels are clean.

The training curves are interesting:

The IoU (intersection over union) for overlapping pixels is above 90%! Wow!
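The IoU of a label compares the set P of pixels predicted with that label to the set T of ground-truth pixels with that label: IoU = |P ∩ T| / |P ∪ T|, where 1 means a perfect match. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def iou(pred_labels, true_labels, label):
    """Intersection over union of one label between prediction and ground truth."""
    p = pred_labels == label
    t = true_labels == label
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")            # label absent from both images
    return np.logical_and(p, t).sum() / union

# For the overlap class (code 3 in the 0..3 labelling used above):
# print(iou(predicted, ground_truth, 3))
```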

I suspect that with the new-generation dataset (the clean one), which contains many more images at full or low resolution, training could be improved.

And now what?

Rohit Ghosh and Lily Hu proposed an implementation of U-Net and showed that the approach makes sense.

Allow me to quote Lily Hu's conclusions from her post and to propose solutions:

"Future Efforts

I am excited by these results but also looking forward to what the open source community could do to further these efforts. Here are a few thoughts on this, based on my experience:
  • The data set can be supplemented with images of single chromosomes and more than two overlapping chromosomes."
15 metaphases, from which single chromosomes can be extracted, are available in the raw images directory.
Regarding overlaps of more than two chromosomes, there is a combinatorial explosion issue.
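One way to see this combinatorial explosion is to label each foreground pixel with the set of chromosomes covering it; this framing is an illustration, not taken from Lily Hu's post. With n chromosomes, there are then 2^n - 1 possible foreground classes. For two chromosomes, this gives exactly the three classes chromosome 1, chromosome 2 and overlap.

```python
from itertools import combinations

def pixel_classes(n_chromosomes):
    """Every non-empty set of chromosomes that can cover a foreground pixel."""
    ids = range(1, n_chromosomes + 1)
    return [c for r in range(1, n_chromosomes + 1) for c in combinations(ids, r)]

for n in (2, 3, 4):
    print(n, "chromosomes ->", len(pixel_classes(n)), "foreground classes")
# 2 chromosomes -> 3 foreground classes
# 3 chromosomes -> 7 foreground classes
# 4 chromosomes -> 15 foreground classes
```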
  • Data augmentation can also include transformations such as rotations, reflections, and stretching.
It may not be necessary: the new dataset contains more examples of overlaps. Additional metaphases could also be used, thus introducing variations due to the variability in chromosome morphology.
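For reference, rotations and reflections are cheap to implement, as long as the same transform is applied to the image and to its label map. The sketch below generates the eight 90° rotations and reflections with numpy. Stretching, which needs interpolation, is left out. On non-square images such as the 80x82 ones, a 90° rotation swaps height and width.

```python
import numpy as np

def dihedral_variants(image, labels):
    """Yield the 8 rotations/reflections of an image and its label map, kept aligned."""
    for k in range(4):
        rot_img, rot_lab = np.rot90(image, k), np.rot90(labels, k)
        yield rot_img, rot_lab
        yield np.fliplr(rot_img), np.fliplr(rot_lab)
```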


Datasets + notebooks generator

Semantic segmentation with UNet on GitHub

Rohit Ghosh's results
Lily Hu's results

Wednesday, April 12, 2017

Modeling overlapping pairs of chromosomes: a dataset containing more than 200 000 pairs of images in PNG format.

Two datasets modeling pairs of overlapping chromosomes are available on GitHub:
Images are available at full resolution, and also at a lower resolution to reduce the memory footprint. To reduce the footprint further, the two components (Cy3, DAPI) of the original image were combined into one greyscale image.
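How the two channels were combined is not specified in this post. The sketch below shows one plausible recipe:

1. Rescale each channel to [0, 1].
2. Add the Cy3 signal with a weight.
3. Clip the result.
4. Optionally invert it, as in the "inverted greyscale" images.
5. Store it as 8 bits.

The weight, the inversion default and the function name are all assumptions.

```python
import numpy as np

def combine_dapi_cy3(dapi, cy3, cy3_weight=1.0, invert=True):
    """Merge two fluorescence channels into one 8-bit greyscale image (hypothetical recipe)."""
    def rescale(channel):
        c = channel.astype(np.float64)
        span = c.max() - c.min()
        return (c - c.min()) / span if span else np.zeros_like(c)

    combined = np.clip(rescale(dapi) + cy3_weight * rescale(cy3), 0.0, 1.0)
    if invert:
        combined = 1.0 - combined   # dark chromosomes on a light background
    return np.round(combined * 255).astype(np.uint8)
```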

The chromosomes were chosen to represent the different chromosomal morphologies. The training and validation datasets were generated with different chromosomes belonging to the same metaphase (slide jp21, metaphase 3) from normal human lymphocytes counterstained with DAPI and hybridized with a telomeric Cy3-labelled PNA probe.

Low-resolution image dataset:

Training dataset

The training dataset was generated from six chromosomes:

The dataset consists of 100 085 pairs of greyscale/ground-truth images of shape 80x82.

Simulation of overlapping pairs of chromosomes. Sample of the low-resolution image dataset.

Ground-truth labels of the training dataset

Validation dataset:

The validation dataset was generated with other chromosomes from the same metaphase:

The validation dataset contains 111 123 pairs (grey + ground-truth label) of images (8-bit, PNG).

Full-resolution dataset:

• The training dataset contains 97 976 pairs of 201x211 images.
• The validation dataset contains 50 543 pairs of grey / ground-truth images.
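A little arithmetic makes the memory argument concrete. Assume each greyscale image and each label image is held in RAM as one 8-bit channel (one byte per pixel). Then a dataset of N pairs of H x W images needs 2 x N x H x W bytes:

```python
def footprint_bytes(n_pairs, height, width, bytes_per_pixel=1):
    """RAM needed to hold n greyscale images plus n label images (one channel each)."""
    return 2 * n_pairs * height * width * bytes_per_pixel

datasets = [
    ("low-resolution training set", 100085, 80, 82),
    ("full-resolution training set", 97976, 201, 211),
]
for name, n, h, w in datasets:
    print("%s: %.2f GB" % (name, footprint_bytes(n, h, w) / 1e9))
# low-resolution training set: 1.31 GB
# full-resolution training set: 8.31 GB
```

That is roughly 1.3 GB against 8.3 GB before any conversion to floating point. Converting to float32 multiplies both figures by four.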

Saturday, April 1, 2017

100 overlapping, full-resolution chromosomes isolated from 14 metaphases of human lymphocytes

Metaphase chromosomes were hybridized with a PNA telomeric probe (CCCTAA-Cy3) and an additional oligo DNA probe (Cy5) for testing purposes.

100 overlapping chromosomes can be found in that dataset (jpp21). The dataset consists of 12-bit raw greyscale images available in the DeepFISH repository. The images of overlapping chromosomes were isolated and converted into 8-bit RGB images:

Sample of 100 overlapping chromosomes (full resolution)
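The 12-bit to 8-bit conversion method is not given. Two common options are sketched below as assumptions:

- a plain 4-bit right shift, which maps 0..4095 onto 0..255;
- a min-max contrast stretch of each image.

The function name is illustrative, and building the RGB image from the converted channels is not shown.

```python
import numpy as np

def to_8bit(raw12, stretch=True):
    """Convert a 12-bit greyscale image (values 0..4095) to 8 bits."""
    raw = np.asarray(raw12)
    if not stretch:
        return (raw >> 4).astype(np.uint8)   # 4096 levels -> 256 levels
    lo, hi = raw.min(), raw.max()
    if hi == lo:
        return np.zeros(raw.shape, dtype=np.uint8)   # flat image
    scaled = (raw.astype(np.float64) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)
```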
The overlapping images are also available as a combination of the DAPI and Cy3 components:

Combined DAPI+Cy3 images (inverted greyscale)