Wednesday, April 12, 2017

Modeling overlapping pairs of chromosomes: a dataset containing more than 200 000 pairs of images in png format.

Two datasets modeling pairs of overlapping chromosomes are available on GitHub:
Images are available at full resolution and also at a lower resolution to reduce the memory footprint. To reduce it further, the two components of the original image (Cy3 and DAPI) were combined into a single greyscale image.
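The post does not say how the Cy3 and DAPI components were merged. Here is a minimal sketch, assuming both channels are 8-bit images of the same shape stored as nested lists. It uses a per-pixel maximum, one simple way to fold two channels into one greyscale channel. The function name `combine_channels` and the max rule are hypothetical choices, not the author's method:

```python
def combine_channels(cy3, dapi):
    """Merge two same-sized 8-bit channels into one greyscale image.

    Hypothetical rule: per-pixel maximum. The original post does not
    describe how the Cy3 and DAPI components were actually combined.
    """
    # Both channels must have the same number of rows and columns.
    if len(cy3) != len(dapi) or any(len(a) != len(b) for a, b in zip(cy3, dapi)):
        raise ValueError("channels must have the same shape")
    # Keep the brighter of the two values at every pixel.
    return [[max(c, d) for c, d in zip(row_c, row_d)]
            for row_c, row_d in zip(cy3, dapi)]


print(combine_channels([[0, 200]], [[100, 50]]))  # [[100, 200]]
```

A per-pixel maximum keeps the bright telomeric Cy3 spots visible on top of the DAPI counterstain. A weighted sum of the two channels would be an equally plausible alternative.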

The chromosomes were chosen to represent the different chromosomal morphologies. The training and validation datasets were generated with different chromosomes from the same metaphase (slide jp21, metaphase 3) of normal human lymphocytes counterstained with DAPI and hybridized with a telomeric Cy3-labelled PNA probe.

Low-resolution dataset:

Training dataset

The training dataset was generated from six chromosomes:

The dataset consists of 100 085 pairs of greyscale/ground-truth images of shape 80x82.
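To give a rough sense of the memory involved, the sketch below estimates the uncompressed size of the training set. It assumes one byte per pixel, as for the 8-bit validation images, and two 80x82 images per pair. PNG compression typically makes the files on disk smaller. The helper `uncompressed_bytes` is hypothetical:

```python
def uncompressed_bytes(n_pairs, height, width, images_per_pair=2, bytes_per_pixel=1):
    """Rough in-memory size of a dataset of image pairs, ignoring PNG compression.

    Assumption: every image in a pair has the same shape and bit depth.
    """
    return n_pairs * images_per_pair * height * width * bytes_per_pixel


train_low = uncompressed_bytes(100_085, 80, 82)
print(train_low)  # 1313115200, i.e. about 1.3 GB
```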

Simulation of overlapping pairs of chromosomes: a sample from the low-resolution dataset.

Ground-truth labels of the training dataset

Validation dataset:

The validation dataset was generated with other chromosomes from the same metaphase:

The validation dataset contains 111 123 pairs of images (greyscale + ground-truth label), stored as 8-bit PNG files.
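The Python standard library is enough to check that a downloaded file really is an 8-bit PNG. This sketch reads the image header from the IHDR chunk defined by the PNG specification. That chunk always follows the 8-byte signature and stores width, height, bit depth and color type. The function name `png_header` is hypothetical:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header(data: bytes):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk layout: 4-byte big-endian length, then a 4-byte type code.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk missing")
    # IHDR data starts with width and height (4 bytes each),
    # then bit depth and color type (1 byte each).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, bit_depth, color_type
```

Reading the first 26 bytes of a file, for instance with `f.read(26)` on a file opened in `"rb"` mode, is enough. A bit depth of 8 with color type 0 indicates an 8-bit greyscale image.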

Full resolution dataset:

  • The training dataset contains 97 976 pairs of 201x211 images.
  • The validation dataset contains 50 543 pairs of greyscale/ground-truth images.
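The image shapes alone show how large the resolution gap between the two datasets is. This sketch compares the pixel count of one 201x211 full-resolution image with that of one 80x82 low-resolution image. It says nothing about bit depth, which the post does not give for the full-resolution files. The function name `pixel_ratio` is hypothetical:

```python
def pixel_ratio(full_shape, low_shape):
    """Ratio of pixel counts between two 2-D image shapes.

    The order of the two dimensions does not matter, since only
    their product is used.
    """
    full_a, full_b = full_shape
    low_a, low_b = low_shape
    return (full_a * full_b) / (low_a * low_b)


ratio = pixel_ratio((201, 211), (80, 82))
print(f"{ratio:.2f}")  # 6.47
```

Each full-resolution image therefore holds roughly 6.5 times as many pixels as a low-resolution one.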