DIP4FISH: 2020

Monday, October 26, 2020

Looking inside the "2164 dataset": How balanced is it?

This dataset consists of synthetic images of pairs of overlapping chromosomes plus their ground-truth labels (chromosome 1, chromosome 2, overlap). A simple morphological sieve was applied to the dataset to sort the different kinds of chromosomal overlap.

Friday, August 7, 2020

COCO dataset from scratch: try and fail ...

Detectron2 provides several algorithms for instance segmentation, so it was tempting to submit the overlapping-chromosome datasets to one of them. However, to use these algorithms, the dataset apparently has to follow the MS-COCO format.

One available dataset consists of 2164 pairs of grayscale and ground-truth images. To give it a try, a minimalist dataset with one image and two instances could be converted to COCO format:

The two instances (right) are obtained from the ground-truth image showing the overlapping chromosomes. The instances are numpy arrays, which can be saved as PNG images. To generate a COCO dataset associated with the grayscale image (left), the following steps were followed:

Generate a Python dictionary according to the COCO format specification found in the detectron2 documentation, converting the binary masks to their bounding boxes and compressed RLE using pycocotools.
Save the dictionary as a JSON file.
Load the JSON file with pycocotools (or detectron2) in order to visualize, if possible, the instances overlaid on the grayscale image.
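The first two steps can be sketched as follows. This is not the notebook's code: it is a minimal, standard-library-only illustration that writes uncompressed RLE (column-major run lengths, starting with the run of zeros) instead of the compressed RLE produced by pycocotools. The file name "pair_0000.png", the category name "chromosome", the helper names and the two tiny masks are all hypothetical; a numpy mask can be passed after calling .tolist() on it.

```python
import json

def mask_to_bbox(mask):
    """Return [x, y, width, height] around the nonzero pixels of a 2D 0/1 mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return [0, 0, 0, 0]
    return [cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1]

def mask_to_uncompressed_rle(mask):
    """Encode a 2D 0/1 mask as uncompressed RLE: column-major runs, zeros first."""
    height, width = len(mask), len(mask[0])
    counts, previous, run = [], 0, 0
    for c in range(width):
        for r in range(height):
            value = 1 if mask[r][c] else 0
            if value != previous:
                counts.append(run)
                previous, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}

def build_coco(file_name, instance_masks, category_name="chromosome"):
    """Build a one-image COCO-style dictionary from a list of binary masks."""
    height, width = len(instance_masks[0]), len(instance_masks[0][0])
    annotations = []
    for index, mask in enumerate(instance_masks, start=1):
        annotations.append({
            "id": index,
            "image_id": 1,
            "category_id": 1,
            "bbox": mask_to_bbox(mask),
            "area": sum(1 for row in mask for v in row if v),
            "iscrowd": 0,  # assumption: each chromosome is a single instance
            "segmentation": mask_to_uncompressed_rle(mask),
        })
    return {
        "images": [{"id": 1, "file_name": file_name,
                    "height": height, "width": width}],
        "annotations": annotations,
        "categories": [{"id": 1, "name": category_name}],
    }

# Two toy 3x4 instances that share one pixel, standing in for the real masks.
mask_a = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
mask_b = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]

coco = build_coco("pair_0000.png", [mask_a, mask_b])
coco_json = json.dumps(coco, indent=2)  # step 2: this string is what gets saved
```

Every value in the dictionary is a plain Python int, str, list or dict, so json.dumps does not trip over numpy scalar types, which is a common source of errors when the masks come straight from numpy.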

The whole process is available in a jupyter notebook on Kaggle.

Unfortunately, the result is not a valid COCO dataset, since the dataset registration fails. I hope to get some help on the PyTorch forum or from Stack Overflow.

Tuesday, April 28, 2020

Karyotype of ten fibroblasts after Alu sequences and telomeres hybridization

The image below was obtained after sequential hybridization of a telomeric PNA probe (FITC: green) followed by hybridization of an Alu-PCR product (rhodamine: red):

Pairs of chromosomes are ordered by column from HSA 1 (left) to XY (right). Metaphases are ordered by row.

Image acquisition was performed with a low-resolution 8-bit camera mounted on a Leica DMRB fluorescence microscope (100x). Raw images were transferred from a Unix Cytovision station to a Power Macintosh 9500 on 3.5-inch floppy disks.

Example of a metaphase after alignment of the Alu image on the DAPI / telomere images:


Alu images were aligned on DAPI by hand using Photoshop 3 (here without background correction or contrast enhancement). Whole-chromosome painting was also performed sequentially, on HSA-13, HSA-6 and HSA-X simultaneously:

Karyotyping:

Combining R bands (Alu), G bands (DAPI) and chromosome painting makes it possible to classify the chromosomes in a karyogram:

Left: R bands (Alu sequences). Right: G bands (inverse DAPI + DoG filter).

Friday, January 31, 2020

2164 full resolution pairs of synthetic images of two overlapping chromosomes

After the ground-truth images of the "13434" dataset were fixed, an older but full-resolution dataset had to be repaired too.

Repair of the "overlapping_chromosomes_examples.h5" dataset:

This dataset originally contained 2854 (grayscale + ground-truth) pairs of 190x189 images, stored in a single numpy array of shape 2854x190x189x2.

The grayscale images suffered from two problems:

Some grayscale image components had black dots: those images were removed (along with their corresponding ground-truth labels).

The image dtype was int64; it is now np.uint8.
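Both repairs amount to a filter followed by a cast. The sketch below only illustrates that idea on a random toy array. How the images with black dots were actually detected is not described here, so the list bad_indices is a hypothetical input, and repair is a hypothetical helper name.

```python
import numpy as np

def repair(dataset, bad_indices):
    """Drop the flagged (grayscale, ground-truth) pairs, then cast to uint8."""
    keep = np.ones(len(dataset), dtype=bool)
    keep[np.asarray(bad_indices, dtype=int)] = False
    kept = dataset[keep]  # removes image and label together (same first axis)
    # The cast is only lossless if every value already fits in 0..255.
    if kept.size and (kept.min() < 0 or kept.max() > 255):
        raise ValueError("values outside the uint8 range")
    return kept.astype(np.uint8)

# Toy stand-in with the same layout as the real array, but only 5 pairs.
rng = np.random.default_rng(0)
toy = rng.integers(0, 256, size=(5, 190, 189, 2), dtype=np.int64)

repaired = repair(toy, bad_indices=[1, 3])
print(repaired.shape, repaired.dtype)
print(toy[[0, 2, 4]].nbytes // repaired.nbytes)  # storage ratio int64 / uint8
```

Because the grayscale image and its labels share the first axis of the array, a single boolean mask removes both at once, and going from int64 to uint8 divides the memory footprint by eight.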

The overlap region is now also more realistic when compared with real overlapping chromosomes:

The ground-truth labels no longer have spurious pixels:

Dataset format:

Once downloaded, the dataset has shape (2164, 190, 189, 2). It is available as:

an HDF5 file
a compressed (.npz) numpy array
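As an illustration of the .npz route, the sketch below writes a small stand-in array to an in-memory buffer and reads it back, then splits the last axis. Two things are assumptions: the key "arr_0" is simply the numpy default for an unnamed array and may differ in the published file, and channel 0 is taken to be the grayscale image with channel 1 the ground-truth labels.

```python
import io
import numpy as np

# Stand-in for the downloaded file: same layout, only 4 pairs, dummy values.
pairs = np.zeros((4, 190, 189, 2), dtype=np.uint8)
pairs[..., 1] = 3  # dummy label value, not the dataset's real encoding

buffer = io.BytesIO()
np.savez_compressed(buffer, pairs)  # unnamed arrays are stored as "arr_0"
buffer.seek(0)

with np.load(buffer) as archive:
    data = archive["arr_0"]

grey = data[..., 0]    # assumed: grayscale images
labels = data[..., 1]  # assumed: ground-truth label images

# Label values and their pixel counts, a first look at class balance.
values, counts = np.unique(labels, return_counts=True)
print(data.shape, data.dtype, dict(zip(values.tolist(), counts.tolist())))
```

With the real file, the buffer would be replaced by the path of the downloaded .npz, and archive.files lists the keys actually present.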

Download the dataset with a jupyter notebook:

Thursday, January 16, 2020

A conversion to COCO dataset format task in sight

Detectron2 provides numerous models for semantic/instance segmentation. Unlike TensorFlow 2, the PyTorch 1.3/Detectron2 pair doesn't seem to require an AVX-capable CPU. However, Detectron2 works on datasets built according to the COCO format.

Time to start thinking about how to convert the different overlapping-chromosome datasets into COCO datasets: