DIP4FISH

Wednesday, January 26, 2022

Loading the 125-image COCO chromosome dataset with lightning-flash: notebook available on Kaggle

Notebook on Kaggle (choose version 2)

Thursday, October 14, 2021

Installation of lightning-flash

With Anaconda installed on an Ubuntu 20.04 box:

Create a conda environment at a chosen location on disk (--prefix):

conda create --prefix /mnt/stockage/Developp/EnvPLFlash

and activate the env with:

conda activate /mnt/stockage/Developp/EnvPLFlash

Then install the libraries, starting with PyTorch 1.8 (LTS) with CUDA support:

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch-lts

Then:

pip install icedata

pip install lightning-flash

pip install notebook

pip install voila

Do not forget to install lightning-flash[image] to get the instance segmentation algorithms:

pip install 'icevision' 'lightning-flash[image]'

The installation can be checked by running the following notebook:
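As a lightweight complement to that notebook, a minimal sketch in plain Python can report which of the packages installed above are importable and whether PyTorch sees a GPU. The module-to-distribution mapping is an assumption based on the pip/conda commands above; a version may show as "unknown" if a package was installed without pip metadata.

```python
import importlib.metadata
import importlib.util

# Importable module name -> distribution name used by pip/conda above
# (assumed mapping; adjust if your environment differs).
PACKAGES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "flash": "lightning-flash",
    "icevision": "icevision",
    "icedata": "icedata",
}


def check_installation(packages=PACKAGES):
    """Return {module: version string, "unknown", or None if not importable}."""
    report = {}
    for module, dist in packages.items():
        if importlib.util.find_spec(module) is None:
            report[module] = None  # module cannot be found at all
            continue
        try:
            report[module] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            report[module] = "unknown"  # importable, but no distribution metadata
    return report


def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in check_installation().items():
        print(f"{name:12s} {version}")
    print("CUDA available:", cuda_available())
```

A missing package shows up as None, which makes it easy to spot a forgotten `pip install` step.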

Wednesday, July 7, 2021

Back to COCO 2: A less minimalistic dataset of 125 images + JSON

125 greyscale images were selected from a previous dataset available on GitHub.

The annotations were drawn by hand online with makesense.ai and saved as a single JSON file in COCO format. This small dataset is freely available as an archive.

Check annotation file validity with pycococreator.

Pycococreator by Waspinator was used in the following Jupyter notebook to display the annotations (i.e. the segmentation) over a greyscale image, which de facto validates the annotation file produced with makesense.ai:
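Displaying the masks is a visual check. A complementary, purely structural check can be sketched with the standard json module: every annotation should point to an existing image and category, annotation ids are expected to be unique, each polygon should hold an even number (at least 6) of coordinates, and each bbox ([x, y, width, height]) should lie inside its image. This is not pycococreator's code; `check_coco` and `load_and_check` are hypothetical helpers written for illustration.

```python
import json


def check_coco(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}

    seen_ids = set()
    for ann in coco.get("annotations", []):
        aid = ann.get("id")
        if aid in seen_ids:
            problems.append(f"duplicate annotation id {aid}")
        seen_ids.add(aid)

        img = images.get(ann.get("image_id"))
        if img is None:
            problems.append(f"annotation {aid}: unknown image_id {ann.get('image_id')}")
            continue
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {aid}: unknown category_id {ann.get('category_id')}")

        # Polygons are flat lists [x1, y1, x2, y2, ...] with at least 3 vertices;
        # RLE segmentations (dicts) are not checked here.
        seg = ann.get("segmentation", [])
        for poly in seg if isinstance(seg, list) else []:
            if len(poly) < 6 or len(poly) % 2:
                problems.append(f"annotation {aid}: malformed polygon of length {len(poly)}")

        # bbox is [x, y, width, height] and must fit inside the image
        bbox = ann.get("bbox")
        if bbox is not None:
            x, y, w, h = bbox
            if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
                problems.append(f"annotation {aid}: bbox {bbox} outside image")
    return problems


def load_and_check(path):
    """Load a COCO JSON file from disk and check it."""
    with open(path) as f:
        return check_coco(json.load(f))
```

An empty list means no structural problem was found; it does not prove that a given library will accept the file.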

Data registration in detectron2:

This is the next step.

The idea is to follow the tutorial on custom dataset registration, possibly using the balloons example by davamix.
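In that tutorial, a custom dataset is registered with `DatasetCatalog.register(name, func)`, where `func` returns a list of dicts in detectron2's standard format (`file_name`, `image_id`, `height`, `width`, `annotations`), as `get_balloon_dicts` does for the balloons. A minimal sketch of such a function for a COCO JSON file, in plain Python so the conversion can be inspected, might look like this. The names `coco_to_detectron2_dicts` and `load_dataset` are hypothetical; `bbox_mode=1` stands for `BoxMode.XYWH_ABS`, and category ids are remapped to the contiguous range [0, num_categories - 1] that detectron2 expects.

```python
import json
import os

XYWH_ABS = 1  # value of detectron2.structures.BoxMode.XYWH_ABS; use the enum when detectron2 is installed


def coco_to_detectron2_dicts(coco, image_root):
    """Convert a COCO dict into detectron2's list-of-dicts dataset format."""
    # detectron2 expects category ids in [0, num_categories - 1]
    cat_ids = sorted(cat["id"] for cat in coco["categories"])
    contiguous = {cid: i for i, cid in enumerate(cat_ids)}

    # Group annotations by the image they belong to
    anns_per_image = {}
    for ann in coco["annotations"]:
        anns_per_image.setdefault(ann["image_id"], []).append(ann)

    records = []
    for img in coco["images"]:
        records.append({
            "file_name": os.path.join(image_root, img["file_name"]),
            "image_id": img["id"],
            "height": img["height"],
            "width": img["width"],
            "annotations": [
                {
                    "bbox": ann["bbox"],  # COCO bbox: [x, y, width, height]
                    "bbox_mode": XYWH_ABS,
                    "segmentation": ann["segmentation"],
                    "category_id": contiguous[ann["category_id"]],
                }
                for ann in anns_per_image.get(img["id"], [])
            ],
        })
    return records


def load_dataset(json_file, image_root):
    """Read a COCO JSON file and return detectron2-style records."""
    with open(json_file) as f:
        return coco_to_detectron2_dicts(json.load(f), image_root)
```

With detectron2 installed, registration would then be `DatasetCatalog.register("chromosomes_train", lambda: load_dataset(json_file, image_root))` followed by `MetadataCatalog.get("chromosomes_train").set(thing_classes=["chromosome"])`, where the dataset name and paths are placeholders. For a valid COCO file, detectron2's `register_coco_instances(name, {}, json_file, image_root)` does the same in one call.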

Tuesday, June 15, 2021

Trying to get unstuck: back to COCO

The last post dates from October 2020. The main goal was to make progress on chromosome instance segmentation, but a robust U-Net-based semantic segmentation would have been satisfying too (possibly using fastai).

Detectron2, PixelLib and many other libraries provide instance segmentation algorithms (Mask R-CNN, for example). To train a model, the so-called ground-truth labels seem to have to be in COCO format. The issue is that in the different datasets generated to simulate overlapping chromosomes, the labels are greyscale images that can be decomposed into binary masks (one-hot encoding):
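Assuming, for illustration, that a label image codes background as 0, the first chromosome as 1, the second as 2 and their overlap as 3 (the actual grey levels of each dataset may differ), the one-hot decomposition and the two instance masks can be sketched with NumPy. Each instance mask includes the overlap, since the overlapping pixels belong to both chromosomes.

```python
import numpy as np

# Assumed grey-level coding of the ground-truth image (check your dataset)
BACKGROUND, CHROM1, CHROM2, OVERLAP = 0, 1, 2, 3


def one_hot(label, n_classes=4):
    """Split a label image (H, W) into n_classes binary masks (n_classes, H, W)."""
    label = np.asarray(label)
    return np.stack([label == k for k in range(n_classes)]).astype(np.uint8)


def instance_masks(label):
    """Return one binary mask per chromosome; both masks include the overlap."""
    label = np.asarray(label)
    chrom1 = (label == CHROM1) | (label == OVERLAP)
    chrom2 = (label == CHROM2) | (label == OVERLAP)
    return chrom1.astype(np.uint8), chrom2.astype(np.uint8)


# Toy 3 x 4 label image with one overlapping pixel
label = np.array([[0, 1, 1, 0],
                  [0, 1, 3, 2],
                  [0, 0, 2, 2]])
masks = one_hot(label)       # shape (4, 3, 4), exactly one class per pixel
m1, m2 = instance_masks(label)
```

The one-hot stack suits semantic segmentation, whereas instance segmentation tools expect one (possibly overlapping) mask per object, hence the second function.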

In the last attempt, the idea was to start from the COCO specification and write some code to convert the binary masks into COCO files, but it failed: detectron2 rejected my minimalist dataset.

Making a minimal valid COCO dataset

From a greyscale image, a COCO file is generated using an interactive online tool such as https://www.makesense.ai/:

The COCO dataset corresponding to this single image is a JSON file:

With a viewer in Colab, we can see how the JSON file is structured:

The file corresponds to a single image: "grey0000001.png".

The two annotated chromosomes appear as id:0 and id:0 in the annotations field:

The contour of one of the two chromosomes is coded as 24 values. In COCO, a polygon is a flat list [x1, y1, x2, y2, ...], so these are 12 pairs of (x, y) coordinates:

The chromosome bounding box is given by four numbers. In the COCO format these are [x, y, width, height]: the top-left corner followed by the width and height of the box, rather than two diagonal points:

Finally, there is only one instance category: "chromosome".
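Following the structure described above, a single-image COCO file can also be written by hand. In this minimal sketch (hypothetical helper `polygon_annotation`, placeholder image size and coordinates), the bbox is derived from the polygon as [x, y, width, height], the COCO convention, and the area is computed with the shoelace formula. Annotation tools may compute the area differently, for example from the rasterized mask.

```python
import json


def polygon_annotation(ann_id, image_id, polygon, category_id=1):
    """Build a COCO annotation from a flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = polygon[0::2], polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    bbox = [x_min, y_min, max(xs) - x_min, max(ys) - y_min]  # [x, y, width, height]
    # Shoelace formula for the area enclosed by the polygon
    n = len(xs)
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))) / 2
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [polygon],
        "bbox": bbox,
        "area": area,
        "iscrowd": 0,
    }


# Placeholder image size and rectangular "chromosomes", for illustration only
coco = {
    "images": [{"id": 1, "file_name": "grey0000001.png", "width": 190, "height": 190}],
    "categories": [{"id": 1, "name": "chromosome"}],
    "annotations": [
        polygon_annotation(1, 1, [10, 10, 40, 10, 40, 60, 10, 60]),
        polygon_annotation(2, 1, [50, 20, 80, 20, 80, 90, 50, 90]),
    ],
}

text = json.dumps(coco, indent=2)  # write `text` to a .json file to save the dataset
```

Unique annotation ids (1 and 2 here) keep every instance addressable when the file is indexed by id.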

Back to COCO: playing with a minimalist valid dataset using pycocotools:

Monday, October 26, 2020

Looking inside the "2164 dataset": How balanced is it?

This dataset consists of synthetic images of overlapping pairs of chromosomes + ground-truth labels (chromosome 1, chromosome 2, overlap). A simple morphological sieve was applied to the dataset to sort the different kinds of chromosomal overlaps.
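The sieve itself is not detailed here. As a purely hypothetical illustration of the idea, one could count the connected components of the overlap class in each label image and tally the resulting categories to see how balanced the dataset is. The criterion, the overlap grey level (3) and the category names below are assumptions, not the sieve actually used.

```python
from collections import Counter, deque

import numpy as np

OVERLAP = 3  # assumed grey level of the overlap class


def count_components(mask):
    """Count 4-connected components in a 2D boolean mask (BFS flood fill)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                count += 1  # new component found: flood-fill it
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


def overlap_category(label):
    """Hypothetical sieve: classify a label image by its number of overlap regions."""
    n = count_components(np.asarray(label) == OVERLAP)
    if n == 0:
        return "no overlap"
    return "single overlap" if n == 1 else "multiple overlaps"


def balance(labels):
    """Tally the categories over an iterable of label images."""
    return Counter(overlap_category(lab) for lab in labels)
```

The resulting Counter gives the size of each class directly, which is what a balance check needs.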

Friday, August 7, 2020

COCO dataset from scratch: try and fail...

Detectron2 provides several algorithms for instance segmentation, so it was tempting to feed the overlapping-chromosome datasets to one of them. However, these algorithms seem to expect the dataset in MS-COCO format.

One available dataset consists of 2164 pairs of greyscale + ground-truth images. As a first try, a minimalist dataset with one image and two instances was converted to COCO format:

The two instances (right) are obtained from the ground-truth image of the overlapping chromosomes. The instances are NumPy arrays, which can be saved as PNG images. To generate a COCO dataset associated with the greyscale image (left), the following steps were taken:

Generate a Python dictionary following the COCO format specification found in the detectron2 documentation, converting the binary masks to their bounding boxes and compressed RLE with pycocotools.
Save the dictionary as a JSON file.
Load the JSON file with pycocotools (or detectron2) to visualize, if possible, the instances overlaid on the greyscale image.
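The mask-to-annotation part of the first step can be illustrated without pycocotools. This sketch (hypothetical helpers `mask_to_bbox` and `mask_to_uncompressed_rle`) computes the bbox as [x, y, width, height] and an uncompressed RLE: COCO counts runs over the mask read in column-major (Fortran) order, starting with a run of zeros, which is why a leading 0 is added when the first pixel is set. pycocotools' `mask.encode(np.asfortranarray(mask.astype(np.uint8)))` returns the compressed variant of the same encoding.

```python
import numpy as np


def mask_to_bbox(mask):
    """Bounding box [x, y, width, height] of the non-zero pixels of a 2D mask."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return [0, 0, 0, 0]  # empty mask
    y0, y1 = rows[0], rows[-1]
    x0, x1 = cols[0], cols[-1]
    return [int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)]


def mask_to_uncompressed_rle(mask):
    """COCO-style uncompressed RLE: run lengths in column-major order, zeros first."""
    mask = np.asarray(mask, dtype=np.uint8)
    pixels = mask.flatten(order="F")
    # Indices where the pixel value changes mark the start of a new run
    change = np.flatnonzero(pixels[1:] != pixels[:-1]) + 1
    boundaries = np.concatenate(([0], change, [pixels.size]))
    counts = np.diff(boundaries).tolist()
    if pixels.size and pixels[0] == 1:
        counts = [0] + counts  # the first run must count zeros
    return {"counts": counts, "size": [int(mask.shape[0]), int(mask.shape[1])]}


mask = np.array([[0, 1],
                 [0, 1]])
annotation_part = {
    "bbox": mask_to_bbox(mask),
    "area": int(mask.sum()),
    "segmentation": mask_to_uncompressed_rle(mask),
}
```

The counts always sum to height × width, which is a quick sanity check on any RLE.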

The whole process is available in a Jupyter notebook on Kaggle.

Unfortunately, the dataset is not a valid COCO dataset, since its registration fails. I hope to get some help on the PyTorch forum or on Stack Overflow.

Tuesday, April 28, 2020

Karyotype of ten fibroblasts after Alu sequences and telomeres hybridization

The image below was obtained after sequential hybridization of a telomeric PNA probe (FITC: green) followed by hybridization of an Alu-PCR product (rhodamine: red):

Chromosome pairs are arranged in columns from HSA 1 (left) to XY (right); metaphases are arranged in rows.

Image acquisition was performed with a low-resolution 8-bit camera mounted on a Leica DMRB fluorescence microscope (100x). Raw images were transferred from a Unix Cytovision station on 3.5-inch floppy disks to a Power Macintosh 9500.

Example of a metaphase after alignment of the Alu image onto the DAPI/telomere images:


Alu images were aligned onto DAPI by hand using Photoshop 3 (no background correction or contrast enhancement here). Whole chromosome painting was also performed sequentially, simultaneously targeting HSA-13, HSA-6 and HSA-X:

Karyotyping:

Combining R bands (Alu), G bands (DAPI) and chromosome painting allows the chromosomes to be classified in a karyogram:

Left: R bands (Alu sequences). Right: G bands (inverted DAPI + DoG filter).
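The G-band-like image (right) was obtained by inverting the DAPI image and applying a difference-of-Gaussians (DoG) filter. A minimal NumPy sketch of that kind of processing follows; the order of operations (inversion, then DoG), the 8-bit range and the two sigma values are assumptions, not the settings used for the figure.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with reflected borders; output has the input shape."""
    kernel = gaussian_kernel(sigma)
    r = kernel.size // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="reflect")
    # Blur along rows, then along columns; "valid" removes the padding again
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def inverse_dapi_dog(dapi, sigma_small=1.0, sigma_large=3.0, max_value=255.0):
    """Invert an 8-bit DAPI image, then band-pass it with a difference of Gaussians."""
    inverted = max_value - np.asarray(dapi, dtype=float)
    return gaussian_blur(inverted, sigma_small) - gaussian_blur(inverted, sigma_large)
```

The small sigma keeps band-sized details while the large one estimates the local background, so their difference enhances the banding pattern; the float result would typically be rescaled to 0-255 for display.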