Notebook on Kaggle (choose version 2)
This blog is dedicated to Digital Image Processing for fluorescence in-situ hybridization, QFISH and other things about the telomeres.
With Anaconda installed on an Ubuntu 20.04 box, create an environment:
conda create --prefix /mnt/stockage/Developp/EnvPLFlash
and activate the env with:
conda activate /mnt/stockage/Developp/EnvPLFlash
To get PyTorch 1.8 with CUDA support:
then
pip install icedata
pip install lightning-flash
pip install notebook
pip install voila
Do not forget the lightning-flash[image] extra, which provides the instance segmentation algorithms:
pip install 'icevision' 'lightning-flash[image]'
The installation can be checked by running the following notebook:
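Independently of that notebook, a quick way to see whether the packages installed above are visible from the active environment is to look them up by import name. This is only a minimal sketch: the import names in `PACKAGES` are assumptions (for example, lightning-flash is assumed to import as `flash`), and finding a package says nothing about whether CUDA is usable.

```python
import importlib.util

# Top-level import names assumed for the packages installed above.
PACKAGES = ["torch", "flash", "icevision", "icedata", "notebook", "voila"]


def check_installation(names=PACKAGES):
    """Return {import name: True/False} without actually importing anything."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

Run it with the environment activated; any line marked MISSING points to a package to reinstall.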
125 greyscale images were chosen from a previous dataset available on GitHub.
An annotation file was generated by hand online with makesense.ai and saved as a single JSON file in COCO format. This small dataset is freely available as an archive.
Pycococreator by Waspinator was used to display the annotations (that is, the segmentation masks) over a greyscale image in the following Jupyter notebook, which validates de facto the annotation file produced with makesense.ai:
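Besides the visual overlay, a numeric summary of the annotation file can be a useful cross-check. The sketch below assumes the usual top-level COCO keys (`images`, `annotations`, `categories`); the helper names are hypothetical and this is not the code of the notebook.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Count images, annotations and instances per category in a COCO dict."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(names[a["category_id"]] for a in coco["annotations"])
    per_image = Counter(a["image_id"] for a in coco["annotations"])
    return {
        "n_images": len(coco["images"]),
        "n_annotations": len(coco["annotations"]),
        "per_category": dict(per_category),
        "max_instances_per_image": max(per_image.values(), default=0),
    }


def summarize_coco_file(path):
    """Same summary, reading the COCO JSON from disk."""
    with open(path) as fh:
        return summarize_coco(json.load(fh))
```

For the dataset described here, `n_images` would be expected to be 125 if every image was annotated.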
This is the next step.
The idea is to follow the tutorial on custom dataset registration, possibly using the balloons example by davamix.
The last post is from October 2020. The main goal was to make progress on chromosome instance segmentation, but a robust U-Net-based semantic segmentation would have been satisfying too (possibly using fastai).
Detectron2, PixelLib and many others provide instance segmentation algorithms (Mask R-CNN, for example). To train a model, the COCO format for the so-called ground-truth labels seems to be mandatory. The issue is that, in the different datasets generated to simulate overlapping chromosomes, the labels are greyscale images decomposable into binary masks for one-hot encoding:
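As an illustration of that decomposition, here is a minimal NumPy sketch that splits a label image into one boolean mask per grey level. The toy image and the meaning given to the grey levels (0 background, 1 and 2 the two chromosomes, 3 their overlap) are assumptions made for the example, not a description of the actual datasets.

```python
import numpy as np


def one_hot_masks(label):
    """Split a 2-D label image into one boolean mask per grey level."""
    levels = np.unique(label)  # sorted grey levels present in the image
    masks = label[None, :, :] == levels[:, None, None]
    return levels, masks  # masks has shape (n_levels, H, W)


# Toy label image under the assumed convention described above.
label = np.array([[0, 1, 1, 0],
                  [0, 1, 3, 2],
                  [0, 0, 2, 2]], dtype=np.uint8)
levels, masks = one_hot_masks(label)

# Under that assumed convention, each instance is its own region plus the overlap.
instance_a = masks[1] | masks[3]
instance_b = masks[2] | masks[3]
```

Each pixel belongs to exactly one of the one-hot masks, whereas the two instance masks share the overlap pixels.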
Detectron2 provides several algorithms for instance segmentation, so it was tempting to submit the overlapping datasets to one of them. However, to use one of these algorithms, the dataset apparently has to follow the MS-COCO format.
One available dataset consists of 2164 pairs of greyscale and ground-truth images. As a first try, a minimalist dataset with one image and two instances could be converted to COCO format:
The two instances (right) are obtained from the ground-truth image showing the overlapping chromosomes. The instances are NumPy arrays, which can be saved as PNG images. To generate a COCO dataset associated with the greyscale image (left), the following steps were followed:
The whole process is available in a Jupyter notebook on Kaggle.
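The notebook remains the reference for what was actually done. As a rough idea of what such a conversion involves, the sketch below builds a one-image COCO dictionary from a list of binary instance masks. It is written under several assumptions: the function names, the category name and the file name are hypothetical, and the masks are stored as uncompressed run-length encodings (column-major, zeros first), whereas COCO files usually describe non-crowd instances with polygons. RLE is swapped in here only to avoid a contour-tracing dependency.

```python
import json
import numpy as np


def mask_to_rle(mask):
    """Uncompressed COCO RLE: run lengths in column-major order, zeros first."""
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1  # indices where a run starts
    bounds = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(bounds).tolist()
    if flat[0] == 1:  # the first count must describe zeros
        counts = [0] + counts
    return {"size": [int(mask.shape[0]), int(mask.shape[1])], "counts": counts}


def mask_to_bbox(mask):
    """Bounding box as [x, y, width, height], the COCO convention."""
    ys, xs = np.nonzero(mask)
    x0, y0 = int(xs.min()), int(ys.min())
    return [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1]


def masks_to_coco(file_name, masks, category_name="chromosome"):
    """Build a single-image, single-category COCO dict from binary masks."""
    height, width = masks[0].shape
    annotations = [
        {
            "id": i,
            "image_id": 1,
            "category_id": 1,
            "segmentation": mask_to_rle(mask),
            "bbox": mask_to_bbox(mask),
            "area": int(np.count_nonzero(mask)),
            "iscrowd": 0,
        }
        for i, mask in enumerate(masks, start=1)
    ]
    return {
        "images": [{"id": 1, "file_name": file_name,
                    "height": height, "width": width}],
        "categories": [{"id": 1, "name": category_name}],
        "annotations": annotations,
    }
```

The resulting dictionary contains only plain Python types, so it can be written to disk with `json.dump`.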
Unfortunately, the result is not a valid COCO dataset, as the dataset registration fails. Hopefully some help will come from the PyTorch forum or from Stack Overflow.
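One way to narrow down such a failure is to run a few structural checks on the JSON before handing it to a library. The checks below are a guess at common causes (missing keys, dangling ids, malformed boxes), not a diagnosis of the failure reported here, and `check_coco` is a hypothetical helper.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append(f"missing or non-list top-level key: {key}")
    if problems:
        return problems

    for im in coco["images"]:
        for field in ("id", "file_name", "height", "width"):
            if field not in im:
                problems.append(f"image entry without '{field}': {im}")

    image_ids = {im.get("id") for im in coco["images"]}
    category_ids = {c.get("id") for c in coco["categories"]}
    seen = set()
    for ann in coco["annotations"]:
        aid = ann.get("id")
        if aid in seen:
            problems.append(f"duplicate annotation id: {aid}")
        seen.add(aid)
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {aid}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {aid}: unknown category_id")
        bbox = ann.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4):
            problems.append(f"annotation {aid}: bbox is not [x, y, w, h]")
        if "segmentation" not in ann:
            problems.append(f"annotation {aid}: no segmentation")
    return problems
```

An empty list only means these particular checks passed; a loader may still reject the file for other reasons.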
Figure: Pairs of chromosomes are ordered by column from HSA 1 (left) to XY (right). Metaphases are ordered by row.