Tuesday, June 15, 2021

Trying to get unstuck: back to COCO

The last post dates from October 2020. The main goal was to make progress on chromosome instance segmentation, although a robust U-Net-based semantic segmentation would have been satisfying too (possibly using Fastai).

Detectron2, PixelLib and many others provide instance segmentation algorithms (Mask R-CNN, for example). To train a model, the COCO format for the so-called ground-truth labels seems to be mandatory. The issue is that, in the different datasets generated to simulate overlapping chromosomes, the labels are grey-scale images decomposable into binary masks for one-hot encoding.
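
To make the decomposition concrete, here is a minimal sketch with NumPy. It assumes that 0 is the background and that each other grey value marks one class (in this made-up example: 1 and 2 for the two chromosomes, 3 for their overlap). The function name and the tiny label array are hypothetical and are not taken from the actual datasets.

```python
import numpy as np

def label_to_binary_masks(label):
    """Split a grey-level label image into one binary mask per grey value.

    Assumption: 0 is background, every other grey value is one class.
    """
    values = [int(v) for v in np.unique(label) if v != 0]
    masks = np.stack([label == v for v in values]).astype(np.uint8)
    return values, masks

# Hypothetical 4x5 label image: 1 and 2 = chromosomes, 3 = overlap
label = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 3, 2, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0],
], dtype=np.uint8)

values, masks = label_to_binary_masks(label)
print(values, masks.shape)

# A full instance mask for the first chromosome also includes the overlap
first_chromosome = ((label == 1) | (label == 3)).astype(np.uint8)
```

The stacked array has one channel per grey value, which is the one-hot layout a semantic segmentation model expects. Instance segmentation instead needs one mask per chromosome, hence the last line, which merges the overlap back into the instance.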

 
In the last attempt, the idea was to start from the COCO specs and write some code to convert the binary masks into COCO files, but that failed: detectron2 rejected my minimalist dataset.
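
For reference, such a conversion could look like the sketch below. This is not the code from the earlier attempt, only an illustration under some assumptions: the segmentation is stored as an uncompressed COCO RLE (run lengths in column-major order, starting with the zeros) rather than as a polygon, and the helper names, the ids and the small mask are hypothetical. Whether a given training framework accepts this particular layout is not checked here.

```python
import json
import numpy as np

def mask_to_rle(mask):
    """Uncompressed COCO RLE: run lengths, column-major, starting with zeros."""
    mask = np.asarray(mask, dtype=np.uint8)
    counts, previous, run = [], 0, 0
    for pixel in mask.flatten(order="F"):
        if pixel != previous:
            counts.append(run)
            previous, run = pixel, 0
        run += 1
    counts.append(run)
    return {"size": [int(mask.shape[0]), int(mask.shape[1])], "counts": counts}

def mask_to_annotation(mask, ann_id, image_id, category_id):
    """Build one COCO annotation (bbox, area, RLE) from a binary mask."""
    mask = np.asarray(mask, dtype=np.uint8)
    rows, cols = np.nonzero(mask)
    x, y = int(cols.min()), int(rows.min())
    w, h = int(cols.max()) - x + 1, int(rows.max()) - y + 1
    return {
        "id": ann_id, "image_id": image_id, "category_id": category_id,
        "bbox": [x, y, w, h],          # COCO order: x, y, width, height
        "area": int(mask.sum()),
        "segmentation": mask_to_rle(mask),
        "iscrowd": 0,
    }

# Hypothetical 4x5 binary mask of a single chromosome
mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=np.uint8)

ann = mask_to_annotation(mask, ann_id=0, image_id=0, category_id=1)
dataset = {
    "images": [{"id": 0, "file_name": "grey0000001.png", "width": 5, "height": 4}],
    "categories": [{"id": 1, "name": "chromosome"}],
    "annotations": [ann],
}
coco_json = json.dumps(dataset)  # string ready to be written to a .json file
```

Polygons, as exported by annotation tools, would require tracing the mask contour (with OpenCV or scikit-image, for instance), which is left out of this sketch.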

Making a minimal valid COCO dataset

From a grey-scale image, a COCO file is generated using an interactive online tool such as https://www.makesense.ai/:
 

The COCO dataset corresponding to this single image is a JSON file:
 

With an XML viewer in Colab, we can see how the file is structured:
The file corresponds to only one image:"grey0000001.png"
The two annotated chromosomes appear as id:0 and id:0 in the annotations field:
The contour of one of the two chromosomes is coded as 24 values, that is 12 (x, y) pairs, since a COCO polygon is a flat list x1, y1, x2, y2, ...:
The chromosome bounding box is coded as four values. In the COCO format these are x, y, width and height (the top-left corner plus the box size) rather than two diagonal points:
Finally, there is only one category of instances: "chromosome"
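
The fields listed above can also be read programmatically. The sketch below builds a small dictionary with the same layout as the exported file and decodes the contour and the bounding box. All numeric values are made up for illustration (the polygon has 4 points instead of 12 to keep it short), and the two helper functions are hypothetical. It assumes the standard COCO layout, where `bbox` is `[x, y, width, height]`.

```python
import numpy as np

# Hypothetical values; only the layout mirrors the exported COCO file
coco = {
    "images": [{"id": 1, "file_name": "grey0000001.png",
                "width": 100, "height": 100}],
    "categories": [{"id": 1, "name": "chromosome"}],
    "annotations": [{
        "id": 0, "image_id": 1, "category_id": 1, "iscrowd": 0,
        "segmentation": [[10, 10, 30, 10, 30, 40, 10, 40]],
        "bbox": [10, 10, 20, 30],
        "area": 600,
    }],
}

def polygon_points(annotation):
    """Reshape the flat list x1, y1, x2, y2, ... into an (N, 2) array."""
    flat = np.asarray(annotation["segmentation"][0], dtype=float)
    return flat.reshape(-1, 2)

def bbox_corners(annotation):
    """Convert [x, y, width, height] into top-left and bottom-right corners."""
    x, y, w, h = annotation["bbox"]
    return (x, y), (x + w, y + h)

first = coco["annotations"][0]
points = polygon_points(first)
corners = bbox_corners(first)
print(points.shape, corners)
```

With the real file, `coco` would come from `json.load` on the exported annotations, and a 24-value contour would reshape into a (12, 2) array.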

Back to COCO: playing with a minimalist valid dataset with pycocotools: