The last post dates from October 2020. The main goal was to make progress on chromosome instance segmentation, but a robust U-Net based semantic segmentation would have been satisfying too (possibly using Fastai).
Detectron2, PixelLib and many other libraries provide instance segmentation algorithms (Mask R-CNN for example). To train a model, the COCO format seems to be mandatory for the so-called ground-truth labels. The issue is that, in the different datasets generated to simulate overlapping chromosomes, the labels are greyscale images that can be decomposed into binary masks for one-hot encoding:
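As a minimal sketch of that decomposition, assume that 0 is the background and that each distinct non-zero grey level marks one class (the actual grey levels used in the datasets are not reproduced here). The function name split_into_binary_masks and the toy 3x4 image are illustrative only, and plain Python lists stand in for an image array:

```python
def split_into_binary_masks(label_image, background=0):
    """Return {grey_level: binary_mask} for every non-background grey level."""
    levels = sorted({px for row in label_image for px in row if px != background})
    return {
        level: [[1 if px == level else 0 for px in row] for row in label_image]
        for level in levels
    }


# Toy label image: 0 = background, 1 and 2 = two hypothetical grey levels.
toy_label = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 0],
]
masks = split_into_binary_masks(toy_label)
for level, mask in masks.items():
    print(level, mask)
```

With NumPy arrays the same idea reduces to one comparison per grey level (label == level).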
In the last attempt, the idea was to start from the COCO specs and write some code to convert the binary masks into COCO files, but that failed: Detectron2 did not accept my minimalist dataset.
Making a minimal valid COCO dataset
From a greyscale image, a COCO file is generated with an interactive online tool such as https://www.makesense.ai/:
The COCO dataset corresponding to this single image is a JSON file:
With an XML viewer in Colab, we can see how the file is structured:
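The same layout can also be checked with the standard json module. The sketch below uses a stand-in dictionary limited to the three top-level fields discussed in this post (images, annotations, categories); every numeric value is invented for illustration and is not copied from the exported file, and the helper name summarize is hypothetical.

```python
import json

# Stand-in for the exported file: every number below is invented for illustration.
coco_text = json.dumps({
    "images": [{"id": 1, "file_name": "grey0000001.png", "width": 100, "height": 100}],
    "annotations": [
        {"id": 0, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "segmentation": [[10, 10, 30, 10, 30, 40, 10, 40]],
         "bbox": [10, 10, 20, 30], "area": 600},
    ],
    "categories": [{"id": 1, "name": "chromosome"}],
})

dataset = json.loads(coco_text)


def summarize(data):
    """Map each top-level list to its length and the sorted keys of its first entry."""
    return {
        field: (len(entries), sorted(entries[0]))
        for field, entries in data.items()
        if isinstance(entries, list) and entries
    }


summary = summarize(dataset)
for field, (count, keys) in summary.items():
    print(f"{field}: {count} entry(ies) with keys {keys}")
```

On a real export, replacing coco_text with the content of the JSON file gives a quick overview before opening it in a viewer.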
The file corresponds to only one image: "grey0000001.png". The two annotated chromosomes appear as id:0 and id:0 in the annotations field:
The contour of one of the two chromosomes is encoded as 24 values, possibly 12 pairs of coordinates:
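In the COCO format a polygon is stored as a flat list [x1, y1, x2, y2, ...], so 24 values are consistent with 12 vertices. A small sketch of the regrouping, using a made-up 4-vertex polygon rather than the real contour (the helper name to_vertices is hypothetical):

```python
def to_vertices(flat_polygon):
    """Regroup a flat COCO polygon [x1, y1, x2, y2, ...] into (x, y) pairs."""
    if len(flat_polygon) % 2 != 0:
        raise ValueError("a COCO polygon needs an even number of values")
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))


# Made-up polygon: 8 values, hence 4 vertices.
flat = [10.0, 10.0, 30.0, 10.0, 30.0, 40.0, 10.0, 40.0]
vertices = to_vertices(flat)
print(len(vertices), vertices)
```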
The chromosome bounding box is given by four values. In the COCO format these are [x, y, width, height], that is the top-left corner followed by the box size, rather than two diagonal points:
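To compare a box with the contour, it can be converted to its two diagonal corners, or recomputed from the polygon. The sketch below assumes the COCO convention [x, y, width, height], with (x, y) the top-left corner; the helper names and the numbers are illustrative, not taken from the exported file.

```python
def bbox_to_corners(bbox):
    """Convert a COCO box [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def polygon_to_bbox(flat_polygon):
    """Smallest COCO box [x, y, width, height] enclosing a flat polygon."""
    xs, ys = flat_polygon[0::2], flat_polygon[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# Made-up rectangular contour, stored as a flat COCO polygon.
polygon = [10, 10, 30, 10, 30, 40, 10, 40]
box = polygon_to_bbox(polygon)
corners = bbox_to_corners(box)
print(box, corners)
```

If the box recomputed from the 24 contour values matches the bbox field of the same annotation, the [x, y, width, height] reading is confirmed for that file.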
Finally, there is only one category of instances: "chromosome".
Back to COCO
Let's now play with a minimalist valid dataset using pycocotools.
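A minimal sketch of such a session, assuming pycocotools is installed. The dataset written below is made up (two rectangular "chromosomes" on a 100x100 image) and the file name minimal_coco.json is hypothetical. The pycocotools calls are wrapped in a function so that the import only happens when that function is called.

```python
import json
import os
import tempfile


def write_minimal_coco(path):
    """Write a made-up one-image, two-annotation COCO file (values are illustrative)."""
    dataset = {
        "images": [{"id": 1, "file_name": "grey0000001.png", "width": 100, "height": 100}],
        "annotations": [
            {"id": 0, "image_id": 1, "category_id": 1, "iscrowd": 0, "area": 600,
             "bbox": [10, 10, 20, 30],
             "segmentation": [[10, 10, 30, 10, 30, 40, 10, 40]]},
            {"id": 1, "image_id": 1, "category_id": 1, "iscrowd": 0, "area": 400,
             "bbox": [50, 50, 20, 20],
             "segmentation": [[50, 50, 70, 50, 70, 70, 50, 70]]},
        ],
        "categories": [{"id": 1, "name": "chromosome"}],
    }
    with open(path, "w") as handle:
        json.dump(dataset, handle)
    return dataset


def inspect_with_pycocotools(path):
    """Load the file with pycocotools and return one binary mask per annotation."""
    from pycocotools.coco import COCO  # needs: pip install pycocotools

    coco = COCO(path)
    image_id = coco.getImgIds()[0]
    annotations = coco.loadAnns(coco.getAnnIds(imgIds=image_id))
    print("categories:", [c["name"] for c in coco.loadCats(coco.getCatIds())])
    return [coco.annToMask(ann) for ann in annotations]  # height x width arrays


coco_path = os.path.join(tempfile.mkdtemp(), "minimal_coco.json")
written = write_minimal_coco(coco_path)
# masks = inspect_with_pycocotools(coco_path)  # uncomment where pycocotools is available
```

The two annotations are given distinct ids here because, as far as I can tell, pycocotools indexes annotations by their id, so duplicated ids would hide one of them.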