Pages

Wednesday, June 5, 2013

From binary image skeleton to networkx graph (work in progress ...)

To recognize objects in an image methods involving graphs have been developped as the shock graph method. These graphs are derived from image skeleton and then handled, compared. However, it seems taken for granted that a graph is available for work.
Here, a python code generating a networkx graph object from the skeleton of a binary image is proposed.

The image of the B letter was chosen as model. The image is generated with PIL:

Skeleton decomposition:

The skeleton was then computed by morphological thining, branched points were extracted with a hit-or-miss operator and labelled:

A bug somewhere (update fri 06, 2013):

For some shapes, as 'Y':

the algorithm fails at the end points linking step (see bellow). A way to cure the situation is to prune the initial skeleton of 'Y' once:
In the case of the letter 'A', the skeleton must be pruned three times for the algorithm to succeed:

Branched points neighborhood:

The end points can also be extracted and labelled from the initial skeleton, but end points from the edges will be needed.
The edges were obtained by difference between skeleton and branched points, there seven edges labelled from 1 to 7. The edges are pruned by one pixel (labels are 1,2,3,4,5,6,7).The edges end points were then extracted and labelled individually (1,2,3,...,14). The end points were also labelled according to their edge of origin ( 1,2,3,4,5,6,7):
For each branched point, its neighborhood was isolated to get the labels of the adjacent end points (right images):

Building the graph:

The graph was build by adding branched points first (vertex 1, 2, 3, 4), it assumes that at least one branched point exists, then  end points adjacent to the branched points are added (vertex 8,9,10 and 11,12,13 and 14,15,16): 
The remaining lonely end points are added to the graph (vertex 17, 18):
The end points are linked using the pruned edges, the edges are weighted according to their pixel size. The edges length here do not match to their weight:
Graph of the letter B, without initial skeleton pruning

Bug origin:

With the letter 'Y', with initial skeleton pruning, the algorithm yields the following graph:
Branched points are figured in red circles on the image of labelled edges and end points are figured by a white star.
In the case of the letter 'A', the algorithm yields :
The bug seems to come from too short edges as after three pruning steps removing the small barbs on the top of the letter, the algorithm succeeds in producing a graph

Download ipython notebook


Thursday, May 16, 2013

An endpoints spectrum:possible features from the skeleton?

Looking for topological features:

The skeleton of a binary shape can be used to find features to classify it. The number of endpoints or the number of branched points should be good candidates. A shape which skeleton has no endpoint nor no branched point could be a circle:
 O
A shape with one branched point and no endpoint, may looks like this :
8

Binary model shapes were loaded in an ipython notebook with skimage which was capable of handling .pgm format :

Skeletons were generated by morphological thinning implemented in  mahotas:
endpoints (blue), branched points (green) champs

Endpoints Spectrum :

The skeleton of the hand has two endpoints close to top of each fingers, as it would be fine to have one endpoint per finger, let's prune the skeleton to meet this condition. This can be achieved after some trial, however if robust features could be found from the skeleton a systematic method is needed. So, let's prune the skeleton until it has no endpoint and count the number of pruning step necessary:
From the curve, the hand skeleton has 13 endpoints before any pruning, after 40 pruning steps, it has 6 endpoints. Around 90 pruning steps are necessary to remove all endpoints. By only looking at their curve, the hand shaped sihouette and the fish shaped sihouette are different.

Endpoints spectrum resists to scale variation after normalization:

As scale invariance is a nice feature, let's see what is happening when the scale of the shape is successively reduced by a factor of 0.9², 0.7², 0.5², 0.3²:
From the images above, it is clear that  the skeleton resists to scale variations but as the skeleton becomes smaller  less pruning steps are necessary to "burn" it to the last pruned pixels set. So let's norme the pruning steps by the initial skeleton size such for 0 pruning step, the normed pruning step is 100. As the size of the hand shaped silhouette decreases (from 81%, 49%, 25%, 9% of the original size), the endpoints spectrums  remain comparable:

 Endpoints spectrum of similar shapes are similar:

How does the curve behave with similar shapes? Let's compare the endpoints spectrum of the serie of six hands:

The spectrums seem to resist to shape rotation (hand, hand-10°, hand90°) and to moderate shape deformation (hand-bent1,2). The sketch-hand silhouette is a right hand and its spectrum is comparable to the other spectrums which corresponds to left hands, so the endpoints spectrum is symetry invariant too.

Deriving a features vector from endpoints spectrum:

If shapes should be classified according to their topology, the whole endpoints spectrum is not necessary and only the transitions may be retained. Taking a  hypothetical skeleton whose endpoints spectrum before normalization is:
  • For 7 prunings the skeleton has 0 endpoints
  •       6                                        1
  •       2                                        3
  •       0                                        4
So a possible features vector would be:
(7,6,-1,2,0)
The value -1 indicates that the value 2 endpoints is missing. In the python script, the endpoints spectrum is coded as two lists:
             end_points_test   = [4,3,3,1,1,1,1,0]
             pruning_steps_test = [0,1,2,3,4,5,6,7]

A python function taking the steps list and the endpoints (spectrum) list yields a features vector :
def EPSpectrumToFeatures(steps, spectrum):
    '''
          eptest   = [4,3,3,1,1,1,1,0]
          pruntest = [0,1,2,3,4,5,6,7]
the returned vec:[7,6,-1,2,0] which means:
    0 enpoints -> 7 prunings necessary
    1 endpoint -> 6 prunings
    4 endpoints-> 0 pruning (initial skeleton size)
'''
     values = set(spectrum)
    feat_vect = [-1]*(len(values)+1)
    spec = np.asarray(spectrum)
    for epn in values:
        indexmin = np.where(spec == epn)[0][-1]
        feat_vect[epn] = steps[indexmin]
    return feat_vect
 Supposing now the the skeleton size is 25, let's normalise the pruning steps as follow:
normed_pruning_steps_test = [100*(1-i/25.0) for i in pruntest]  The endpoints spectrum is:
Let's take a normalised endpoints spectrum:
  • normedsteps = [100.0, 96.0, 92.0, 88.0, 84.0, 80.0, 76.0, 72.0]
  • endpoints = [4, 3, 3, 1, 1, 1, 1, 0] 
 
And let's submit it the previous function :

EPSpectrumToFeatures(normedsteps, endpoints)

 The function yields the vector:
[72.0, 76.0, -1, 92.0, 100.0]
which means that:
  • 0 endpoint for a normed number of  prunings of 72%
  • 1                                                                      76
  • no 2 endpoints, set normed number of  prunings to -1
  • 3 -------------------------------------------------------------------- 92%
  • 4 ------------------------------------------------------------------ 100%
The different endpoints spectrum curves suggest that the vector could used for classifying 2D binary shapes.

A feature vector from a real shape:

Let's consider the hand shape:
It's endpoints spectrum (normalised by skeleton size or not) was calculated and a feature vector was derived and displayed (red) with the spectrum:
Left: raw endpoints spectrum (blue) and its feature vector (red). Right: Normed endpoints spectrum and its feature vector, for 100% of the skeleton size, the skeleton has 14 endpoints, for ~94% the skeleton has 7 endpoints and for 83% of the skeleton size the skeleton has no endpoint. 
For a larger number of image a python class, called Shape was written. It is instantiated as follow: 
shape= Shape(image)

The feature vector can be computed by:

vector = shape.End_Points_Features(normed = True)
or by:
vector = shape.End_Points_Features(normed = False)

Feature vectors from 216 images :

216 images of small 2D binary images (silhouettes) [Sharvit et al.] were used to derive feature vectors. This set of images consists in 18 type of silhouettes with 12 instance of each. For example, there's 12 simhouettes of birds:
Prior any kind of classification the 216 normed feature vectors wer aligned for visual inspection:
Some silhouettes, as the Bone silhouettes seems to show some intra-class structure, while brick and camel silhouettes seems different.

ipython notebook (updated 24/05/2013):

All the  tasks were performed in a ipython notebook available here.

 


Tuesday, April 9, 2013

Detecting end points or branched points from the skeleton of a single chromosome

The end-points or the branched-points from the skeleton of a single chromosome can be extracted using mathematical morphology operators implemented in mahotas.
Two images of metaphasic human chromosomes hybridized with a Cy3-telomeric PNA probe and  counterstained with DAPI, were used.
 
False color DAPI image
False color telomeres image
The location of a chromosome was identified, some preprocessing step were performed to segment the chromosome (hi-pass filtering followed by a simple thresholding).
The end-points, the branching points are extracted from the skeleton with a hit-or-miss operator implemented in mahotas, then overlayed on the original chromosome image.
The contour of the segmented telomeres are also overlayed:
Depending on the amount of high-pass filtering the chromosome skeleton fit the chromosome axis or the chromatids:
Skeletonization after "low" high-pass filtering
Skeletonization after a stronger high-pass filtering
In both cases, the skeleton is displayed in red, the branched-points in green, the end-points in blue, the chromosome contour in magenta, and finally the telomeres in yellow.
To run the code (python 2.7), the path to the images must be modified and mahotas and pymorph must be installed.
Download code

Friday, February 15, 2013

Chromosomal shapes classification by Linear Discriminant Analysis with R

Features were tested to distinguish overlapping chromosomes from single chromosomes and from nuclei or from image segmentation artefacts:
Two overlapping chromosomes
Those features are:
  • The chromosomes size (normalized by the image size).
  • The ratio of chromosome area by the convexhull area

convex hull of overlapping chromosomes


  • The non chromosomal domains area in a convex hull. The four largest (normalized by the convex hull area) non chromosomal domains  were kept:
Areas of size h1, h2, h3, h4
Chromosomes from twelve metaphases were classified with minikar. In a previous post, an isolated result indicated that these six features doesn't seem to distinguish the different particles.
Each observation falls in one of the four categories (single, cluster, nuclei, dust). 451 observations were used  to build a trained set by linear discriminant analysis (lda) according to this example.

Prior  lda, a pairs plot can be done where each color corresponds to one category (single chromosome, touching chromosomes, nuclei, dusts):
Pairs plot between the six variables: var1 (area), var2 (area/convexhull), var3,4,5,6 (area of h1,h2,h3, 4)
The observations seem more or less clustered according to the pair of variables considered. The first plot (ratio ~ area) was previously done showing an overlay between the single and cluster categories. LDA was used here as a black box, with the aim to know if a good classification can be achieved and how good are the features.

Pratically the work was done with R within rstudio. Data were downloaded from github, assembled into a data frame and analysed. LDA performed here data reduction from six variables to three (LD1, LD2, LD3) and when the variable LD1 is plot against LD2, it yields:
Clearly showing separated clusters. The most populated (blue), corresponds to the single chromosomes, pink dots are the touching chromosomes, the green dots and the red dots are probably the nuclei and the dusts repectively (I have to check how to put a legend in that plot).
However, the classification is still imperfect:

 28/80 touching chromosomes and 6/30 nuclei  were classified as single chromosomes. No single chromosomes were misclassified.

# Download features and label file from github
require(RCurl)
library(RCurl)
feat_prefix <-'jp-Jpp48-'
sufix <- '-DAPI.csv'
lab_prefix <- 'shapeCateg-jp-Jpp48-'
webpath <- 'https://raw.github.com/jeanpat/pyFISH/master/GUI/karyotyper/example%20classification/source%20data/'
files <- list(1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 14, 15)
data <- data.frame()
for (file in  files){
  featname <- paste(webpath,'features/',sep='')
  featname <- paste(featname,feat_prefix,as.character(file),sufix,sep='')
  labname <- paste(webpath,'labels/',sep='')
  labname <- paste(labname, lab_prefix, as.character(file),sufix,sep='') 
  print (featname)
  print (labname)
  features <- getURL(featname)
  labels <- getURL(labname)
  features <- read.table(text = features, sep = ';')
  labels <- read.table(text = labels, sep = ';')
  #print (str(features))
  # add metaphase number ()
  features$metaphase <- file
  #merge features and labels
  features <- cbind(features,labels)
  data <- rbind(data, features)
  }
#print (head(data))
#remove duplicate column
data[,9] <- NULL
#rename columns
colnames(data)[1]<-'particle'
colnames(data)[2]<-'area'
colnames(data)[3]<-'ratio'
colnames(data)[4]<-'h1'
colnames(data)[5]<-'h2'
colnames(data)[6]<-'h3'
colnames(data)[7]<-'h4'
colnames(data)[8]<-'meta'
colnames(data)[9]<-'type'
print (str(data))
pairs(cbind(data$area,data$ratio,data$h1,data$h2,data$h3,data$h4),col=c("red","blue","green","orange")[data$type],upper.panel=NULL,)
#
# Linear discriminant analysis was performed as explained:
#
# http://www.youtube.com/watch?v=bOSVge_PQF4
#
library(lattice)
library(MASS)
chr.lda <- lda(type~area+ratio+h1+h2+h3+h4,data=data)
summary(chr.lda)
chr.pred <- predict(chr.lda)
ErrorTable<-table(data$type,chr.pred$class)
print (ErrorTable)
lda.tmp <- data.frame(chr.pred$x,class=chr.pred$class)
xyplot(LD1~LD2,data=lda.tmp, group=class)
Created by Pretty R at inside-R.org

Thursday, January 31, 2013

Selecting a maximal area quadrilateral from four random points

To calculate the area of a quadrilateral with the points coordinates, the points are supposed to be taken clockwise or counter clock wise.When the shoelace formula is applied on each of  4!=24 tuple of four points, different values can be obtained:

For example, the area of the square (on top left) with the points taken clock wise, is equal to 4. The 24 ways to calculate the area yield only two values 0 or 4. The quadrilateral of area equal to 0 is plotted on top right (red), this quadrilateral is self intersecting. The trapeze area (down left) is equal to 3, the 24 ways to calculate the area yied three values :0, 1, 3 and the maximal value of the area corresponds to a non intersecting quadrilateral (blue, down right).
Quadrilaterals were build from randomly chosen points, their area was calculated by taking the original points order:
Some of them are self intersecting (first, top left) or degenerated (the 10th). By searching the points order yielding the maximal area, non self intersecting quadrilaterals can be found:
If both initial and final quadrilaterals are overlayed:
Compare from the previous post, the function to calculate the area was modified http://people.virginia.edu/~ll2bf/docs/various/polyarea.html).

Wednesday, December 19, 2012

Simple direct quadrilaterals found with the shoelace formula

The shoelace formula can computes the area of a quadrilateral  from the points coordinates . The formula requires that the points are taken clockwise or counter clockwise,  that is a quadrilateral must not be self crossing.
A criterion implemented in python was proposed to check if two segments are intersecting, which should be the case for a self-crossing quadrilateral.

With four points, there is  4!=24 ways to calculate the area. With points defining a square, according to the points order, the area takes three values: -2, 0 and 2 for indirect, self crossing and direct square:
The first point is the purple one.
The first square (top left in red) defined by the points (1, 0), (2, 1), (1, 2), (0, 1):

((1, 0), (2, 1), (1, 2), (0, 1)) False 2.0

is detected as non self crossing (False), and the shoelace formula gives an area equal to 2 as waited  for this direct square. There is a problem for some self crossing quadrilaterals:
((1, 0), (2, 1), (0, 1), (1, 2)) False 0.0
((1, 0), (1, 2), (2, 1), (0, 1)) True 0.0

The second quadrilateral is not detected as self crossing, however the shoelace formula yield an area equal to zero, as the third quadrilateral which is correctly dtected as self crossing (True). The last quadrilateral (blue one, down right):

((0, 1), (1, 2), (2, 1), (1, 0)) False -2.0

is correctly deteted as non self crossing (False) and indirect with a negative "area" equal to -2

Trying with points yielding non convex quadrilaterals yields six values -0.95,-0.65,  -0.4, 0.4, 0.65,  0.95. None is detected as self crossing:
With other points defining a possible convex quadrilateral:
Several values are found. The maximal area value : 1.465, corresponds to the four simple quadrilaterals (non self crossing):

((1.1, 0.3), (1.7, 1.4), (1, 2), (0, 1)) False 1.465
((1.7, 1.4), (1, 2), (0, 1), (1.1, 0.3)) False 1.465
((1, 2), (0, 1), (1.1, 0.3), (1.7, 1.4)) False 1.465
((0, 1), (1.1, 0.3), (1.7, 1.4), (1, 2)) False 1.465

It seems that the maximal area value found from the set of 24 possible quadrilaterals correspond to simple direct quadrilaterals, thus testing if the four points define a non self crossing quadrilateral may not be usefull.

January 3, 2013:
The script is modified so that the quadrilaterals are plotted with matplotlib.patches.Polygon:


Download code

Wednesday, November 7, 2012

Looking for a geodesic distance to resolve touching chromosomes

In the previous post, from a cluster of touching mouse chromosomes:
DAPI stained mouse chromosomes
points belonging to negatively curved domains of a contour can be isolated :
To separate the chromosomes, it is tempting to cut between the closest pairs of points.
Python provides itertools.combinations(); the closest pairs of points are in the set of the combinations of every pairs of points:


Each line shows the euclidian distance between two points, the four shortest lines correspond more or less to the contact domains between the chromosomes, cuting along them provides a way to separate the chromosomes.

It should be possible to do better by using a geodesic distance between the points, that is using a path inside the cluster of chromosomes. In the following example: two paths (depending on the chosen pixel neighborhood) between the two blue points were overlaid on a binary image (background is set to 255 and the foreground to 0).
the cluster contourthe cluster contour

 The path were determined as follow using skimage 0.8dev:

graph.route_through_array(negbin ,p1,p2,fully_connected=False)
 or
 graph.route_through_array(negbin ,p1,p2,fully_connected=True)

 with negbin, the negative binary image and p1, p2 the two points figured in blue.

If the background (0) is set to 255 (white) in the grey scale image:

 the path (red) between the two points can be determined by minimizing the "greyscale cost", that is through the valleys between the two points is the image is viewed as a landscape.

All the paths can be determined (blue):Those paths in blue were obtained from the previous binary image:



The paths were sorted according to their cost :


 The four paths with the smallest cost are figured in red, they are "inside" the cluster. Some paths are partially "outside" the cluster and thus are not geodesic routes inside the cluster.
Now if paths are calculated from a 8 bits greyscale image where the background was set to 255 (a naïve idea supposed to prevent a path to go "outside" a cluster):

some routes between two points, go along the contour. The two first routes to with the smallest costs do correspond to touching domains between the chromosomes. However when routes with higher cost are selected:
some correspond to the cluster contour and not to contact domains between the chromosomes, because the cost (sum of pixels grey value) along the contour is probably lower than the cost of a shorter path which have to climb over a contact domain between two chromosomes.

Additional features may distinguish routes belonging to contour from those providing solution to separate chromosomes:
red route on the right resolve touching chromosomes
Minimal cost routes are still better to separate chromosomes than routes obtained with euclidian distance:
The shortest path (on the right) cuts inside the chromosome


Download code