[Mcl-users] Strange result from mcl in label mode

Stijn van Dongen stijn at ebi.ac.uk
Thu Jan 21 10:56:34 GMT 2010

            Hi Sungwon Jung,

On Wed, Jan 20, 2010 at 11:15:35PM -0700, Sungwon Jung wrote:
> Hello.
> I was trying to use the MCL implementation in C, and getting strange
> clustering result.
> The input file I used contained 2,358,687 directed edges in ABC-format.
> I ran mcl without any other options, as follows:
> mcl <input file> --abc -o <output file name>
> And got a result of 720 lines (720 clusters, from my understanding).

That is correct. Incidentally, mcl will always convert directed edges to
undirected edges. It does not make much sense on directed graphs.  If you
insist on feeding it directed graphs, you'll have to use mcxload.

> By the way, when I checked some of the clusters, I found some cases such
> that a vertex is being assigned to a cluster, while it did not have any edge
> with other vertices in the same cluster from the original input file.
> For example, one of the clusters had following vertices:
> v1  v2  v3
> while v1 had no connected edge with either v2 or v3 from the original input
> file.

Just to make sure; are there connections coming from v2 or v3 to v1?
In that case the explanation is given by my statements above.

For all what follows I am assuming that your input did *NOT* contain edges
going from v3 or v2 to v1.  In that case, it is indeed surprising and
undesirable but ..

> >From my understanding of MCL, any vertex in a cluster should be connected
> with at least one of the vertices in the same cluster.
> Maybe I have wrong understanding but I am not getting how such cluster
> assignment is possible?

.. it is conceivable if the graph has very strong bipartite characteristics.

Some remarks/questions:
-  What you observe is possible but happens rarely.
-  Does your graph locally have such bipartite structure around v1 v2 v3?
-  Does it occur for more than one node? How often?
-  What are the node degrees of v1 v2 and v3?

First, you might be able to combat this by increasing the loop weights,
for example by trying adding -c 2, or -c 3 etc to the command line.
However, this is a bit of a stopgap solution. It would be nicer
to understand the characteristics of your input graph.

Second, you can add the option --force-connected=y.
This should analyse the graph+clustering and make sure the resulting
clusters induce connected subgraphs.
This is also a way to find out how many of your clusters are
affected by this phenomenon.

Finally, I'll repeat myself to make sure the message does not get
lost among the lenghty explanation above (which I wrote before
I considered the scenario below):

If your input contained edges (usually called arcs if directed)
from v3 or v2 to v1 then the v1 v2 v3 cluster is explained
by MCL's behaviour to make all edges undirected.

The reason MCL makes edges undirected is that I have never been
able to think of an example graph where directionality was
informative for cluster structure if at the same time
edge density should be informative for cluster structure.


Stijn van Dongen         >8<        -o)   O<  forename pronunciation: [Stan]
EMBL-EBI                            /\\   Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK   _\_/   http://micans.org/stijn

More information about the Mcl-users mailing list