[Mcl-users] large graphs and mcl

Stijn van Dongen stijn at ebi.ac.uk
Thu Jul 8 11:21:48 BST 2010


Recent experiences in Ensembl Compara indicate that the latest version of MCL
(10-148) has improved with respect to memory management.

Background:
   Ensembl family graphs are now around 2.5M nodes and 500M edges.

Symptoms:
   MCL exits with an out-of-memory error, while memory does not
   seem exhausted. Giving MCL more memory (with LSF) does not help.
   As a result of changes made to mcl-10-148 mcl now completes for such a graph
   in about 4 hours using 4 CPUs. We are not yet clear about memory usage; it
   seems somewhere inbetween 20 and 30G. The input graph size is about 8G.
   Of course, time usage depends on the hardware used.

Analysis:
   MCL uses a memory model where malloc, free and realloc are used many times,
   and many C objects come into existence and disappear during an MCL run.
      This should all be fine. MCL is tested with valgrind a lot. It releases
   all memory that it should, has no leaks, and no errors.  This does not
   preclude all MCL-side coding errors as valgrind is necessarily specific
   to code-paths taken and data put in, but it does give confidence.
   It is not possible to run the failing cases under valgrind, as the size
   of the data and the time/space overhead incurred by valgrind preclude this.
      An MCL-specific pattern is that large stretches of memory are first
   malloced, than later realloced to a much smaller chunk. Say from 100K to
   1K, happening more than 500K times.

Change
   The following change was made in 10-148. If the realloc target amount
   of memory is less than half of the originally malloced (or realloced)
   amount, it now frees the large chunk and mallocs an entirely new chunk. All
   this is done with wrappers around C malloc and realloc of course.

Conclusion I
   It is unclear why this change helps. Code paths should not change as far as
   I am aware, and the data we input is the same.  A possibility is that the
   malloc implementation on our system has issues. It is extremely unusual for
   system libaries to have bugs in them however, as they are widely used and
   bugs tend to be noticed.  Conceivably the large-memory characteristics of
   these computations combined with MCL's rapid churn of memory chunks make
   this a corner of compute-space not often visited.
   Debugging this is hard because of the size of data involved. The current
   status is that a single patch (wrapping realloc as described above, only
   when invoked for one specific user-defined type, a vector) solves the
   issue without changing code-paths, so for now I am leaving it at this.

Conclusion II
   If you have very large graphs, better use mcl-10-148.


-- 
Stijn van Dongen         >8<        -o)   O<  forename pronunciation: [Stan]
EMBL-EBI                            /\\   Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK   _\_/   http://micans.org/stijn


More information about the Mcl-users mailing list