Performance Comparison of Graph Mining Algorithms onPTE

Min. Support % 2% 3% 4% 5% 6% 7% 8% 9% 10% 20% 30%
Min. Support Absolute 7 10 14 17 20 24 27 31 34 68 102
Number of graphs 136949 22758 5935 3608 2326 1770 1323 977 844 190 68
Number of free trees 119378 20481 5514 3376 2172 1644 1230 909 779 177 62
Gaston (No-Nauty)8 7.9s 1.7s 0.6s 0.4s 0.3s 0.3s 0.2s
9.1MB 4.4MB 3.4MB 3.0MB 2.7MB 2.1MB 1.9MB
Gaston (No-Nauty/Diffset)8 20.0s 4.2s 1.5s 0.9s
4.4MB 1.8MB 1.5MB 1.4MB
Gaston (Lowmem)8 39.6s 8.5s 2.7s 1.6s 1.0s 0.8s 0.6s
1.5MB 1.3MB 1.3MB 1.3MB 1.3MB 1.3MB 1.3MB
FFSM9 30.3s 6.2s 2.0s 1.2s 0.7s 0.5s 0.4s 0.3s 0.2s 0.1s 0.1s
8.2MB 4.1MB 3.7MB 3.2MB 3.2MB 2.8MB 2.8MB
gSpan4 98.0s 20.3s 6.3s 3.4s 2.0s 1.4s 1.0s 0.8s 0.6s 0.3s 0.2s
3.8MB 2.8MB 2.8MB 2.8MB 2.8MB 2.8MB 2.8MB
AcGM10 105.8s 21.1s 6.3s 3.9s 2.8s 2.2s 1.8s 1.6s 1.4s 0.7s 0.5s
33.9MB 5.3MB 2.2MB 1.6MB 1.3MB 1.2MB 1.1MB
FSG7 307.4s 43.9s 11.0s 6.3s 4.0s 2.9s 2.4s 1.8s 1.6s 0.6s 0.3s
123.5MB 25.8MB 25.8MB 25.8MB 25.8MB 25.8MB 25.8MB
Gaston (Free Trees)8 8.1s 1.9s 0.8s 0.5s 0.4s 0.3s 0.3s 0.3s 0.2s 0.2s 0.1s
Gaston (No Isomorphism)8 10.0s 2.4s 0.9s 0.6s 0.4s 0.3s 0.3s 0.3s 0.3s 0.2s 0.1s
Gaston (Nauty)8 14.2s 2.7s 0.9s 0.6s 0.4s 0.4s 0.3s 0.3s 0.3s 0.2s 0.2s
16.2MB 5.4MB 3.9MB 3.4MB 3.1MB 2.4MB 2.3MB
Farmer Now5 - 572s 172s 93s 54s 37s 28s 20s 17s 6s 4s
- 524s 162s 88s 50s 34s 25s 18s 15s 5s 4s
Farmer ECML/PKDD5 - 794s 235s 125s 72s 48s 36s 25s 21s - -
Warmr ilProlog 1.1.146 - - - - - - - - 11351s 733s 172s
25s 5s 2s

Notes:

  1. The experiments were performed on an AMD XP1600+ with 256MB memory, running the Mandrake 9.1 Linux OS. The original algorithms, as provided by the authors, were used. The execution times are averages. The number of experiments depends on the variance of the runtimes. For experiments with a high variance more experiments were performed. All algorithms load the dataset in main memory; the time for loading the dataset is not included in the runtimes.
  2. Algorithms missing in this overview: AGM, the Freiburg implementation of Warmr and PolyFarm (the documentation of this algorithm suggests however that a canonicalization step is missing and Object Identity is not supported, which makes it impossible to simulate the graph mining setup).
  3. The PTE chemical dataset can be obtained here, or here in Prolog format.
  4. gSpan is a specialized graph mining algorithm. More information about gSpan can be found here. This algorithm was compiled for 256 nodes per graph.
  5. Farmer is a general frequent query mining algorithm which can also mine graphs; it misses support for intensional background knowledge. Farmer does not have a traditional iterative two phase approach. It is very difficult to determine the computation time spent in the generating and counting phases separately. In the second row, an approximate processor time for the counting phases shown. This time was computed using the GNU Profiler, summing the processor load of the evaluate C++ function, and multiplying the load with the run time in the first row. Farmer Now features a bettter implementation of "once optimizations" than Farmer ECML/PKDD; the support for primary keys and Weak Object Identity is now fully incorporated. More information about Farmer can be found here.
  6. Warmr is a general frequent query mining algorithm with support for intensional background knowledge. It uses a traditional generate-first, count-next approach. In the second row the processor time spent in the counting phases, as reported by the algorithm, is shown; clearly, the performance of Warmr could be improved enormously by optimizing the generating phase further. More information about Warmr can be found here.
  7. FSG is a specialized graph mining algorithm. We used the binary available from this website.
  8. Gaston is an (as yet) unpublished algorithm that searches for frequent graphs in graph databases. For more information, contact the author of the algorithm (who happens to be the maintainer of this webpage too). This algorithm was compiled for 256 nodes and 2^32 transactions.
  9. FFSM is a graph miner, more details can be found here. This algorithm was compiled for maximally 32k transactions with maximally 32k nodes. As a side remark, Gaston (No-Nauty) requires 9.2M main memory if compiled with these settings.
  10. AcGM is a graph miner, more details can be found here.