## Supercomputing 2010

In this blog post, I wish to briefly summarize the aspects of the Supercomputing 2010 conference that may be of use to the semantic web community.

**1. Technical Papers on Graph Algorithms**

The first paper was *Fast PGAS Implementation of Distributed Graph Algorithms* by Cong, Almasi, and Saraswat. PGAS stands for Partitioned Global Address Space. They claim to “present the first fast PGAS implementation of graph algorithms for the connected components and minimum spanning tree problems” [1]. They performed an evaluation on two random graphs each with 100 million vertices, one with 400 million edges and the other with one billion edges.

The second paper was *Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory* by Pearce, Gokhale, and Amato. This was the only paper of the three that included as part of its motivation something about the web. They mention “WWW graphs” as an example of real-world, large graphs, and by “WWW graph” they mean webpages as nodes and links between webpages as edges. Also, social networks were mentioned as another example of real-world, large graphs. Unlike the first paper, this paper focuses on a shared-memory paradigm. They “present a novel asynchronous approach to compute Breadth-First-Search (BFS), Single-Source-Shortest-Paths, and Connected Components for large graphs in shared memory” [2]. The “semi-external memory” refers to the use of solid-state memory devices. They evaluated the approach on synthetic, “scale-free graphs” generated using the RMAT [3] graph generator. The graphs had roughly between 33 million and one billion unique edges. They also evaluated Connected Components on WWW graphs ranging from roughly one billion to eight billion edges.

The third paper was *Scalable Graph Exploration on Multicore Processors* by Agarwal et al. They “investigate the challenges involved in exploring very large graph by designing a breadth-first search (BFS) algorithm for advanced multi-core processors …” [4]. They evaluate on RMAT [3] graphs of up to a billion edges. They make some very interesting claims that in some cases, their BFS algorithm running on a 4-socket Nehalem EX performs on par or even exceeds previously reported performance on the Cray XMT, Cray MTA-2, and Blue Gene/L.

The problem is clear: graph algorithms require random data access (in the sense of spacial locality of the bytes) which inhibits performance. The common solution is to have some abstraction of shared memory (whether it is actually shared memory or not). The focus is on conventional graph algorithms like BFS and connected components, and the state-of-the-art seems to handle on the order of billions of edges.

**2. The Graph 500 BOF**

The Graph 500 BOF was also of interest, although it was really the Graph 9. The effort was motivated by dissatisfaction that the Top 500 list uses LINPACK as its benchmark, and FLOPS are not the best indication of the usefulness of a system for attacking graph-related problems. For this first version of Graph 500, BFS was the benchmark, and systems were ranked first by size of the input graph and then by edge traversals per second. The list was dominated by Cray machines with a Cray XT4 coming in second place with 2^{32} vertices and 5.22 billion edge traversals per second. However, the one BlueGene machine, a BlueGene/P, took first place achieving both the largest graph of 2^{36} vertices and the highest edge traversal rate of 6.6 billion edge traversals per second.

**3. The Semantic Graph/Database Processing BOF**

Finally, there was the semantic graph/database processing BOF hosted by David Haglin of PNNL. Cliff Joslyn of PNNL presented their submission to the 2010 Billion Triple Challenge, and David Mizell of Cray, Inc. presented his work in SPARQL querying on the Cray XMT. There was also a presentation by Franz Inc. about AllegroGraph. I presented my relevant existing research as well as gave mention of related works. I also gave a few suggestions to interested HPC researchers, of which I will discuss more in another blog post.

One thing that was clear is that there was diverse interest in semantic web at the BOF. Some were interested in merely utilizing RDF for representing graphs while others were interesting in supporting analysis and/or query of such RDF graphs. I ran into only one person who seemed interested in reasoning. Another comment was about how SPARQL does not directly support graph-like operations or algorithms but serves primarily as a data access language. As an example, how would one find clustered components using SPARQL? It could be done, but perhaps would be cumbersome. The supercomputing community seems to handle RDF data primarily as a graph data structure rather than as sets of triples.

Jesse Weaver

Ph.D. Student, Patroon Fellow

Tetherless World Constellation

Rensselaer Polytechnic Institute

[1] Cong, Almasi, Saraswat. Fast PGAS Implementations of Distribute Graph Algorithms. SC 2010.

[2] Pearce, Gokhale, Amato. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory. SC 2010.

[3] Chakrabarti, Zhan, Faloutsos. R-MAT: A recursive model for graph mining. ICDM 2004.

[4] Agarwal, Petrini, Pasetto, Bader. Scalable Graph Exploration on Multicore Processors. SC 2010.

Another aspect to the Graph 500: The distributed memory implementations, even unoptimized, required an extreme effort. The winner required multiple people working for a solid month. The three Cray XMT entries? One used my intentionally pessimized reference code, and the other two “optimized” versions each required one person’s afternoon. It’s crazy. A single-threaded section on the XMT effectively runs at 12.5MHz, but the system as a whole is wonderful for graph operations.

For “real problems,” the priorities are how much RAM you have, then how well you tolerate latencies accessing that memory, and then everything else. That also is the point behind the Agarwal, et al. paper from (more or less) our group.

5(0 votes cast)0(from 0 votes)