As mentioned in my last blog post, I recently participated in a birds-of-a-feather (BOF) on semantic graph/database processing at Supercomputing 2010 (SC10). My general research interest is in high-performance computing (HPC) for the semantic web, so this BOF was a great fit. At the BOF, I very briefly made three suggestions to HPC researchers; in this blog post, I expand on and explain these suggestions. I welcome feedback, particularly from those in the semantic web community who have something to share with the supercomputing community.
1. There is a need for good benchmarks from an HPC perspective.
By “good,” I primarily mean that the datasets and queries need to be realistic. In other words, the data should reflect data that occurs in the real world, and the queries should reflect queries that would be posed by actual users or systems. By “HPC perspective,” I mean that a benchmark needs to test both strong scaling (change in execution time for a fixed total dataset size and a varying number of processors) and weak scaling (change in execution time for a fixed dataset size per processor and a varying number of processors).
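To make the two regimes concrete, here is a minimal sketch (with invented timing numbers) of how strong- and weak-scaling efficiency are typically computed from benchmark runs:

```python
# Hypothetical wall-clock times in seconds, keyed by processor count.
# Strong scaling: the total dataset size is fixed while processors vary.
strong_times = {1: 1200.0, 2: 640.0, 4: 340.0, 8: 190.0}

# Weak scaling: the dataset size *per processor* is fixed while processors vary.
weak_times = {1: 150.0, 2: 158.0, 4: 171.0, 8: 196.0}

def strong_efficiency(times):
    """Speedup over the 1-processor run, divided by the processor count."""
    t1 = times[1]
    return {p: (t1 / t) / p for p, t in times.items()}

def weak_efficiency(times):
    """Ideal weak scaling keeps runtime constant, so efficiency is t1 / t."""
    t1 = times[1]
    return {p: t1 / t for p, t in times.items()}
```

A benchmark that reports both curves tells us far more about a system's parallel behavior than a single throughput number would.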
The Lehigh University Benchmark (LUBM) has arguably been the most widely used benchmark, likely because it was one of the earliest to provide both a data generator and a standard set of queries. It is targeted toward inferencing. However, LUBM datasets are not only synthetic but quite unrealistic. In addition to a uniform distribution of data, they suffer from other inadequacies, such as few links between universities and the use of a single, nonsensical phone number for every person (“xxx-xxx-xxxx”). LUBM datasets therefore do not provide a realistic data distribution and thus cannot test a system's ability to handle realistic selectivity and skew.
There is also the Berlin SPARQL Benchmark (BSBM), but it is “built around an e-commerce use case” and “illustrates the search and navigation pattern of a consumer looking for a product.” From an HPC perspective, we are likely more concerned with the overall run-time of queries or reasoning processes (or other processes of interest) than with handling interaction with users.
Finally, there is SP2Bench, which is perhaps the most useful benchmark for SPARQL query evaluation. It provides a data generator that mimics statistical properties of DBLP data, along with a set of sensible queries. The dataset is therefore more realistic than LUBM's, and it focuses on SPARQL querying (whereas LUBM focuses on reasoning).
However, there is still a need for a good reasoning benchmark from an HPC perspective. It is difficult to be more specific than that because providing such a benchmark is still very much an open research topic. Clearly there needs to be an ontology that uses features from the various reasoning standards (e.g., RDFS, OWL) and a corresponding data generator. There should also be some way to verify the validity of inferences under a given entailment regime. Again, this is very much an open research topic, which is why I made the suggestion but have few answers myself.
2. Consider existing reasoning standards as starting points.
This may be the more controversial of my suggestions, but there is good reason for it. Recent history indicates that reasoning standards continue to evolve iteratively based on the needs of the community.
Consider RDFS (by which I mean RDFS entailment as defined in RDF Semantics). First, it is technically undecidable, although in a way that is trivial and easily overcome. Second, few systems (in my experience) completely support inferences based on literal generalization, XML literals, and container-membership properties. Other rules, like “everything is a resource,” are generally trivial and uninteresting. More commonly, implementations align with a fragment of RDFS that I call RDFS Muñoz (originally termed the ρdf fragment), which essentially boils down to domains, ranges, subclasses, and subproperties. Perhaps Muñoz said it best:
“Efficient processing of any kind of data relies on a compromise between the size of the data and the expressiveness of the language describing it. As we already pointed out, in the RDF case the size of the data to be processed will be enormous, as current developments show …. Hence, a program to make RDF processing scalable has to consider necessarily the compromise between complexity and expressiveness. Such a program amounts essentially to look for fragments of RDF with good behavior with respect to complexity of processing.” 
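Part of the appeal of the ρdf fragment is that it is small enough to implement in a few lines. The following toy forward-chaining closure is my own illustrative sketch (not the formal rule set from the Muñoz et al. paper), covering only subclasses, subproperties, domains, and ranges; the naive pairwise loop is obviously nowhere near HPC scale:

```python
SUBCLASS, SUBPROP = "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE, TYPE = "rdfs:domain", "rdfs:range", "rdf:type"

def rdfs_closure(triples):
    """Naive forward chaining over a rho-df-style rule set, to fixpoint."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))      # subclass transitivity
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))          # type inheritance
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))       # subproperty transitivity
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))             # property inheritance
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))          # domain rule
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))          # range rule
        if new <= closed:                           # fixpoint reached
            return closed
        closed |= new

# Invented example data.
data = {
    ("ex:Alice", "ex:advises", "ex:Bob"),
    ("ex:advises", SUBPROP, "ex:knows"),
    ("ex:advises", DOMAIN, "ex:Professor"),
    ("ex:Professor", SUBCLASS, "ex:Person"),
}
inferred = rdfs_closure(data)
```

Running this infers, among other triples, that ex:Alice ex:knows ex:Bob and that ex:Alice is an ex:Person. The whole fragment fits in a handful of rules, which is exactly the “good behavior with respect to complexity of processing” that Muñoz describes.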
Consider also OWL 1. How many scalable systems completely support one of the OWL 1 fragments (Lite, DL, Full)? I cannot say for sure, but my impression, from experience and from feedback from others, is that the cost of higher expressivity is often too high in terms of performance, especially as dataset size scales. Perhaps it is for this reason that OWL Horst (originally termed the pD* fragment) has gained popularity as (arguably) the most widely supported OWL fragment.
Now there is OWL 2. OWL 2 RL (a fragment of OWL 2) is “inspired by description logic programs and pD* [OWL Horst].” The SAOR paper from ISWC 2010 has already shown a subset of OWL 2 RL rules for which closure can be efficiently produced in parallel.
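To see why a rule subset like that parallelizes well, note that schema (T-Box) data is usually tiny relative to instance (A-Box) data, so the schema can be replicated to every worker and the instance triples partitioned among them, letting each worker fire rules locally with no communication. The following sketch is my own illustration of that idea (not SAOR's implementation), using just the rdfs:domain rule, a thread pool, and invented names:

```python
from concurrent.futures import ThreadPoolExecutor

DOMAIN, TYPE = "rdfs:domain", "rdf:type"

def fire_domain_rule(schema, instance_part):
    """Apply (p rdfs:domain c), (s p o) => (s rdf:type c) to one partition."""
    domains = {s: o for (s, p, o) in schema if p == DOMAIN}
    return {(s, TYPE, domains[p]) for (s, p, o) in instance_part if p in domains}

def parallel_domain_rule(schema, instances, nworkers=4):
    """Replicate the schema to every worker; partition instances round-robin."""
    parts = [instances[i::nworkers] for i in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        results = pool.map(lambda part: fire_domain_rule(schema, part), parts)
    return set().union(*results)

# Invented example data: a tiny schema and a partitionable set of instances.
schema = [("ex:advises", DOMAIN, "ex:Professor")]
instances = [("ex:Alice", "ex:advises", "ex:Bob"),
             ("ex:Carol", "ex:advises", "ex:Dan")]
inferred_types = parallel_domain_rule(schema, instances, nworkers=2)
```

Rules whose joins pair an instance triple only with schema triples (true of most of RDFS and much of OWL 2 RL) need no data exchange between workers; rules that join instance triples with each other are the hard, communication-heavy cases.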
So my point is this: reasoning standards capture well-defined and well-understood fragments, but research and practice continue to explore subfragments suitable for certain problems, and as those subfragments stabilize and gain popularity, they inspire future standards. It is an iterative process, so it is not necessary to become obsessed with fully complying with existing standards (unless that is actually necessary for your use case). It is probably more interesting to search for fragments of the standards that fit certain HPC paradigms.
3. Review the literature to reconsider approaches that were once considered less viable.
This suggestion may seem obvious, but it is easy to overlook. As an example, I recently did a literature review of parallel join processing, and one thing I noticed is that the majority of the literature focuses on shared-nothing architectures. In 1992, DeWitt and Gray stated:
“A consensus on parallel and distributed database system architecture has emerged. This architecture is based on a shared-nothing hardware design ….” 
However, in 1996, Norman, Zurek, and Thanisch directly opposed DeWitt and Gray's claim, saying:
“We argue that shared-nothingness is no longer the consensus hardware architecture and that hardware resource sharing is a poor basis for categorising parallel DBMS software architectures if one wishes to compare the performance characteristics of parallel DBMS products.” 
The popularity of the shared-nothing paradigm was probably further fueled by the advent of inexpensive supercomputing by way of Beowulf clusters and Networks of Workstations in the mid-1990s. However, many modern supercomputers provide shared-disk and shared-memory paradigms. The Blue Gene/L in our Computational Center for Nanotechnology Innovations (CCNI) is networked with a General Parallel File System (GPFS); making use of GPFS, the Blue Gene/L can be considered shared-disk in a programmatic sense. The Cray XMT uses a large shared memory. Rahm points out that a major advantage of shared-disk is its potential for truly dynamic load balancing, so let's look back at some of the shared-disk and shared-memory research that has been done [12-15].
All of that just to say, a review of literature is in order. Potential sources of inspiration include parallel databases, parallel graph algorithms, deductive databases, and graph databases.
Ph.D. Student, Patroon Fellow
Tetherless World Constellation
Rensselaer Polytechnic Institute
Guo, Pan, Heflin. LUBM: A benchmark for OWL knowledge base systems. JWS 2005.
Bizer, Schultz. The Berlin SPARQL Benchmark. IJSWIS 2009.
Schmidt, Hornung, Lausen, Pinkel. SP2Bench: A SPARQL Performance Benchmark. ICDE 2009.
Weaver. Redefining the RDFS Closure to be Decidable. RDF Next Steps 2010.
Muñoz, Pérez, Gutierrez. Simple and Efficient Minimal RDFS. JWS 2009.
ter Horst. Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. JWS 2005.
Hogan, Pan, Polleres, Decker. SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples. ISWC 2010.
DeWitt, Gray. Parallel Database Systems: The Future of High Performance Database Systems. Communications of the ACM 1992.
Norman, Zurek, Thanisch. Much Ado About Shared-Nothing. SIGMOD Record 1996.
Rahm. Parallel Query Processing in Shared Disk Database Systems. SIGMOD Record 1993.
Lu, Tan. Dynamic and Load-balanced Task-Oriented Database Query Processing in Parallel Systems. EDBT 1992.
Märtens. Skew-Insensitive Join Processing in Shared-Disk Database Systems. IADT 1998.
Moon, On, Cho. Performance of Dynamic Load Balanced Join Algorithms in Shared Disk Parallel Database Systems. Workshop on Future Trends of Distributed Computing Systems 1999.