I recently presented at the Semantic Graph Database Processing BOF at SC2011, where I had the opportunity to discuss with others the need for high-performance computing in web-scale computation and the benefits of Linked Data and ontologies on the World Wide Web. One participant there was adamantly opposed to the semantic web. (Outside of the presentation, his words were something like “I do not believe in the semantic web” and “only the semantic web cares about the semantic web.”) As I tried to make my case to him, it became increasingly clear to me that he had a few misconceptions about the semantic web. I want to address those misconceptions here.
Before I continue, though, allow me to disclaim a bit. I am not a representative of the entire semantic web community, although I do consider myself a member of it. Additionally, I am not officially associated with the W3C. I write this blog entry simply in the capacity of a semantic web enthusiast (henceforth, semwebber), and not even as a member of the Tetherless World Constellation. I invite, nay, urge other semwebbers to contribute comments to this blog post in any capacity (agree, disagree, amend, etc.).
1. “One ontology to rule them all”
To my knowledge, nobody has ever claimed that there should be “one ontology to rule them all.” Instead, what is regularly promoted is ontology reuse and/or integration. For example, the FOAF ontology is widely used in the semantic web to describe persons; why create your own ontology when you can reuse a well-established one? Integration of ontologies allows for reconciliation of perspectives, so that data described with these ontologies become meaningfully related. Admittedly, there are some rather large, comprehensive ontologies out there, and there are some very popular and pervasive ones, too. However, there is no standard or recommendation that requires publishers of RDF data to comply with any particular ontology. You could even ignore the RDF vocabulary if you so please (yes, even rdf:type).
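To make the reuse argument concrete, here is a minimal sketch in plain Python (no RDF library assumed) that models triples as tuples of strings. The FOAF and RDF namespace URIs are the real ones; the example.org/example.com data, the home-grown vocabulary, and the `persons` helper are hypothetical. A consumer that only understands FOAF recognizes the person typed as foaf:Person, but not the one typed with a home-grown class.

```python
# Triples are modeled as (subject, predicate, object) tuples of strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical publisher A reuses the FOAF vocabulary.
reused = {
    ("http://example.org/people/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
}

# Hypothetical publisher B invents its own class for the same concept.
homegrown = {
    ("http://example.com/staff/bob", RDF_TYPE, "http://example.com/vocab#Human"),
    ("http://example.com/staff/bob", "http://example.com/vocab#fullName", "Bob"),
}

def persons(graph):
    """Return the subjects explicitly typed as foaf:Person in a set of triples."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == FOAF + "Person"}

# A FOAF-aware consumer recognizes Alice without any extra mapping...
print(persons(reused))     # {'http://example.org/people/alice'}
# ...but finds no persons in B's data unless someone maps B's class to foaf:Person.
print(persons(homegrown))  # set()
```

Declaring the home-grown class a subclass of foaf:Person (via rdfs:subClassOf) and running RDFS inference is one way ontology integration could close that gap; the sketch does no inference.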
The primary purpose of an ontology (in my view) is to attach explicit semantics to your data. Just as the participant stated (although he meant it as an argument against the semantic web), there are many ontologies. They compete in the ecosystem of the World Wide Web and evolve accordingly (or become extinct).
2. “Triples all the way down”
(First, let me say, this is not an affront to Planet RDF.)
This is a bit of a pet peeve of mine, and perhaps what I say here will offend some semwebbers (I hope not). The semantic web (in my view) is not about “triples all the way down.” What do I mean by that? Let me explain.
RDF brings two main things to the table when it comes to publishing and integrating data on the web: names in the form of URIs, and a simple data model that is flexible enough for (arguably) nearly any kind of data. (I would like to add a third, meaningful links, but I will avoid that for now.) So when data is published to the web, publishing it as RDF allows you (1) to identify the things in your data across the World Wide Web, and (2) to structurally (and possibly semantically) integrate your data with other data on the World Wide Web. (I emphasize “World Wide” here to bring to attention the vast scope of publication, identification, and integration that is being achieved.) Fantastic.
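As a rough illustration of point (2), the sketch below merges two independently published datasets. It is plain Python with triples as tuples; all example.org/example.net URIs and the `describe` helper are hypothetical. Because both sources use the same URI for the same person, structural integration reduces to a set union, and the facts from both sources line up under one subject.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"  # hypothetical shared identifier

# Hypothetical dataset published by one site.
site_a = {
    (ALICE, FOAF + "name", "Alice"),
}

# Hypothetical dataset published independently by another site
# that refers to the same person by the same URI.
site_b = {
    (ALICE, FOAF + "knows", "http://example.net/people/carol"),
}

# Merging RDF graphs without blank nodes is a set union of their triples.
merged = site_a | site_b

def describe(graph, subject):
    """Collect all (predicate, object) pairs stated about a subject, sorted."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)

for predicate, obj in describe(merged, ALICE):
    print(predicate, obj)
```

If either dataset contained blank nodes, a proper merge would first rename them apart; the sketch assumes URIs only.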
Does this mean that everything can be efficiently (or rather, ideally) represented in RDF? No. Then why would you ever want to handle triples? Within your own system, you probably don’t. Let me explain.
RDF is meant to solve the problem of meaningfully publishing data (not just documents) on the World Wide Web. Beyond that, do what you want. More specifically, when you crawl and/or aggregate data from the World Wide Web, you don’t have to keep the RDF data as triples in your system. It is no longer on the global stage of the World Wide Web; it is now in your system, where you are king. So optimize away! Store it or process it however you like! Relational databases? Sure! Rewrite URIs as shorter terms? Whatever floats your boat! Ignore the explicit semantics and treat it like an unlabeled graph? I wouldn’t recommend it, but you’re the king! Do whatever it takes to meet your use case, and if your use case has something to do with RDF data, then fine, leave it as triples if you want. My point is, it’s not necessarily “RDF all the way down,” but it is “RDF at the top,” where “top” is the place of publication, the World Wide Web. The universal naming mechanism of URIs and the generic data model enable data publishers to get data out there in a way that machines can explicitly understand (for example, when I say “Beast is furry,” am I talking about Mark Zuckerberg’s dog or the fictional X-Man Dr. Henry Philip “Hank” McCoy?). But if you are the one building the machine, it’s up to you how to use those explicit semantics.
(They both look furry to me.)
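One common way to “rewrite URIs as shorter terms” inside your own system is dictionary encoding: each distinct term gets a small integer ID, and triples are stored as integer tuples. The sketch below is plain Python and only one possible approach; the two “Beast” URIs and the isFurry property are hypothetical stand-ins, not identifiers any real publisher uses. The internal encoding is entirely your choice, yet the distinction the URIs made at publication time survives it.

```python
class Dictionary:
    """Bidirectional mapping between RDF terms (strings) and integer IDs."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id):
        return self.id_to_term[term_id]

# Hypothetical URIs for the two different Beasts; a bare string "Beast"
# could not tell them apart.
DOG = "http://example.org/pets/Beast"
MUTANT = "http://example.org/comics/Beast"
IS_FURRY = "http://example.org/vocab/isFurry"  # hypothetical property

# The literal is simplified to a plain string rather than a typed xsd:boolean.
published = [
    (DOG, IS_FURRY, "true"),
    (MUTANT, IS_FURRY, "true"),
]

d = Dictionary()
# Internally, store compact integer triples instead of long strings.
encoded = [tuple(d.encode(t) for t in triple) for triple in published]
print(encoded)  # [(0, 1, 2), (3, 1, 2)]

# Decoding recovers the original URIs, so the two subjects stay distinct.
decoded = [tuple(d.decode(i) for i in triple) for triple in encoded]
assert decoded == published
```

Many triple stores go further and build indexes over several orderings of these integer triples (for example subject-predicate-object and predicate-object-subject); the sketch leaves indexing out.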
To be clear, though, I am promoting RDF as a way to publish structured, semantic data as opposed to not publishing structured, semantic data. Other good ways to publish structured, semantic data may well emerge in the future, but RDF exists today and is widely used.
So I will leave it at that. Again, I invite comments, rebuttals, accolades, disparagements, etc.