The following is an email that I sent out today about the Semantic Web Challenge at this year’s ISWC. If you are interested in this and have not yet joined the group firstname.lastname@example.org, then let me encourage you to do so. I’d also welcome email (or blog comments, although they weren’t working right last time I posted from here) if you have any thoughts. In the next week or so Peter Mika and I need to move this from random thoughts to something starting to resemble competition rules!
P.S. Oh yeah, I forgot: if you are missing the context, Peter Mika and I are co-chairing the ISWC 2008 Semantic Web Challenge …
(Email sent to email@example.com):
All-
Peter feels that we now have the collection and distribution of the triples underway, which means he finally gets to make me do some work… My role at the moment is to figure out what we would like the challenge part of the challenge to be. Here are some thoughts; I welcome feedback:
We see four very non-disjoint audiences for the challenge (in fact, Peter, me, and most of the people on this list are in at least several categories): triple store developers, linked data technology developers, Semantic Web researchers interested in scalable reasoning, and ontology-based research groups.
Here are some of my thoughts with respect to these:
A – Triple store developers: We do not want this to be a “triple store shootout” in the sense of who can process a query fastest or the like. We don’t see that competition as being all that useful at a time when people are still very much in development mode. Rather, we would like the outcome of this event to be a realization in the outside world that triple stores can and do handle these sorts of numbers (the DB folks still say “triple stores break at a million triples” at conferences I go to; I have no idea where they get that, but let’s push it up a few orders of magnitude!!). So at the moment my thinking on this area is that we would like to give you folks bragging rights for being able to support systems other people develop (i.e., any of you who host this data and make it available via SPARQL should be listed as “winners” in some way). I also think that if some interesting, large, and complex SPARQL queries are developed against this dataset (say, including filters and optionals), then those would become useful benchmarks, so we would like to find a way to encourage the sharing of these (maybe for a future date when a benchmarking shootout would be more appropriate).
B – Linked data technology developers: We write a lot about the Semantic Web as being the Web of linked data, but to date, in practice, most of that data is either within an enterprise or locked in a particular application. We are purposely designing this dataset to be very heterogeneous, but with many connections between pieces, so it should be a great dataset for showing off tools that can exploit the dataweb. In this area we are thinking of having some goals like “visualize (or browse) the dataweb”, data mining of this sort of data, etc. It seems to us this is a ripe area for a challenge.
C – SW researchers interested in scalable reasoning: The data set we are developing will include a (large) number of triples tied to FOAF, DOAP and other “small o” ontologies. We also have a lot of data that will be made available that was crawled from microformats (where the “semantics” are well specified). This is thus an ideal proving ground for the “little semantics goes a long way” philosophy, so this also seems like an appropriate challenge area.
D – Ontology research: Big A-Box, you got it! Show us something.
So, I think we will have the “competition” be fairly unspecified: we will identify several areas of interest from the above and work out how to tie that into an “announceable” competition.
I welcome, NEED, your feedback on this.
-Jim H.