SPARQL BGP opt Medha Q 1

From Semantic Portal Wiki

  • Question is for the Presentation: Gregory Todd Williams SPARQL BGP Optimization Presentation
  • Question is asked by: Medha Atre
  • The Question is: What is the size of the typical join pattern graph used to evaluate the system? The authors mention that the plan-space evaluation could not be completed for query 9, and the audience is curious to know what that query looked like. (See more comments.)
  • Answer: The large BGPs in the LUBM benchmark (queries 2 and 9) each contain six triple patterns. Query 9 was answerable using the system, but the evaluation of its entire plan space (all 720 query plans) did not complete. This is perhaps surprising, since query 2 has exactly the same query topology and its plan-space evaluation did complete. However, both queries have a worst-case plan that involves two cartesian products, producing intermediate results with an upper bound of ~140 trillion (based on my rough napkin math). I agree that the dataset is much too small for an analysis performed in 2008. However, the sample query execution times listed for the optimized plans are on the order of 10-100ms, not a few seconds.
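
The figures in the answer can be illustrated with a small sketch. It assumes that the plan space consists of the join orderings of the six triple patterns (which matches 6! = 720) and that a join step counts as a cartesian product when the incoming pattern shares no variable with the patterns already joined. The `PATTERNS` list is a hypothetical triangle-shaped BGP chosen for illustration; it is not copied from the paper or from the LUBM query text.

```python
from itertools import permutations

# Hypothetical six-pattern BGP with a triangle topology. Each entry is the set
# of variables one triple pattern mentions (an assumed shape for illustration).
PATTERNS = [
    frozenset({"x"}),
    frozenset({"y"}),
    frozenset({"z"}),
    frozenset({"x", "y"}),
    frozenset({"y", "z"}),
    frozenset({"x", "z"}),
]


def cartesian_products(order):
    """Count join steps whose pattern shares no variable with those already joined."""
    bound = set(order[0])
    count = 0
    for pattern in order[1:]:
        if not (bound & pattern):
            count += 1
        bound |= pattern
    return count


# Enumerate every join ordering and find the worst case.
orderings = list(permutations(PATTERNS))
plan_count = len(orderings)
worst_case = max(cartesian_products(o) for o in orderings)

print(plan_count, worst_case)  # prints "720 2"
```

Under these assumptions, the worst orderings are those that join the three single-variable patterns first, which forces two cartesian products before any shared variable can constrain the result.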

In general, I thoroughly liked the core idea of the paper. However, a few points in the evaluation struck me:

  • The dataset used (156k triples) is unbelievably tiny; given that the paper was published in 2008, much larger datasets were available.
  • Given this tiny dataset, the times reported for query execution with the optimization techniques are still very high (on the order of a few seconds for some queries), so it is important to report the size of the query graph and the number of joined triple patterns in each query.