Ankesh Sep11

From Semantic Portal Wiki


CSCI 6966 Advanced Semantic Web (Fall 2008)


Questions

ID Question Name
Ankesh Sep11 Jesse Weaver The paper mentions that variables are replaced by constants from filter expressions, but depending on the system, the two forms of the query may mean different things. As an example, consider:
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?s WHERE {
     ?s <test:p> ?o .
     FILTER(?o = "true"^^xsd:boolean)
  }

According to SPARQL, the '=' operator here tests _boolean_ value equality (op:boolean-equal) when ?o is bound to an xsd:boolean literal. Therefore, if ?o binds to "1"^^xsd:boolean (a non-canonical but valid lexical representation of true), the FILTER passes. However, with variable replacement, the query would become:

  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?s WHERE {
     ?s <test:p> "true"^^xsd:boolean .
  }
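Depending on the system, this may not be equivalent (although it could be): triple pattern matching compares RDF terms, and "1"^^xsd:boolean and "true"^^xsd:boolean are different terms, so the rewritten query only matches the "1" triple if the store canonicalizes literals or matches them by value. Does DARQ account for this?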
Jesse Weaver
AnkeshSep11GregoryToddWilliams1 Does the cost estimate for bind joins accurately reflect the actual cost of such a join? The presented formula 3.3 seems to ignore the repeated transfer cost of R(q'2), yet no details are given about the choice of join algorithms in the benchmark executions. Gregory Todd Williams
AnkeshSep11JoshuaShinavier1 Why are "capabilities" based exclusively on predicates? Implementation details aside, there are other ways in which we might want to partition a distributed data set, e.g. by assigning quads to data sources according to their named graph component. In that case, a capability might consist of a set of named graphs in which the query planner can expect to find statements. Joshua Shinavier
AnkeshSep11JoshuaTaylor1 One of the motivations for distributed queries was that it is impractical to download a remote site's dataset. However, downloading numerous service descriptions would seem to cause some of the same problems, particularly since some service descriptions would change very often, e.g., statistical descriptions that carry the number of triples in the dataset. (Would this even be realistic for systems that don't store triples but provide a SPARQL endpoint on top of other systems?) Joshua A. Taylor
AnkeshSep11JoshuaTaylor2 The evaluation section makes no use of systems that provide dynamic information. How would creating service descriptions for such systems affect performance? E.g., statistical service descriptions include the number of triples in the dataset; how does computing this affect performance? Joshua A. Taylor
AnkeshSep11JoshuaTaylor3 Are some of these dataset partitions realistic? E.g., one of the example service descriptions mentions the ability to search for names beginning with the letters A-R. Do real systems actually do things like this? If they do, they must have some way of dealing with the need to query different data sources for similar information; how do they handle this? Joshua A. Taylor
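
The difference behind Jesse Weaver's question can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DARQ code: literals are modelled as hypothetical (lexical form, datatype IRI) pairs, `term_equal` stands in for the RDF-term comparison used when matching a triple pattern, and `boolean_value_equal` stands in for the SPARQL '=' operator applied to two xsd:boolean operands. Both function names are our own.

```python
# Literals are modelled as (lexical form, datatype IRI) pairs; this is a
# simplification for illustration, not an RDF library's literal type.
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# xsd:boolean lexical space: "true" and "1" denote true, "false" and "0" denote false.
BOOLEAN_LEXICAL_TO_VALUE = {"true": True, "1": True, "false": False, "0": False}


def term_equal(a, b):
    """Hypothetical helper: RDF term equality (same lexical form and datatype)."""
    return a == b


def boolean_value_equal(a, b):
    """Hypothetical helper: value comparison, as SPARQL '=' treats two xsd:boolean literals."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a != XSD_BOOLEAN or dt_b != XSD_BOOLEAN:
        raise TypeError("both operands must be xsd:boolean literals")
    # An invalid lexical form raises KeyError here; SPARQL would raise a type error.
    return BOOLEAN_LEXICAL_TO_VALUE[lex_a] == BOOLEAN_LEXICAL_TO_VALUE[lex_b]


stored = ("1", XSD_BOOLEAN)       # the value ?o binds to in the data
constant = ("true", XSD_BOOLEAN)  # the constant taken from the FILTER

print(boolean_value_equal(stored, constant))  # True: the FILTER passes
print(term_equal(stored, constant))           # False: a term-matching store misses the triple
```

A store that canonicalizes boolean literals on load (storing "1" as "true") would make the two query forms agree; whether DARQ's endpoints can be assumed to do so is the open point of the question.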

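The repeated-transfer concern in Gregory Todd Williams's question can be made concrete with a small simulation. This is our own simplified bind join, not formula 3.3 and not DARQ's implementation: `query_source` is a hypothetical stand-in for a remote SPARQL request, each binding from R(q1) triggers one request to the second source, and a counter records requests and transferred tuples. When R(q1) contains the same join value more than once, the same rows of the second source are shipped repeatedly unless bindings are deduplicated or cached.

```python
from dataclasses import dataclass


@dataclass
class CostCounter:
    requests: int = 0
    tuples_transferred: int = 0


def query_source(source, join_value, counter):
    """Hypothetical remote request: rows of `source` whose first column equals join_value."""
    counter.requests += 1
    rows = [row for row in source if row[0] == join_value]
    counter.tuples_transferred += len(rows)
    return rows


def bind_join(r_q1, source2, counter):
    """Join R(q1) with source2 by sending each R(q1) binding as its own sub-query."""
    counter.requests += 1                    # the request that fetched R(q1)
    counter.tuples_transferred += len(r_q1)  # transfer of R(q1) itself
    results = []
    for x, a in r_q1:
        for _, b in query_source(source2, x, counter):
            results.append((x, a, b))
    return results


# Toy data: the join value "x1" occurs twice in R(q1).
r_q1 = [("x1", "a1"), ("x1", "a2"), ("x2", "a3")]
source2 = [("x1", "b1"), ("x1", "b2"), ("x2", "b3"), ("x3", "b4")]

counter = CostCounter()
joined = bind_join(r_q1, source2, counter)
print(len(joined), counter.requests, counter.tuples_transferred)  # 5 4 8
```

Here 5 of the 8 transferred tuples come from the second source, although only 3 distinct rows there match; an estimate that charges those matching rows only once would not capture the difference.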

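Joshua Shinavier's alternative can be contrasted with predicate-based source selection in a short sketch. The endpoint URLs, predicates, and graph names below are hypothetical, and the selection functions are our simplification rather than DARQ's algorithm; the point is only that the same selection step works whether a capability is a set of predicates or a set of named graphs.

```python
# Hypothetical endpoints and their capabilities.
PREDICATE_CAPABILITIES = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/pubs/sparql": {"dc:title", "dc:creator"},
}
GRAPH_CAPABILITIES = {
    "http://example.org/people/sparql": {"ex:staff"},
    "http://example.org/pubs/sparql": {"ex:papers", "ex:theses"},
}


def is_variable(term):
    return term.startswith("?")


def sources_for_pattern(pattern, capabilities):
    """Predicate-based selection: sources whose capability lists the pattern's predicate."""
    _, predicate, _ = pattern
    if is_variable(predicate):
        # Assumption for this sketch: an unbound predicate could match at any source.
        return set(capabilities)
    return {src for src, preds in capabilities.items() if predicate in preds}


def sources_for_graph(graph, capabilities):
    """Graph-based selection: sources that declare the named graph of a GRAPH pattern."""
    if is_variable(graph):
        return set(capabilities)
    return {src for src, graphs in capabilities.items() if graph in graphs}


print(sources_for_pattern(("?doc", "dc:title", "?t"), PREDICATE_CAPABILITIES))
print(sources_for_graph("ex:staff", GRAPH_CAPABILITIES))
```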

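On Joshua A. Taylor's questions about keeping statistics current: if a statistical description must report triple counts per predicate, a source that computes them on demand pays a full scan each time, whereas one that maintains them incrementally pays a constant cost per update. The sketch below contrasts the two. It assumes triples are (subject, predicate, object) tuples without duplicates; the class and function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def predicate_counts(triples):
    """Recompute per-predicate triple counts with a full scan: O(N) for N triples."""
    return Counter(p for _, p, _ in triples)


class IncrementalPredicateStats:
    """Keep per-predicate counts current with O(1) work per inserted or deleted triple."""

    def __init__(self):
        self.counts = Counter()

    def add(self, triple):
        self.counts[triple[1]] += 1

    def remove(self, triple):
        # Assumes the triple was previously added.
        predicate = triple[1]
        self.counts[predicate] -= 1
        if self.counts[predicate] == 0:
            del self.counts[predicate]


triples = [("ex:a", "foaf:name", '"Ann"'), ("ex:b", "foaf:name", '"Bob"'), ("ex:a", "foaf:knows", "ex:b")]
stats = IncrementalPredicateStats()
for t in triples:
    stats.add(t)
stats.remove(triples[2])

print(predicate_counts(triples))  # full recomputation over all three triples
print(dict(stats.counts))         # {'foaf:name': 2}
```

An endpoint layered over a non-RDF system may be unable to do either cheaply, which is the case the first question raises.
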
Author Text
Quilitz2008querying question 1 by lebo Tim Lebo The authors state, "There is no other need for cooperation except of the support of the SPARQL protocol." Yet later, "To find the relevant information sources for the different triples in a query and to decompose the query into sub-queries the query engine needs information about the data sources." DARQ relies on the service descriptions to determine how to decompose the query.
  1. How can service descriptions avoid requiring cooperation, when they must supply the capabilities <math>C_D</math> of a data source D? The authors write: "We deliberately use only these simple statistics because we expect every data source to be able to provide them, or at least rough estimations."
Quilitz2008querying question 2 by lebo Tim Lebo
  1. Aren't Limitations subsumed by Constraints? That is, can't all Limitations be expressed as Constraints?
Quilitz2008querying question 3 by lebo Tim Lebo
  1. How does the subject (object) being bound affect the subject (object) selectivity <math>ssel_D(p)</math> (<math>osel_D(p)</math>)?
  2. How does this value relate to its alternative when the subject (object) is not bound?
Quilitz2008querying question 4 by lebo Tim Lebo
  1. What are the service descriptions for the DBpedia endpoints? They are critical to understanding /why/ the optimization performed better. The authors state: "Our evaluations show that even with a very limited amount of statistical information it is possible to generate query plans that perform relatively well." What statistical information was used?
Quilitz2008querying question 5 by lebo Tim Lebo It is not clear how the DBpedia data were split across the two servers.
  1. How were they split? For example, Q1 used sources S4 and S5. If both S4 and S5 were on the first server, the second was idle.
Quilitz2008querying question 6 by lebo Tim Lebo
  1. Why are there /five/ logical endpoints on two servers?
  2. How does the setup of two Sun-Fire-880 machines "make sure that the endpoints are not a bottleneck"?
Quilitz2008querying question 7 by lebo Tim Lebo
  1. What is the computational complexity of the planning and optimization steps? Execution timings alone can be a misleading indicator of efficiency.
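
For Tim Lebo's third question, one plausible reading of how the selectivities would be used is sketched below. The formula is our assumption, not quoted from the paper: with n triples for predicate p at source D, a pattern with a bound subject is estimated to return n·ssel_D(p) results, a bound object n·osel_D(p), both bound the product of the two (treating subject and object as independent), and neither bound simply n. The function name and the statistics are hypothetical.

```python
def estimate_result_size(n_triples, subject_bound, object_bound,
                         subject_selectivity=1.0, object_selectivity=1.0):
    """Hypothetical estimate of matches for a pattern (s, p, o) at one source.

    Assumption: a bound subject scales the predicate's triple count by the subject
    selectivity, a bound object by the object selectivity, independently.
    """
    estimate = float(n_triples)
    if subject_bound:
        estimate *= subject_selectivity
    if object_bound:
        estimate *= object_selectivity
    return estimate


# Hypothetical statistics: 200 triples with predicate p, ssel = 0.25, osel = 0.5.
print(estimate_result_size(200, False, False, 0.25, 0.5))  # 200.0
print(estimate_result_size(200, True, False, 0.25, 0.5))   # 50.0
print(estimate_result_size(200, True, True, 0.25, 0.5))    # 25.0
```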

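Tim Lebo's seventh question can be framed by counting the plans a join-order optimizer could consider for n sub-queries. The sketch uses the standard counts: n! left-deep orders and (2n-2)!/(n-1)! bushy join trees with ordered inputs. Whether and how DARQ prunes this space is for the paper to answer; nothing here is taken from it, and the function names are our own.

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join orders over n inputs: every permutation."""
    return factorial(n)


def bushy_join_trees(n):
    """Number of binary join trees over n labelled inputs, left/right inputs distinguished.

    Equals Catalan(n - 1) * n!, which simplifies to (2n - 2)! / (n - 1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


for n in range(2, 7):
    print(n, left_deep_orders(n), bushy_join_trees(n))
```

Because these counts grow factorially, timings measured on queries with only a few sub-queries say little about how planning scales.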

Absentees

Tim Lebo
