Medha GRIN Presentation

From Semantic Portal Wiki

Jump to: navigation, search

Presentation given at CSCI 6966 Advanced Semantic Web (Fall 2008) - Lesson 8

refresh


Questions

ID Question Name Answer
GRIN Ankesh Algorithm to build GRIN Index is not clear to me.
  • How do we obtain L1 from L0?
  • How is centroid c computed in step 9?
Ankesh Khandelwal Will be explained in the talk
Medha GRIN Presentation Gregory Todd Williams 1 Reducing the potential graph data used in matching by traversing the index tree seems to rely on a sufficient number of query constraints. These constraints, however, seem to rely on the query using a number of bound subjects or objects. Would queries with few (or no) bound subjects or objects eliminate the effective use of the GRIN index? Since no details are given on the queries used in the evaluation, it's hard to know how the GRIN approach would perform on such queries (where you might be querying for the subjects and objects, using bound predicates to define the query pattern). Relatedly, these are exactly the queries where the other systems used in the evaluation might perform better than GRIN, since the increased index creation time and size of these systems are likely a result of indexing for, in part, patterns with only predicates bound. Gregory Todd Williams Yes you are right. The constraints are extracted from graph and seem to rely on at least some bound nodes in the query graph.

Your doubt about details of the queries used for evaluation is correct. There are no details given w.r.t. the standard size of query graph used in the evaluation and effectiveness of GRIN index on different queries and query graph sizes.

I believe it is fair to assume by a given query graph will have some bound nodes, as most realistic queries do have some bound nodes.

Medha GRIN Presentation Jesse Weaver Does this approach allow for variable predicates? Jesse Weaver

Yes, I guess predicate can be unbound, because an unbound predicate an be simply represented as a wildcard, but you have specify the path length.

While this is my understanding, it depends on the subgraph matching algorithm that they use after identifying portions of the original RDF graph to be searched for finding variable bindings. I have not looked at the subgraph matching algorithm and hence cannot authoritatively say whether queries without bound predicates will work or not.
Medha GRIN Presentation Joshua Shinavier 1 Apart from introducing a novel indexing mechanism for RDF, this paper also introduces an extension to SPARQL, in the form of P-path edges. However, it's not clear why P-paths are necessary, nor why the authors chose to model them as they did (as sets of predicates, as opposed to, say, sequences of predicates, regular expressions, or other formalisms used in proposed path extensions to SPARQL). They obviously play no part in the average query times which the authors use to compare GRIN with Jena, Sesame and RDFBroker, as these tools only deal with standard SPARQL queries. Why are the 20 non-standard TAP and ChefMoz queries mentioned at all? Can anything be said about GRIN's performance for these queries? The authors claim that "GRINAnswer is *often faster* than Jena, Sesame and RDFBroker for *certain types* of graph-based queries", which seems like a weak statement, especially since we're not told which queries these are. Do the advantages of GRIN have more to do with efficient evaluation of standard SPARQL queries, or with the ability to evaluate path-based queries of the variety supported by GRIN? Joshua Shinavier You question is more or less is same as the doubts raised by other people on this forum. Please refer to my answers there.
Udrea2007grin question 1 by lebo It would seem that the method used to determine cluster centers would drastically influence query performance. The selected clustering algorithm (e.g., PAM) and the inter-cluster metric (e.g., single, complete, or average link) would both be factors of performance. The authors do not commit to an inter-cluster distance metric d_c and devote one sentence discussing the results of the comparison: They all "performed" the same within 5%.
  1. Why do you suppose GRIN's index creation times, index sizes, and query times were invariant to the inter-cluster metric used?
Tim Lebo This is a good question. The answer is two fold. In GRIN index building and query execution, authors have made use of some graph mining, clustering algorithms, which I am not very familiar with. Since the features and performance characteristics of these two algorithms are not know and also there isn't a lot of literature present on usage of these algorithms for RDF graphs, it's difficult to comment on whether or not performance characteristics of these algorithms affect efficiency of GRIN index.


Attendees

Tim Lebo

Personal tools
Semantic Web Community
Tetherless World constellation
maintenance