Alvaro Graves SemRank

Presentation given at CSCI 6966 Advanced Semantic Web (Fall 2008) - Lesson 6


Questions

ID Question Name Answer
Alvaro Graves SemRank Gregory Todd Williams 1 The paper presents three different semantic associations: ρ-Path, ρ-Iso, and ρ-Join. Few details are given about the ρ-Iso Association, but the description in section 2 describes it: "The paths p = p11, p12 ... p1n originating from r1 and p' = p11', p12' ... p1n' originating from r2 are semantically similar in that the corresponding edges in both paths are related in a subproperty relationship, therefore r1 and r2 are related by virtue of this similarity." This definition seems overly restrictive and simplistic to me. Can the presented ranking system be extended to allow for similar associations based on a tree or DAG of properties (not just a single path) from r1 and r2? Alternatively, could the ρ-Iso Association be extended to distinguish the similarity of r1 and r2 based on the similarity of rm and rn (the endpoints of the paths p and p')? Gregory Todd Williams
Alvaro Graves SemRank Jesse Weaver At the end of section 3.1, I(ps) is defined as the sum of the specificity and the theta-specificity of path ps. This definition weights the two specificities equally; do you think this is fair? Also, at the end of section 3.4, SEMRANK(SA) is defined as an equally weighted product of the information gained, the refractions (plus one), and the semantic match of keywords (plus one). Do you think equal weighting is fair? For example, maybe the semantic match of keywords deserves a heavier weight because it is affected by the user and could therefore be considered a better indicator of an appropriate rank for that user. What do you think? Jesse Weaver
Alvaro Graves SemRank Joshua Shinavier 1 Given that SemRank has exponential time complexity and cannot be evaluated for larger data sets, what is an empirical evaluation of the top-K algorithm likely to reveal? The authors expect that it will produce an ordering that is reasonably close to SemRank, but they go on to mention that the refraction count can only be computed at the end of the path building phase (so, the refraction count of a child path does not predict the refraction count of a parent path). Did I read this correctly? This seems important, because the top-K approximation would be critical to any real-world application of the presented technique -- SemRank alone is clearly not suitable for discovery of associations in a large graph. Joshua Shinavier
Alvaro Graves SemRank Joshua Taylor 1 In Section 5 (EMPIRICAL EVALUATION) we read, "There are an increasing number of publicly available RDF data sets from those that are narrowly focused (e.g. DBLP, ODP) to those with broader scope covering multiple domains (e.g. TAP, SWETO). However, most of these presented limitations that made them unsuitable as evaluation testbeds for SemRank. … Consequently, the evaluation of SemRank discussed here was done on synthetically generated data. The data generation was guided by rules to ensure that data distributions mirror the real world." The authors state that they believe that "testbeds for the Semantic Web" are in "early developmental stages". Nonetheless, they seem to be claiming that the real-world data available to them does not mirror the real world, and so they come up with their own model of the real world. How accurate is their perception that the current datasets do not accurately reflect the real world, and how realistic is their expectation that the structure will evolve significantly? More importantly, how does this affect their evaluation? Have they simply developed data sets on which SemRank is promising, or have they shown that SemRank is ready for the World of Tomorrow? Joshua A. Taylor
Alvaro Graves SemRank Shangguan Computation Complexity Issues. Please note that this question may be outside the scope of this paper (possibly due to space limits). I'm interested in the computational complexity of the method proposed in this paper. Response time is key to any search engine, and thus to any ranking algorithm. But it seems to me that the authors did not include this in the paper: the only empirical results presented were used to show that the SemRank method works as expected. Also, given the complexity of SemRank, we would be able to figure out whether it scales well or not. Zhenning Shangguan
Alvaro Graves SemRank Shangguan 2 (MAY BE DIGRESSIVE) After reading through the paper, I was wondering what happens if the RDF documents are updated to some extent, e.g., by adding only a small number of properties. As far as I can tell, the system has to recalculate all the metrics described in the paper. Is there a better way, such as incremental computation, to improve performance? Zhenning Shangguan
Anyanwu2005semrank question 1 by lebo The authors propose a few interesting metrics for ranking a set of semantic paths. A semantic path is an instance of a Semantic Association, which is a sequence of (RDF) properties. In the Background/Motivation section, the authors illustrate and define three example Property Sequences: rho-pathAssociation, rho-joinAssociation, and rho-isoAssociation.
  1. Are there compelling reasons for recognizing these types of semantic paths, i.e., are these structures applicable in real-world analysis?
  2. Are there other Property Sequences that are typical in the Semantic Association research area?
  3. Although these examples are helpful in understanding the purpose of analyzing semantic paths, it is unclear how these three relate to the metrics introduced in the remaining sections of the paper. Is there a deeper relationship between these three Property Sequences and the metrics proposed?
Tim Lebo
Anyanwu2005semrank question 2 by lebo The information content I(ps) is the addition of three terms: a ROC-aware min, a ROC-aware avg, and a non-ROC-aware max. The refraction count RC(PS) relies on the Semantic Summary, which in turn relies on the ROCs. The S-Match(property,keyword) metric relies on a property hierarchy. The final SEMRANK metric involves a (modulated) composition of five terms, four of which rely on a schema to be defined.
  1. Is this reliance a strength or a weakness?
  2. How do the metrics change as the presence, completeness, and quality of the schema changes?
Tim Lebo
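The weighting questions above can be made concrete with a small sketch. It assumes only the form described in the questions: a product of the information gained, the refraction count plus one, and the keyword match plus one. The exponents w_info, w_refraction and w_match are hypothetical knobs that do not appear in the paper, the paper's modulation is left out, and the input values are made up for illustration.

```python
def weighted_semrank(info_gain, refraction_count, s_match,
                     w_info=1.0, w_refraction=1.0, w_match=1.0):
    """Product-form rank with hypothetical per-factor exponents.

    With all exponents at 1.0 this is the equally weighted product
    described in the question; it is not the paper's full formula.
    """
    if info_gain < 0 or refraction_count < 0 or s_match < 0:
        raise ValueError("all factors must be non-negative")
    return (info_gain ** w_info
            * (1 + refraction_count) ** w_refraction
            * (1 + s_match) ** w_match)

# Two made-up associations: a carries more information,
# b matches the user's keywords better.
a = (2.0, 1, 0.2)
b = (1.5, 1, 0.5)

# Equal weighting: a ranks above b.
equal_a = weighted_semrank(*a)
equal_b = weighted_semrank(*b)

# Heavier keyword weight (hypothetical w_match=2.0): the order flips.
heavy_a = weighted_semrank(*a, w_match=2.0)
heavy_b = weighted_semrank(*b, w_match=2.0)
```

In this toy case the ordering depends on the choice of weights, which is the concern behind the question; whether the same happens on real data is not something the sketch can show.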
Semrank Ankesh

Comment: The authors have chosen to omit discussion of rho-iso and rho-join semantic associations. I believe that each kind of association requires a different approach to SemRank evaluation. For example:

  • In a rho-iso association, the relation between rm and rn (the last elements of the two paths) would be important. The association would be entirely different if rm and rn belonged to disjoint classes rather than the same class.
  • Similarly, for a rho-join association, the relation between the properties on the path from r1 to rn and those on the path from r2 to rn may matter. Again, the association may be different if the two paths contain disjoint properties rather than similar (including subProperty) properties.
Can you give your thoughts on the following: If r1 and r2 are associated by more than 1 type of association (rho-path, rho-iso or rho-join), which associations would be ranked higher?
Ankesh Khandelwal
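
The rho-iso condition quoted in the first question (corresponding edges of the two paths are related by a subproperty relationship) can be sketched as a simple check. This is only an illustration under stated assumptions: "related" is read here as "equal, or one is a transitive subproperty of the other", paths are plain lists of property names, and the property names and hierarchy are invented. The endpoint comparison suggested in the comment above is not included.

```python
def ancestors(prop, sub_property_of):
    """Return prop together with all of its transitive superproperties."""
    seen = set()
    stack = [prop]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(sub_property_of.get(p, ()))
    return seen

def related(p, q, sub_property_of):
    """Assumed reading: p and q are equal, or one subsumes the other."""
    return (q in ancestors(p, sub_property_of)
            or p in ancestors(q, sub_property_of))

def is_rho_iso(path1, path2, sub_property_of):
    """True if the paths have equal length and pairwise related edges."""
    return (len(path1) == len(path2)
            and all(related(p, q, sub_property_of)
                    for p, q in zip(path1, path2)))

# Hypothetical hierarchy: property -> list of direct superproperties.
hierarchy = {"paints": ["creates"], "sculpts": ["creates"]}

same_branch = is_rho_iso(["paints", "exhibitedAt"],
                         ["creates", "exhibitedAt"], hierarchy)
siblings = is_rho_iso(["paints", "exhibitedAt"],
                      ["sculpts", "exhibitedAt"], hierarchy)
```

Under this reading, sibling properties such as paints and sculpts do not count as related, so the second pair of paths is rejected. A looser reading that accepts a shared superproperty would accept it, which illustrates how restrictive the definition can be.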


Attendees

Tim Lebo
