User:Tim Lebo

From Semantic Portal Wiki

Advanced Semantic Web (Fall 2008)

Overview of participation

Of the 42 paper presentations, Tim Lebo presented 3, attended 25, and did not attend 3.

While attending CSCI 6966 Advanced Semantic Web (Fall 2008), Tim Lebo gave four presentations:

  • a summary of his current "Accountable Visualization" research,
  • an overview of two conference papers, and
  • an overview of one journal article.

Tim Lebo made 47 posts containing comments and questions regarding presented papers. Many of these posts pose multiple questions.

Tim Lebo also contributed to the organization of the course wiki. The Course wiki contributions section lists these contributions.

Presentation attendance

Presentations Tim Lebo attended | Speaker | Date
Joshua Taylor Journal Paper Presentation 1 | Joshua A. Taylor | 1 July 2007
Joshua Taylor Slate Introduction | Joshua A. Taylor | 4 September 2008
Semantic Grounding Joshua Shinavier 20089011 | Joshua Shinavier | 11 September 2008
Community-basedMapping Shangguan 0911 | Zhenning Shangguan | 11 September 2008
Joshua Taylor 20080918 Presentation | Joshua A. Taylor | 18 September 2008
Tw:Brahms Medha Presentation 0918 | Medha Atre | 18 September 2008
Gregory Todd Williams SPARQL BGP Optimization Presentation | Gregory Todd Williams | 25 September 2008
Asma Ounnas Semantic Social Networks Presentation | Asma Ounnas | 25 September 2008
James Michaelis Provenance Model | James Michaelis | 2 October 2008
Alvaro Graves SemRank | Alvaro Graves | 2 October 2008
Joshua Shinavier Networked Graphs | Joshua Shinavier | 2 October 2008
Jesse Weaver RDF Management Approaches | Jesse Weaver | 2 October 2008
Gregory Todd Williams Graph Summaries | Gregory Todd Williams | 9 October 2008
Summary Abox Ankesh | Ankesh Khandelwal | 9 October 2008
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies | Joshua A. Taylor | 9 October 2008
Medha GRIN Presentation | Medha Atre | 16 October 2008
Shangguan MetaQuery Presentation | Zhenning Shangguan | 16 October 2008
Journal XixiLuo | Xixi Luo | 6 November 2008
Joshua Shinavier Laying the foundations for a World Wide Argument Web | Joshua Shinavier | 6 November 2008
Journal Ankesh | Ankesh Khandelwal | 6 November 2008
Jesse Weaver Presents Named Graphs | Jesse Weaver | 13 November 2008
GTW Time in RDF | Gregory Todd Williams | 13 November 2008
Shangguan Journal CNLPresentation | Zhenning Shangguan | 20 November 2008
Jiao Journal Presentation | Jiao Tao | 20 November 2008
Medha Journal Presentation | Medha Atre | 20 November 2008
(Made pseudo-disjoint classes Presentation attended by Tim Lebo and Presentation not attended by Tim Lebo)
Presentations Tim Lebo did not attend | Speaker | Date
Ankesh Sep11 | Ankesh Khandelwal | 11 September 2008
NSPARQL Jesse Weaver 20080911 | Jesse Weaver | 11 September 2008
Debbie Rank Typed Graph Walks Presentation | Debbie Heisler | 23 October 2008

Tim left early on 11 September for a business trip and was at VisWeek for the week of 23 October.

Papers presented

Tim Lebo gave the following paper presentations for this course:

Presentation page | Title of paper | Authors
Abel2007enabling presented by Tim Lebo 25 sept 2008 | Enabling Advanced and Context-Dependent Access Control in RDF Stores | Fabian Abel, Juri Luca De Coi, Nicola Henze, Arne Wolf Koesling, Daniel Krause, Daniel Olmedilla
Theoharis2008graph presented by Tim Lebo 4 dec 2008 | On Graph Features of Semantic Web Schemas | Yannis Theoharis, Yannis Tzitzikas, Dimitris Kotzinos, Vassilis Christophides
Zhao2004using presented by Tim Lebo 9 oct 2008 | Using Semantic Web Technologies for Representing E-science Provenance | Jun Zhao, Chris Wroe, Carole A. Goble, Robert Stevens, Dennis Quan, Mark Greenwood


Questions posed

Tim Lebo posted the following in preparation for attending paper presentations for this course:

About Text
Anyanwu2005semrank The authors propose a few interesting metrics for ranking a set of semantic paths. A semantic path is an instance of a Semantic Association, which is a sequence of (RDF) properties. In the Background/Motivation section, the authors illustrate and define three example Property Sequences: rho-pathAssociation, rho-joinAssociation, and rho-isoAssociation.
  1. Are there compelling reasons for recognizing these types of semantic paths, i.e., are these structures applicable in real-world analysis?
  2. Are there other Property Sequences that are typical in the Semantic Association research area?
  3. Although these examples are helpful in understanding the purpose of analyzing semantic paths, it is unclear how these three relate to the metrics introduced in the remaining sections of the paper. Is there a deeper relationship between these three Property Sequences and the metrics proposed?
Anyanwu2005semrank The information content I(ps) is the addition of three terms: a ROC-aware min, a ROC-aware avg, and a non-ROC-aware max. The refraction count RC(PS) relies on the Semantic Summary, which in turn relies on the ROCs. The S-Match(property,keyword) metric relies on a property hierarchy. The final SEMRANK metric involves a (modulated) composition of five terms, four of which rely on a schema to be defined.
  1. Is this reliance a strength or a weakness?
  2. How do the metrics change as the presence, completeness, and quality of the schema changes?
Carroll2005named When comparing TriX to RDF/XML, the authors state, "The URI at which an RDF/XML document is published is used for three different purposes: as a retrieval address, with an operational semantics; as a means of identifying the document; and as a means of identifying the graph described by the document. There is potential for confusion between these three uses." The URIRef for a named graph performs only the latter function (identifying the graph).
  1. When encountering a URIRef in a set of triples (e.g., ":G1 pr:disallowedUsage pr:Marketing ." in section 5.3), how does a crawler know that the URIRef identifies a named graph?
  2. When a crawler determines that a URIRef names a named graph (e.g., 'ng'), how does it obtain the rdfgraph(ng)? Would the process require anything different from the proposed methods for obtaining descriptions for a "plain old" rdfs:Resource?
Cattuto2008semantic
Cattuto2008semantic The introduction motivates the investigation of relatedness measures: "We believe that a deeper insight into the semantic properties of relatedness measures is an important prerequisite for the design of ontology learning procedures..." Although relatedness measures may be necessary for an "Ontology Learning" capability, they are arguably insufficient on their own.
  1. What other components are required to achieve Ontology Learning?
  2. Is this work already being done?
Cattuto2008semantic The paper compares five relatedness metrics. We could sit down and make up ten more.
  1. What does a "good" metric look like?
  2. How do we avoid a subjective evaluation of a metric's results (e.g., "An interesting observation is also that java and python could be considered as siblings in some suitable concept hierarchy")?
  3. If we had the perfect metric (m*), what would we do with it?
Cattuto2008semantic
  1. Could you explain the difference between one-mode, two-mode, and three-mode analysis?
  2. What challenges are introduced when attempting three-mode analysis?
Cattuto2008semantic Notation nit: <math>R^T</math> and <math>R^n</math> seem to be used synonymously -- what is the distinction? Shouldn't <math>R^T</math> be <math>R^
Cattuto2008semantic I don't understand the statement "The reason for giving weight zero between a node and itself is that we want two tags to be considered related when they occur in a similar context, and not when they occur together."
  1. What is the difference between "similar context" and "occur together"? I would think that they are the same.
  2. Isn't "occurrence" the only "context" that the Folksonomy formalism provides?
Cattuto2008semantic The description of FolkRank mentions "random surfer vector" but does not introduce the term or its purpose.
  1. Could you describe the random surfer vector?
Eiter2008combining The paper describes a "conservative extension" to the combination of DL's first-order semantics and logic programming's answer set semantics, where knowledge can be transferred between a DL knowledge base and a logic programming program. When describing how dl-programs can express the closed-world assumption "on top" of an external DL knowledge base, negation is asserted despite its DL provability.
  1. Could an inconsistency arise when the CWA assertions are shared with the DL knowledge base?
  2. I'm not familiar with logic programming and answer sets -- and reading the paper left me interested, impressed, and confused. Could you explain the gist of logic programming and how answer set semantics differs from description logic semantics? It sounds pretty cool.
Fokoue2006summary The summary of an Abox takes advantage of redundant assertions with respect to consistency checking by collapsing individuals that are members of the same concept sets and are /not/ explicitly asserted to be different from each other. The authors note, "Any explicit assertions that two individuals are different from each other are maintained in the summary Abox."
  1. Would an exhaustive enumeration of "differentFrom" for each instance in the Abox render the summary method useless?
Gil2007towards The results of the simulation compare the Mean Squared Error, k-sum, and Edit Distance to a 'baseline', which is "a ... search engine ... that ranks search results by topic and popularity, ... without taking trust into account." A lot of assumptions are made to model the trust users have for associations and resources, and the simulation used 1,000 queries generated from 1,000 generated resources, 10,000 generated associations, and 1,000 generated users.
  1. Although the paper is trying to compare their trust-based ordering against 'traditional' search engines, are they not just comparing their technique to random performance instead?
Grau2007history The authors open the abstract by stating, "The development of ontologies involves continuous but relatively small modifications." A reasonable first step to reduce the ontology development cycle is to tackle the problem addressed in the paper: classify ontology O^2 by reusing the "evidences" from the classification of O^1, the set of added axioms, and the set of removed axioms. Their "module" technique can then be applied at each committed change. This addresses the relatively small modifications aspect of ontology development,
  1. but what about the continuous aspect?
  2. Could "modules" be determined using a history longer than only the previous step?
Gutierrez2007introducing
  1. How does the work presented in this paper differ from OWL Time?
  2. Could OWL Time be used to extend the temporal labels used by Gutierrez et al.?
Gutierrez2007introducing According to Footnote 3, the authors "chose not to ((use the standard reification vocabulary of RDF)) to stress the fact that the notions presented in this paper are independent of any view one may have about the concept of reification in RDF." Yet the "temporal label" is constructed in the exact same way the standard reification vocabulary would do it (:tsubj rdfs:subPropertyOf rdf:subject . :tpred rdfs:subPropertyOf rdf:predicate . :tobj rdfs:subPropertyOf rdf:object.).
  1. How can the authors reproduce the standard reification approach while pretending to stay at arm's length from the standard?
In Definition 3, a mapping m is a 5-tuple <id,u,v,t,f>, where u is an element of the relational schema, v is an element of the ontology, t is a relationship (e.g., equivalence or subsumption), and f is a confidence measure. Phase II of the matching process views each element of the relational schema and ontology as a "virtual document" that is compared to the other elements' "virtual documents" using the TF/IDF cosine measure. In a mapping m, id, u, v, and f are provided in this process.
  1. How is t (equivalence/subsumption) determined for a given mapping?
  2. Does the use of the subsequent third Phase, Validity Mapping Consistency, indicate a lack of confidence in the previous stage's ability to match? What else "doesn't make sense" in the matching that hasn't been filtered out?
Janik2005brahms
  1. How many implementations of the depth-first search and bi-directional breadth-first exhaustive search algorithms were used in the evaluation?
  2. Three languages were used (C++ (BRAHMS), C (Redland), and Java (Jena and Sesame)) among the four triple stores. If multiple implementations were used, what assurances were made that they performed similarly before incorporating the triple store?
Janik2005brahms
  1. Did the authors repeat the load and execution tests to demonstrate reproducibility?
  2. How much variability in the results could we expect?
Lin2008discovering The authors make a very good point regarding the "ill defined" nature of using probabilistic measures for a deterministic graph structure. The two Random Experiments that they propose are unique, intuitive, and probabilistically sound methods for obtaining probabilistic measures. But from the time that they propose the method, they do not discuss the methods' computational expense until the last paragraph of the paper, "an important future direction is to improve the scalability of the system. What is most expensive is the computation of feature values, since it requires the system to count a potentially large number of paths."
  1. How long should the Random Experiments reasonably be run to characterize the input, given the trade-off against the time required to do so?
Moreau2007open It is clear that the intent of this paper is to introduce the start of a common model for provenance and NOT to motivate the use of provenance systems. The need for a common model for any mutual interest is self-evident. However, some motivation for the use of provenance would be helpful.
  1. What applications require any provenance representation,
  2. what are the benefits for its use, and
  3. how many provenance models are in use today (e.g., how many systems need to reconcile with the common model)?
Noy2008collecting The ability to provide a "One-stop shopping for ontology resources" remains overdue.
  1. Why do you think it has taken so long to establish this type of capability and why is it not the norm?
  2. It seems that after a lot of hard work by a team, ontologies are dead-on-arrival when they hit the web. What is preventing the Web philosophy from nurturing continued use and development of existing ontologies?
Noy2008collecting
  1. Is reification used to represent the Mapping <C_s,C_t,R,M>?
  2. If so, have your experiences with reification been positive or negative? If not, how did reification not scratch the itch for your application?
Noy2008collecting The authors state "One mapping can depend on another: 'If X is Y, then A is B'."
  1. How is this dependency mapping modeled?
Noy2008collecting (diagram nit) Fig. 2 is great for illustrating the connectivity between ontologies. Making the nodes' size proportional to the size of the ontologies they represent might be more communicative, since the cardinality between two ontologies could usefully be considered with respect to the sizes of each ontology.
Noy2008collecting Claims of "complete domain-independence" are often overstatements.
  1. To what extent is this technology domain-independent, and in what areas is it not?
Perez2008nsparql The "path-like" nature resembles Fresnel Selector Language (FSL) http://www.w3.org/2005/04/fresnel-info/fsl/.
  1. Where does FSL fit within the family of nSPARQL and the others discussed in the Related Work section?
Perez2008nsparql (just a comment) The "axis-like" nature, along with the "forward-backward" directionality, resembles Ted Nelson's zzStructure http://www.nongnu.org/gzz/gi/gi.html. The zzStructure incorporates a layout that facilitates visual navigation. It would be interesting to compare this work from a different discipline in a different era (his was decades ago).
Perez2008nsparql The authors note "the occurrence of variables in the predicate position of triple patterns is forbidden in nSPARQL." This limitation resembles another that was described in the DARQ paper: "The matching compares the predicate in a triple pattern with the predicate defined for a capability and evaluated the constraint for subject and object. Because matching is based on predicates, DARQ currently only supports queries with bound predicates."
  1. Is there something fundamental about the need to assume a bound predicate? If so, does this inhibit the ability to make advancements throughout the field?
Perez2008nsparql
  1. Do you think the nested-regular-expression construct proposed in nSPARQL will make it into future SPARQL recommendations?
Quilitz2008querying The authors state, "There is no other need for cooperation except of the support of the SPARQL protocol." Yet later, "To find the relevant information sources for the different triples in a query and to decompose the query into sub-queries the query engine needs information about the data sources." DARQ relies on the service descriptions to determine how to decompose the query.
  1. How do the service descriptions not require cooperation to obtain the capabilities <math>C_D</math> of a data source D? "We deliberately use only these simple statistics because we expect every data source to be able to provide them, or at least rough estimations."
Quilitz2008querying
  1. Aren't Limitations subsumed by Constraints? That is, can't all Limitations be expressed as Constraints?
Quilitz2008querying
  1. How does the subject (object) being bound affect the subject (object) selectivity <math>ssel_D(p)</math> (<math>osel_D(p)</math>)?
  2. How does this value relate to its alternative when the subject (object) is not bound?
Quilitz2008querying
  1. What are the Service Descriptions for the dbpedia endpoints? This is a critical component to understanding /why/ the optimization performed better. "Our evaluations show that even with a very limited amount of statistical information it is possible to generate query plans that perform relatively well." What statistical information was used?
Quilitz2008querying It is not clear how the dbpedia data were split across the two servers.
  1. How were they split? For example, Q1 used sources S4 and S5. If both S4 and S5 were on the first server, the second was idle.
Quilitz2008querying
  1. Why are there /five/ logical endpoints on two servers?
  2. How does the setup of two Sun-Fire-880 machines "make sure that the endpoints are not a bottleneck"?
Quilitz2008querying
  1. What is the computational complexity of the planning and optimization? Execution timings can be a misleading indicator of the efficiency.
Rahwan2007laying The authors chose to model their argument ontology in RDFS.
  1. Although this makes sense because of RDF's suitability for distributed augmentation of existing resources, how are the semantics of RDFS beneficial in this application? The authors restrict themselves to RDFS while wishing for disjointness.
  2. Why, in 2007, would an RDFS ontology developer choose not to use OWL? "Translating the ontology to more expressive Semantic Web ontology languages such as OWL can also enable ontological reasoning over argument structures, for example, to automatically classify arguments, or to identify semantic similarities among arguments."
  3. Could you suggest any examples for how (and why) arguments could be classified or similarities could be determined? The interfaces shown indicate a large jump from today's blogging to tomorrow's WWAW.
  4. What kind of users would use the system the paper proposes? How much "argument theory" would they be expected to know? If a novice to "argument theory" used the system, how likely would their products align with the reasoning the system can provide? The system proposed involves adding additional explicit structure to the "blogging" of today.
  5. How far would a user need to decompose the free text, and how often would that granularity need to change as the argument evolves? At what point does the structure inhibit the thought process of the user?
Schenk2008networked The extension of named graphs that the authors describe looks like it will be very useful in Semantic Web applications. A couple of pragmatic questions.
  1. Do the QNames in the rdf:Literal values of g:definedBy inherit the prefix bindings defined in the Networked Graph file?
  2. Tucking query expressions into literals seems to be a recurring theme (e.g., Fresnel does it for SPARQL and FSL expressions). However, my experiences with managing query expressions within literals have been clunky at best. Has anyone proposed an analogous rdf:Property that indicates a URI that should be dereferenced to obtain the query expression? If so, where? If not, is it needed? It seems that this would fit well within a development environment.
Schenk2008networked In the Related Work section, the authors compare Networked Graphs to ActiveXML: "The semantics of ActiveXML documents is also defined using a fixpoint, but unlike NGs, ActiveXML documents are infinite in general."
  1. What does "infinite in general" mean and why is it not desirable?
Schenk2008networked In the Related Work section, the authors compare Networked Graphs to NRL: "NEPOMUK Representation Language NRL ... allows for various ways to specify 'views', which are defined with a procedural semantics in contrast to the declarative semantics of NGs."
  1. What is the distinction between procedural semantics and declarative semantics?
Schmidt2008experimental The only place the ratios of usr, sys, and total response times are mentioned is in the discussion for Q1 ("Return the year of publication of 'Journal 1 (1940)'"), where the authors state, "The gap between total and usr+sys for 25M indicates that much time is spent in waiting for data being read from or written to disk". The choice to use different vertical scales in Figures 1-3 leads to an investigation of these ratios while obscuring a natural consideration of the more important issue: the relative response times between triple store approaches. Regardless, the ratio usr/sys falls within one of three categories: minority/majority, all/none, and none/none -- and the ratio's category transitions from none/none, to all/none, to minority/majority as the data size increases within a condition.
  1. What does the ratio of usr and sys times indicate with respect to the performance of the query execution?
Schueler2008querying When describing their design choices, the authors state, "For compliance with existing applications that access the repository in a common way (e.g. using SPARQL queries), we do not modify existing user data." The preservation of user data is clearly important. But they disregard RDF reification as an option, saying "This requirement does not allow us to use mechanisms like RDF reification, which decompose existing triples and fully change the representation model."
  1. In what way would RDF reification (the use of rdf:Statement) decompose and fully change the representation model?
Schwitter2008controlled The Controlled Natural Language presented in this paper seems to be useful for pedagogical purposes, but little more. Although hiding a more obscure syntax /may/ increase the approachability and/or "explainability", it seems that a novice would still require a background understanding of OWL Lite^minus to do anything useful. The novice is at risk for establishing a greater expectation for the language based on the assumption that the English-like sentences are English; these expectations will be unfulfilled by the restricted ("focused"?) capabilities of a logic-based system. On the other hand, the verbosity that provides approachability for the novice would hinder use by "knowledge engineers" who prefer a more concise syntax. However, these issues are most concerning only when authoring the TBox and ABox. When posing questions to the system, the interaction seems more natural. This is done by generating a satisfying model with SATCHMO, which includes all entailed statements.
  1. What is a "satisfying model",
  2. how does it differ from other kinds of models, and
  3. how is it generated?
Szomszor2008semantic The authors claim that "Tagging ... (is a) knowledge management mechanism that users find easy to use and understand."
  1. What scientific, non-anecdotal evidence do we have that can convince us of this?
  2. Do /all/ users find tagging to be easy to use and understand?
  3. Are there certain contexts in which tagging is easy to use and understand, but not others?
Szomszor2008semantic The authors claim that their "results show that far richer interest profiles can be generated."
  1. What meaningful metric are they using to measure 'richer', such that we can know that their tag-combining approach provides 'far richer' results? One suggestion would be to offer the service described in the paper and gather metrics for how many del.icio.us and flickr users request the service, suggest the service, refer back to the results, etc.
Udrea2007grin It would seem that the method used to determine cluster centers would drastically influence query performance. The selected clustering algorithm (e.g., PAM) and the inter-cluster metric (e.g., single, complete, or average link) would both be factors of performance. The authors do not commit to an inter-cluster distance metric d_c and devote one sentence to discussing the results of the comparison: They all "performed" the same within 5%.
  1. Why do you suppose GRIN's index creation times, index sizes, and query times were invariant to the inter-cluster metric used?
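
To make the single, complete, and average link options named in the last question concrete, here is a minimal Python sketch of the three inter-cluster distances. The 2-D points, the Euclidean point distance, and all function names are illustrative assumptions; they are not GRIN's graph distance or its implementation.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def pairwise(a, b):
    """All point-to-point distances between clusters a and b."""
    return [dist(p, q) for p, q in product(a, b)]


def single_link(a, b):
    """Distance between the closest pair of points."""
    return min(pairwise(a, b))


def complete_link(a, b):
    """Distance between the farthest pair of points."""
    return max(pairwise(a, b))


def average_link(a, b):
    """Mean distance over all cross-cluster pairs."""
    d = pairwise(a, b)
    return sum(d) / len(d)


# Hypothetical clusters, chosen only to show that the metrics differ.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

for name, metric in [("single", single_link),
                     ("average", average_link),
                     ("complete", complete_link)]:
    print(name, round(metric(cluster_a, cluster_b), 3))
```

For any pair of clusters the single link value is at most the average link value, which is at most the complete link value, so the three metrics can disagree on which clusters are closest whenever clusters are spread out.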


Course wiki contributions

Other

He also likes ice cream, and cookies, and ate chocolate at least one time in his life.

reification:

[
 p:s Tim Lebo;
 p:p ate;
 p:o chocolate;

 date "2008/12/3";
] .
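
The bracketed node above reads like an informal reification. As a hedged sketch, the Python below expands one statement into the four triples of the standard RDF reification vocabulary (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object) plus an annotation. The `ex:` names, the `dc:date` property, the blank node label `_:stmt1`, and the helper `reify` are assumptions for illustration; the `p:s`, `p:p`, and `p:o` names used above are not defined on this page.

```python
def reify(subject, predicate, obj, node="_:stmt1", annotations=None):
    """Return triples that describe the statement (subject, predicate, obj).

    The first four triples use the standard RDF reification vocabulary.
    Each entry of `annotations` adds one more triple about the statement node.
    """
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    for prop, value in (annotations or {}).items():
        triples.append((node, prop, value))
    return triples


# Hypothetical identifiers; the date literal is copied from the snippet above.
triples = reify("ex:Tim_Lebo", "ex:ate", "ex:chocolate",
                annotations={"dc:date": '"2008/12/3"'})

for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

The printed lines are N-Triples-like only in shape; a real serialization would need full IRIs or declared prefixes.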
