Joshua A. Taylor

From Semantic Portal Wiki


Joshua A. Taylor (Person)
Basic Description
affiliation RPI
occupation PhD Student
homepage http://www.cs.rpi.edu/~tayloj/
Contact Information
General Relation
Inferred Relation

publications

total:0


Projects

My Statistics

Thanks are due to Ankesh Khandelwal for providing the basic format for these statistics. I've modified them a bit and made them more generic (since they're on my page, I've just used {{PAGENAME}} instead of a name; this should make it easier if anyone borrows from my page). I also added a section for answers.

  • Total number of presentations in the Advanced Semantic Web class to raise questions for = 39
  • Number of presentations for which I raised some questions = 32

(Hm… Evidently I've asked questions on more presentations than there have actually been. That can't be right, but I'm going to leave it as it is, since the code from which the queries are taken was presented as a good example of what to do, and I'm not sure what the "- 3" is for.)

My Presentations

Presentation Page | Paper Presented | Authors | URL
Joshua Taylor 20080918 Presentation | History Matters: Incremental Ontology Reasoning Using Modules | Bernardo Cuenca-Grau, Yevgeny Kazakov, Christian Halaschek-Wiener | http://iswc2007.semanticweb.org/papers/183.pdf
Joshua Taylor Journal Paper Presentation 1 | Dynamic, automatic, first-order ontology repair by diagnosis of failed plan execution | Fiona McNeill, Alan Bundy |
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies | Discovering Simple Mappings Between Relational Database Schemas and Ontologies | Wei Hu, Yuzhong Qu | http://iswc2007.semanticweb.org/papers/225.pdf


My Answers

Questions on my presentations to which I have provided answers.

Question Answer
Grau2007history question 1 by lebo The authors open the abstract by stating, "The development of ontologies involves continuous but relatively small modifications." A reasonable first step to reduce the ontology development cycle is to tackle the problem addressed in the paper: classify ontology O^2 by reusing the "evidences" from the classification of O^1, the set of added axioms, and the set of removed axioms. Their "module" technique can then be applied at each committed change. This addresses the relatively small modifications aspect of ontology development,
  1. but what about the continuous aspect?
  2. Could "modules" be determined using a history longer than only the previous step?
HistoryMatters GregoryToddWilliams Question1 The Gene Ontology seems to be the only ontology for which update sizes are varied during evaluation (with n=1,2,4) whereas the other ontologies are all tested for a fixed update size. Are the gains in reclassification time seen in the results expected to be the same across all ontologies, regardless of update size? Furthermore, is the seemingly narrow range of tested update sizes (1-4) expected to approximate real-world usage? Would the same gains be seen with updates of different characteristics such as unbalanced axiom removal/addition counts or larger updates (tens or hundreds of affected axioms)?
Hu2007discovering question 1 by lebo In Definition 3, a mapping m is a 5-tuple <id,u,v,t,f>, where u is an element of the relational schema, v is an element of the ontology, t is a relationship (e.g., equivalence or subsumption), and f is a confidence measure. Phase II of the matching process views each element of the relational schema and ontology as a "virtual document" that is compared to the other elements' "virtual documents" using the TF/IDF cosine measure. In a mapping m, id, u, v, and f are provided in this process.
  1. How is t (equivalence/subsumption) determined for a given mapping?
  2. Does the use of the subsequent third Phase, Validity Mapping Consistency, indicate a lack of confidence in the previous stage's ability to match? What else "doesn't make sense" in the matching that hasn't been filtered out?
  1. The authors specify that a mapping m contains a relationship t that holds between u and v. I think this definition is confusing, as the authors never make use of this t. Simple mappings seem to find equivalence relationships, and contextual mappings seem to find subsumption relationships. Personally, I think that it would be useful if the system allowed human users to re-run certain phases, and to make adjustments between runs. Phase 3 checks the consistency of attribute/property mappings, and this could be facilitated if users could step in and clarify the relation/class mappings (to specify equivalence, subsumption, &c.).
  2. As I mentioned in the preceding item, the consistency checking phase checks the consistency of attribute/property mappings, assuming (or provided) that the relation/class mappings have been computed correctly. I think the unspoken assumption is that relation/class mappings are more likely to be correct (and this is probably reasonable, particularly if the Des function has access to good descriptions, and if the number of entity relations in the database and classes in the ontology is small). So, I don't think that the presence of Phase 3 indicates a lack of confidence so much as a recognition that attribute/property matching is more difficult than relation/class matching, but is easier in the presence of the latter.
Joshua Taylor 20080918 Presentation Jesse Weaver

What is the time complexity of the proposed algorithm with respect to the number of subsumptions that change their entailment status?

One of the main assumptions proposed in section 3 is that "the number of subsumptions that change their entailment status w.r.t. the ontology ... is probably small compared to the number of subsumptions that do not ...." What about when mapping ontologies? For example, consider the paper "Collecting Community-Based Mappings in an Ontology Repository". In that paper, users are allowed to map certain relationships between ontologies. These mappings do not use OWL properties but rather properties indicating, for example, that two classes _could_ be considered equivalent (among others). Now imagine a user selects an ontology to add to his/her own preexisting ontology, replacing those "could-be-considered-equivalent-class" relations (between the two ontologies) with actual owl:equivalentClass relations for purposes of reasoning over data that uses both ontologies. This seems like a plausible situation in which the aforementioned assumption may not hold. What kind of results would be gotten from experiments reflecting this ontology-mapping example?

A casual question of interest: How could such an incremental approach be applied (if at all) to promote reasoning in large and/or distributed RDF stores?
Joshua Taylor 20080918 Presentation Joshua Taylor 2 Quoth the first paragraph of Section 6 "Empirical Evaluation", "Our system implements a slightly more simplistic procedure than the one in Algorithm 2; in particular, once the affected modules have been identified, our implementation simply reclassifies the union of these modules using Pellet to determine the new subsumption relations, instead of using the procedure described in lines 20–34 of Algorithm 2." Does this have any significant effect on the results?
Joshua Taylor Discovering Simple Mappings Gregory Todd Williams 1 The definition of a mapping (Definition 3) is stated in such a way as to allow mappings from R to P and from A to C. Is this just a sloppy definition? In figure 1 just below definition 3, a mapping from "Paper" in R to "Journal Paper" in O is shown with "Paper" subsuming "Journal Paper". Of what use is this mapping given the stated direction of subsumption? If the relation subsumes the class, aren't we unable to say anything about the class that a tuple in the relation belongs to?
Joshua Taylor Discovering Simple Mappings Gregory Todd Williams 2 In section 4.1, it is stated that entities are partitioned "in the relational schema and the ontology" into four groups. However, it seems that object properties (denoted PO) belong to groups 2-3. Is this correct? If so, can these groups accurately be called partitions? Does this diminish the claim that these partitions "limit the searching space of candidate mappings"?
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies Jesse Weaver Line 3 in the algorithm presented in Table 1 iterates over all of the disjoint subclasses of class v. What if class v has subclasses that are not disjoint? How would it handle this condition? Also, line 9 of the same algorithm uses a threshold τ. What value of τ is used in the second experiment (generating contextual mappings)?
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies Jesse Weaver 2 Figure 5 shows Marson achieving an average F1-Measure of less than 0.8 in four out of the five cases shown. How good is an average F1-Measure of 0.8? How good is an average F1-Measure in the range of 0.7 to 0.9?
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies Joshua Taylor 1 What's up with this evaluation? Is the system really only being compared against one other system? The Marson system is compared against the 2006 Ronto system, which was developed by a team disjoint from the authors. The comparison with the Simple, VDoc, and Valid systems seems a bit odd. The authors have developed a four-phase process for generating mappings, and have a reason for designing each phase. They then compared their system against crippled versions of itself. It is not particularly surprising, in my opinion, that it tends to come out on top (although, in the case of OBSERVER/Bibliography, the Valid system even does a little better). Rather than having shown that Marson is a good system (compared to other systems in the world), it seems that the authors have shown that the combination of techniques present in Marson is better (usually) than a subset of those techniques.
Mappings RDB Ontology Ankesh
  1. In section 4.1 the paper says an n-ary relationship should be reified as a group of binary relationships. I couldn't form a clear picture of how this would be done, especially what the names of the relations would be. Would it be the attribute name that is not part of the primary key? What does reified mean here? How would tokens be affected?
  2. In section 4.3, the paper validates the relationship between attribute id in author and hasID in the ontology. I couldn't understand why they would be mapped in the first place, because from (2) VD(id) would contain Des(author) whereas from (4) VD(hasID) would contain Des(Paper). What is a confident mapping (a good confidence measure)? Could you help with a better example for validation?
  3. In section 4.4, the algorithm distinguishes categorical attributes from non-categorical ones. However, it isn't clear to me how we determine whether an attribute is categorical. A naive description would be that any attribute not part of the primary key is categorical. Can this be correct?
  1. An example from Wikipedia, particularly http://en.wikipedia.org/wiki/Relation_(mathematics) :
    An example of a ternary or triadic relation (i.e., between three individuals) is: "X was-introduced-to Y by Z", where (X,Y,Z) is a 3-tuple of persons; for example, "Beatrice Wood was-introduced-to Henri-Pierre Roché by Marcel Duchamp" is true, while "Karl Marx was-introduced-to Friedrich Engels by Queen Victoria" is false.
    This (somewhat contrived) relation could be represented using a relational database. It would be a "relationship relation", and each row would store an X, a Y, and a Z. In order to represent this relationship using a triple based model, it is necessary to introduce some sort of "introducing event" objects which correspond to the rows of the table, and which would be the domain of three properties, say, introduction_event:person1, introduction_event:person2, and introduction_event:introducer. A similar thing has to happen if the database contains a Person table with three attributes, name, birthdate, and gender; the difference is that it does not seem strange to us to introduce Person objects—they're just people.
  2. The authors are using this example to show how the validation process would eliminate an inconsistent mapping. (Their language is a bit unclear, though.) Presumably the mapping between hasID and id would be discovered based on the results of Des(hasID) and Des(id). Though in this example "hasID" and "id" are distinct tokens, they might, in reality, have some more complex descriptions which have some similarity. If this isn't the case, it is hard to imagine the authors' system overcoming any lexical differences in terminologies that arise between databases and ontologies.
  3. Immediately below ContextMatch, the authors write "In lines 6–8, the algorithm repeatedly examines each attribute in the relation [to determine] whether it is a categorical attribute or not." Based on the way that they use the categorical attributes, a categorical attribute is one in which a partitioning of instances based on their attribute values corresponds to some partitioning of the disjoint subclasses of the class at hand.
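
The reification described in answer 1 above (the "was-introduced-to" example) can be sketched in a few lines. This is only my illustration, not the authors' procedure: the function name reify_rows, the Triple type, and the way row identifiers are generated are all assumptions made for the example; only the three property names come from the answer itself.

```python
from typing import List, NamedTuple, Sequence


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def reify_rows(rows: Sequence[Sequence[str]],
               relation_name: str,
               column_properties: Sequence[str]) -> List[Triple]:
    """Turn each n-ary row into one 'event' node plus one binary triple per column."""
    triples = []
    for index, row in enumerate(rows, start=1):
        # Hypothetical identifier for the object that stands for this row.
        event = f"{relation_name}_{index}"
        for prop, value in zip(column_properties, row):
            triples.append(Triple(event, prop, value))
    return triples


# One row of the ternary "X was-introduced-to Y by Z" relation.
rows = [("Beatrice Wood", "Henri-Pierre Roché", "Marcel Duchamp")]
properties = [
    "introduction_event:person1",
    "introduction_event:person2",
    "introduction_event:introducer",
]
triples = reify_rows(rows, "introduction_event", properties)
for triple in triples:
    print(triple)
```

Each row of the table becomes the subject of as many binary triples as the table has columns, which is why the "introducing event" objects have to be invented before the data fits a triple-based model.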


My Questions

Question Page Presentation Question
Abel2007enabling presented by Tim Lebo 25 sept 2008 Joshua Taylor 1 Abel2007enabling presented by Tim Lebo 25 sept 2008 Is there any provision made for whether restricted triples may be indirectly accessed? For instance, a query could be posed asking for individuals ?x and ?y where ?x hasPhone ?z and ?y hasPhone ?z. If both Joe hasPhone 555-5555 and Mary hasPhone 555-5555 are restricted, then answering the aforementioned query would be permissible since even answering with ?x ↦ Mary, ?y ↦ Joe wouldn't even indirectly allow either restricted triple to be reconstructed. While this example might seem contrived, it is easily conceivable that with a sufficient number of joins, restricted triples could be used without the particular triples being available from the end result.
Alvaro Graves SemRank Joshua Taylor 1 Alvaro Graves SemRank In 5 EMPIRICAL EVALUATION we read, "There are an increasing number of publicly available RDF data sets from those that are narrowly focused (e.g. DBLP, ODP) to those with broader scope covering multiple domains (e.g. TAP, SWETO). However, most of these presented limitations that made them unsuitable as evaluation testbeds for SemRank. … Consequently, the evaluation of SemRank discussed here was done on synthetically generated data. The data generation was guided by rules to ensure that data distributions mirror the real world." The authors state that they believe that "testbeds for the Semantic Web" are in "early developmental stages". Nonetheless, it seems like they are claiming that the real world data available to them does not mirror the real world, and so come up with their own model of the real world. How accurate is their perception that the current datasets do not accurately reflect the real world, and how realistic is their expectation that the structure will evolve significantly? More importantly, how does this affect their evaluation? Have they simply developed data sets on which SemRank is promising, or have they shown that SemRank is ready for the World of Tomorrow?
AnkeshSep11JoshuaTaylor1 Ankesh Sep11 One of the motivations for distributed queries was that it is impractical to download a remote site's dataset. However, downloading numerous service descriptions would seem to cause some of the same problems, particularly since some of the service descriptions would change very often, e.g., statistical descriptions which carry the number of triples in the dataset. (Would this even be realistic for systems that don't store triples, but provide a SPARQL endpoint on top of other systems?)
AnkeshSep11JoshuaTaylor2 Ankesh Sep11 The evaluation section makes no use of systems that provide dynamic information. How would creation of service descriptions for such systems affect performance? E.g., statistical service descriptions include the number of triples in the dataset; how does computing this affect performance?
AnkeshSep11JoshuaTaylor3 Ankesh Sep11 Are some of the partitions of datasets realistic? E.g., one of the example service descriptions mentions the ability to search for names beginning with letters from A-R. Do real systems actually do things like this? If they do, they must have some way of dealing with needing to query different data sources for similar information; how do they handle these issues?
Asma Ounnas Semantic Social Networks Presentation Joshua Taylor 1 Asma Ounnas Semantic Social Networks Presentation In 3.3 Step 1: Syntactic Filtering the authors write "tags that are too small … or too large … are discarded. … Tags containing numbers are also filtered according to a set of custom heuristics: … we consider global tag frequency and discard any unpopular tags. Finally, common stop-words … are discarded." In class we've discussed some of the problems that can arise in discarding tags. Here it appears that this applies only to tags with numbers. The authors also apply a number of filters (to address misspellings, neologisms, and domain specific terminologies). Are there any effects of this (other than what they intend) of which we should be aware?
Brahms Medha Presentation 0918 Joshua Taylor Question 1 Tw:Brahms Medha Presentation 0918 The authors seem to have built an RDF store for the sole purpose of discovering semantic associations (paths connecting resources), and their store does seem to be effective. However, their motivation stems from the difficulties that other, more general RDF stores had in achieving this task. Yet no pre-processing seems to have been performed on the RDF graphs. In their Future Work section, the authors state their intention "to experiment with a variety of semantic association discovery algorithms, utilizing a language for defining regular paths … . The regular expressions defined over the RDF resources and types … will enable us to define the association paths of interesting patterns and significantly restrict the search space of the semantic association discovery." If only certain bits of the graph are interesting, and BRAHMS isn't a general-purpose RDF store, why not just throw out uninteresting triples? And if this approach can be taken, why not ease the burden of the other systems by only asking them to store the interesting triples? Perhaps graphs might then be small enough for them to store.
Brahms Medha Presentation 0918 Joshua Taylor Question 2 Tw:Brahms Medha Presentation 0918 The authors do not seem to cite the particular depth-first search that they are using, but it does not seem to compare well with their bi-directional breadth first search. Is there any indication of whether they are using a straightforward depth first search versus an iterative deepening search (as discussed by, e.g., Russell and Norvig in "Artificial Intelligence: A Modern Approach")? On a related note, the authors mention in their "Future Work" section that they plan on experimenting with languages for expressing regular paths. This suggests that they recognize some predicates and resources as more interesting than others. They have developed an efficient RDF store for performing graph based search algorithms, and seem to have a notion of cost, or benefit, on the edges and nodes of the graph—why do they not mention using heuristic algorithms?
Community-basedMapping Shangguan 0911 Joshua Taylor Question 1 Community-basedMapping Shangguan 0911 The authors mention that "when we create a mapping between the anatomy part of the NCI Thesaurus and the FMA, our goal is not to merge the two ontologies, but rather to help applications integrate the data that was annotated with terms from either ontology. We expect, however, that many applications will use only one or the other ontology. In the field of biomedical ontologies, researchers often think of ontology mapping not as a bridge between two ontologies, but rather as a glue that brings the two ontologies together to create a single whole, with clearly identifiable components. In this case, the ontologies that are mapped are intended to be used together, as a single unit." In the former case, where reasoning is performed using just one ontology and the mappings serve as "bridges," what system is responsible for applying the mapping to information from a source ontology to yield information in the target ontology? Doesn't this system necessarily require knowledge of both ontologies?
Debbie Journal Presentation Joshua Taylor 1 Debbie Journal Presentation The authors are at the point of analysis where they can begin to look at how certain groups of people tag various articles and the like. Rather than evaluating a single produced ontology, could this technology be used to extract multiple ontologies? After all, if Semantic Web researchers tag articles methodically according to their own terminology, and a group of arachnologists does likewise, it would be good to notice that both groups use the tag "web", but that they use it in different ways. Could the Semantic Web and Arachnology ontologies be extracted as independent entities?
Debbie Journal Presentation Joshua Taylor 2 Debbie Journal Presentation Can the more sophisticated tripartite model capture everything of the earlier bipartite model? (This is a typical test of whether one framework is more general than another; that is, it is more general if the less general model can be captured as a special case.) Also, in the bipartite model, associations between instances and concepts were tracked. In the tripartite model, the same associations are tracked, but with the additional information of who made the association. How is this different from a provenance model which tracks the same thing? Or, if addressing ontology drift is one of the authors' concerns, why not use a quadripartite model that also tracks time? Then ontologies could be extracted with temporal information and ontologies from different times compared.
Gregory Todd Williams SPARQL BGP Optimization Presentation Joshua Taylor 1 Gregory Todd Williams SPARQL BGP Optimization Presentation It seems that 3.2 Optimization Algorithm is simply building a minimum spanning tree, where edges are weighted by estimated selectivity (which isn't discussed until later). Is this correct, or is there more to it?
Gregory Todd Williams SPARQL BGP Optimization Presentation Joshua Taylor 2 Gregory Todd Williams SPARQL BGP Optimization Presentation What purpose does the graph representation (as presented in 3.1 Preliminaries and 3.2 BGP Abstraction) serve? It's used, to some extent, in 3.2 Optimization Algorithm, but it seems that two pages exposit a representation that isn't really exploited afterward.
Gregory Todd Williams SPARQL BGP Optimization Presentation Joshua Taylor 3 Gregory Todd Williams SPARQL BGP Optimization Presentation The authors state in 1. INTRODUCTION that "We focus on the evaluation of the presented optimization techniques without comparing the figures with the performance of alternative implementations." What value then ought we to ascribe to the work the authors performed? If I were to start with an exponential-time algorithm, and invent an optimization that yields a polynomial-time algorithm, it would seem impressive, but if someone else has recognized that the problem can be solved in linear or constant time, it would be somewhat suspicious if I was not "comparing the figures with the performance of 'alternative implementations'" (single/scare quotes added for emphasis).
James Michaelis Provenance Model Joshua Taylor 1 James Michaelis Provenance Model I recognize that this is a more formal technical report, and not a conference paper. As such, it is fitting that the authors jump almost immediately into their formalism. I am not particularly familiar with the general or particular needs of provenance tracking systems. What are some such needs? Does the authors' system address these? While they have certainly addressed some of their desiderata from 1 Introduction, it is not clear that they "allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model." The authors provide a model that anyone can adopt, but is it a model that matches existing systems?
James Michaelis Provenance Model Joshua Taylor 2 James Michaelis Provenance Model Inference rule (1) from 6.1 One Step Inferences would seem to have some consequences that are, at least to me, unintuitive. For instance, it is easy to imagine extending Figure 2 (which depicts John controlling the process Bake which uses a number of ingredients and produces Cake) by adding a Make Fork process controlled by Factory Worker, using Steel and Fork Mold, and producing Fork. Then, Susan might control an Eat Cake process that uses Cake and Fork, and produces No Cake (or perhaps, Empty Plate). It seems that Inference Rule (1) would allow us to conclude that Eat Cake was triggered by Bake, and that Eat Cake was triggered by Make Fork. Recall Definition 7 (Process Triggered by Process): A connection of a process P2 to a process P1 by a "was triggered by" edge indicates that the start of process P1 was required for P2 to be able to complete. Are there any provisions for multiple possibilities? E.g., Eat Cake does not really depend on a particular instance of Make Fork; just about any Make Fork would do. Could this formalism handle something like recognizing that a codex was written over at some time between 1237 and 1342 without knowing by whom and exactly when? Could it be extended to handle such cases?
Jesse Weaver RDF Management Approaches Joshua Taylor 1 Jesse Weaver RDF Management Approaches In the penultimate paragraph of 2 The SP2Bench Scenario we read, "The table also lists the number #prop. of distinct properties. This value x+y splits into x "standard" attribute properties and y bag membership properties rdf:_1, …, rdf:_y, where y depends on the maximum-sized reference list in the data." The authors also make the point that difficulties are introduced as this number increases. In 3.3 The Purely Relational Scheme it becomes clear that they are capable of representing the references without using containers. It seems that using a container for a reference list rather than relating the paper to the referenced work with dcterms:references (as is done for authors with dc:creator) introduces unnecessary difficulties. Have you any thoughts about why they did this, and whether it alters performance and evaluation?
Jesse Weaver RDF Management Approaches Joshua Taylor 2 Jesse Weaver RDF Management Approaches In 4 Experimental Results the authors write, "As our primary interest is the basic performance of the approaches (rather than caching or learning strategies), we performed cold runs, i.e. destroyed the database in-between each two consecutive runs, and always restarted it before evaluating a query." Is this realistic? Clearly it means that caching and learning strategies will not influence the result, but if those are typical of database systems, would it not make for a better evaluation if they had cold runs in addition to "warm" runs where caching and learning strategies could affect performance? If the differences were not significant, then the authors would have shown that, and if the differences were, and those uses are more typical, then the comparison would be more useful.
Jesse Weaver RDF Management Approaches Joshua Taylor 3 Jesse Weaver RDF Management Approaches In the Conclusion, the authors point out that storing the data in a relational database whose schema is based on the ontology at hand performed better than any of the other RDF stores. This is not particularly surprising, as it is the most specialized, but least flexible, representation. I wonder if a hybrid approach in which database tables are constructed based on formal ontology descriptions (e.g., in RDFS or OWL), and triples using this vocabulary are stored in said specialized table, but other triples are stored using a more general approach (but within the same database) would be practical/useful, and how such a system would fare in this evaluation. What do you think?
Joshua Shinavier Networked Graphs Joshua Taylor 1 Joshua Shinavier Networked Graphs In 6. IMPLEMENTATION, the authors describe their algorithm for determining the fixedpoint of a graph/view. They write, "In some more details, the procedure starts with the true statements, which are extensionally listed, or which can be derived from views, which do not use negation. //Though, I thought that views could use negation…// We call this underestimate U1. Statements in U1 are known to be true. U1 is used to compute an overestimate O1 by evaluating all views against this set of true statements. The result will be an overestimate, because U1 was still incomplete and therefore bound negation will succeed in too many cases. //So views can contain negation…//" (emphasis added) I do not see how they can guarantee that O1 will be an overestimate. It is generated based on U1 whose elements are known to be true, but which is not yet the set of all true statements. Then if O1 does not depend on views using negation, it would seem that O1 could be another underestimate. In 5.1 Requirements …, RDF Schema the authors state "that NGs are expressive enough to alternatively model the RDFS inference rules as view definitions." I imagine a U1 containing "X rdf:type A", "A rdfs:subClassOf B", and "B rdfs:subClassOf C", which would lead to an O1 containing U1 as a subset, and also "X rdf:type B", and perhaps "X rdf:type C". Yet O1 is clearly not an overestimate. What am I missing here?
Joshua Taylor 20080918 Presentation Joshua Taylor 1 Joshua Taylor 20080918 Presentation As described in Section 6 "Empirical Evaluation", in step 3 the authors "extracted the minimal locality-based module for each atomic concept". However, the only method for extracting modules seems to be Algorithm 1, about which is stated "Finally, we point out that the modules extracted using Algorithm 1 are not necessarily minimal ones. That is, if O ⊨ α, the computed module might be a strict superset of a justification for α in O, and if O ⊭ α then the module for Sig(α) might not necessarily be the empty set." Then how is a "minimal locality-based module for each atomic concept" produced?
Joshua Taylor 20080918 Presentation Joshua Taylor 2 Joshua Taylor 20080918 Presentation Quoth the first paragraph of Section 6 "Empirical Evaluation", "Our system implements a slightly more simplistic procedure than the one in Algorithm 2; in particular, once the affected modules have been identified, our implementation simply reclassifies the union of these modules using Pellet to determine the new subsumption relations, instead of using the procedure described in lines 20–34 of Algorithm 2." Does this have any significant effect on the results?
Joshua Taylor 20080918 Presentation Joshua Taylor 3 Joshua Taylor 20080918 Presentation In the evaluation, random axioms are selected to be added and removed for incremental changes. It seems more likely that an ontology developer might re-evaluate the semantics of a particular concept, and that these changes might be to a slightly larger number of axioms, but affecting a smaller number of modules (than if randomly selected axioms were changed). How does this affect evaluation?
Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies Joshua Taylor 1 Joshua Taylor presents Discovering Simple Mappings Between Relational Database Schemas and Ontologies What's up with this evaluation? Is the system really only being compared against one other system?
NSPARQL Jesse Weaver 20080911 Joshua Taylor 1 NSPARQL Jesse Weaver 20080911 Admittedly, I had a hard time following all of the mechanics of the implementation. At the end of Section 3, however, the EVAL algorithm finishes with if ... then return YES else return NO. If I made a SPARQL query of the form "Paris ?p Calais; ?p rdfs:subPropertyOf transport" I can get various values for ?p: TGV, train, and transport. Are all the values available with nSPARQL, or just the yes/no answer that there is some such ?p?
Shangguan MetaQuery Presentation Joshua Taylor 1 Shangguan MetaQuery Presentation One of the design choices made by the authors is that their system employs Syntax extensions. While many authors and designers seem to follow the approach of extending systems rather than building extensions within a system, syntactical extensions present the problem that the users of the services must be aware of said extensions (whether this means end users, or programmers working with the system). In this case, there are extensions to the SPARQL query language. Do you think that the benefits of this system could be realized in a way that doesn't require syntactic extensions?
Summary Abox Ankesh Joshua Taylor 1 Summary Abox Ankesh Section 2 "Summary Abox" defines summary Aboxes, and proves Theorem 2 about summary Aboxes. For example, any Abox A is a summary Abox for itself (where the mapping f is simply the identity function). There is mention of a canonical function f satisfying the converses of (1) and (3), and satisfying (4–6), but I don't see any description of a procedure to compute this f. Can anything be said about the complexity (time, space, lower bounds) needed to compute this canonical summary?
Summary Abox Ankesh Joshua Taylor 2 Summary Abox Ankesh Section 2 "Summary Abox" describes the properties of a canonical function f satisfying the converses of (1) and (3), and satisfying (4–6), and producing a summary Abox. Condition (6) is that "f(a) ≠ f(b) ∈ A’ implies a is the only individual in A mapped to f(a) (same for b)." This approach produces something very different from what model finders for, for example, first-order logic typically produce. The techniques that such model finders employ typically increase the domain size gradually, trying to find interpretations. Thus a model finder presented with a ≠ b and c ≠ d would probably find an interpretation with a domain of size two that mapped one name from each inequality to one domain element, and the other names to the other domain element (e.g., x = a = d, and y = b = c). More importantly, the results of such model finders, I think, could be considered summary Aboxes, though they would not be canonical summary Aboxes. How many of the techniques described in this paper could be applied to non-canonical summary Aboxes? (I recognize that in Section 3 "Abox Filtering" the authors write, "we described filtering techniques on the original Abox first," but this is just "for the purpose of exposition.")
Theoharis2008graph Joshua Taylor 1 Theoharis2008graph presented by Tim Lebo 4 dec 2008 When the authors analyze class hierarchies, they observe, to some extent, power law behavior. They point out that if class hierarchies were complete and balanced trees, then the power law behavior would be expected, and, on a related note, their analysis shows how actual class hierarchies differ from complete and balanced trees. Now, it might not be feasible (or possible) to perform such an experiment, but what might happen if some instance data were available, and the same analysis were performed on the class hierarchy with the exclusion of classes that are not the most specific class of any (known) instance?
Theoharis2008graph Joshua Taylor 2 Theoharis2008graph presented by Tim Lebo 4 dec 2008 The analysis on the class hierarchy is, presumably, based on rdfs:subClassOf, as well as some OWL properties. (On page 1 they list owl:unionOf and owl:intersectionOf, but it's not clear whether these are the only two they consider.) One goal of the authors is to automatically generate realistic schemata. If they generate a class hierarchy using just RDFS vocabulary, they will necessarily be generating a consistent ontology. If they use only RDFS, owl:unionOf, and owl:intersectionOf, they will generate a consistent ontology (I think). However, if they start to analyze and include, e.g., owl:disjointWith, they can begin to generate inconsistent ontologies. 1) Are they already performing analysis with the features of OWL that could express inconsistencies? 2) How might they avoid generating inconsistent ontologies (aside from simply checking the consistency of an ontology once it's been generated)?
Williams Khandelwal Summaries Joshua Taylor 1 Gregory Todd Williams Graph Summaries
Summary Abox Ankesh
This question is for two presentations, and maybe it can be discussed once both presentations have been given. The two papers both look at ways of summarizing a graph structure in order to quicken certain types of queries. I wonder if the presenters, or anyone else, have any thoughts about whether the techniques described in each of these papers might be applied to the problems described in the other, or whether some hybrid approach might be useful for some tasks.
Zhao2004using presented by Tim Lebo 9 oct 2008 Joshua Taylor 1 Zhao2004using presented by Tim Lebo 9 oct 2008 The authors give some reasons for using LSIDs as opposed to URLs, but I'm not sure that I understand their reasoning. The "perceived benefits" include: "1a) … [The] workflow system needs to pass raw data between services without any metadata annotations; 1b) Data and metadata can reside in different locations; 1c) Metadata may be attached to resources for which we have no control." These are all achievable using HTTP and traditional URLs. "2) An explicit social convention commitment to maintaining immutable and permanent data." [emphasis added] If the convention is only social and not technical, why not create something like PURL for life science data, with the agreement that the resources at the other end of the permanent URLs will not change? Of course, they do make the point that there is significant LSID support (e.g., from IBM), so there must be something to it. What's so useful about LSIDs?