Archive

Archive for November, 2008

Rankings, Google and Semantic Web

November 12th, 2008

Over the last centuries, humankind has experienced an exponential increase in the information available. This is even more perceptible on the Web, which puts information within reach of just a few clicks: far more than we as humans can process and assimilate.

Thus we need to discriminate among a gargantuan amount of available information to find what we are looking for (or the closest thing to it). The traditional Information Retrieval approach is based on searching for keywords. However, it is difficult to tell which of many (potentially billions of) pages has the most useful information for what we are looking for. The general idea of discriminating is based on the concept of ranking(*): an order (whether partial or total) of some entities based on a set of criteria.

This is a good way of handling information because we don’t have the resources (time, memory, etc.) to navigate through all the available data. And that is exactly what Google does: we ask “I need to find some pages that contain keywords X, Y and Z” and Google answers “Look, according to my algorithm and the data I have, here is a list ordered from what I think is the most relevant page for your query”.
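To make the idea of a ranking concrete, here is a minimal sketch in Python. It is not Google's algorithm: the criterion (counting how often the query keywords occur in a page) and the names `rank_pages` and `pages` are assumptions chosen only for illustration. It shows a total order over the matching pages, from most to least relevant under that toy criterion.

```python
from typing import Dict, List, Tuple


def rank_pages(pages: Dict[str, str], keywords: List[str]) -> List[Tuple[str, int]]:
    """Return (page_id, score) pairs ordered from most to least relevant.

    The score is the number of keyword occurrences in the page text,
    a hypothetical stand-in for a real relevance criterion.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for page_id, text in pages.items():
        words = text.lower().split()
        score = sum(1 for w in words if w in wanted)
        if score > 0:  # pages matching no keyword are left out of the ranking
            scored.append((page_id, score))
    # Sort by descending score; ties are broken by page id so the order is total.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Made-up pages, for illustration only.
pages = {
    "a": "semantic web ranking of ontologies",
    "b": "ranking pages on the web",
    "c": "cooking recipes",
}
ranking = rank_pages(pages, ["ranking", "web", "semantic"])
```

Dropping the tie-breaker on page id would leave pages with equal scores unordered relative to each other, which gives a partial rather than a total order.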

The Semantic Web brings similar challenges, but in this case we are not talking about pages and links, but about any entity (people, cars, webpages, ontologies) related by different predicates (people have friends, cars have parts, webpages have authors, ontologies describe other entities, and so on). Thus the problem is far more complex, since there is more information available.

Also, there are other questions we can ask: What ontology should I choose for a certain task, given dozens of possible candidates? When using that ontology, if I have a SPARQL query that returns 1e6 results, are they all equally interesting to me? If not, which ones should be shown first?

The idea of opening your data, sharing it, and mashing it up makes everything more complex: it is not enough to have millions of answers; as a user I want the ones best suited to me (whatever that means).

Alvaro Graves

(*) Linguistic thought: It is interesting to me, as a native Spanish speaker, that there is no translation for “ranking”: Does it mean that the concept didn’t exist in the Spanish-speaking world?

Author: Categories: Semantic Web, Web Science Tags:

AAAI Fall Symposium on Automated Scientific Discovery

November 11th, 2008

Day One

I (Joshua Taylor) am now back in Troy after spending the weekend (Friday through Sunday) in Arlington at the AAAI Fall Symposium on Automated Scientific Discovery. Presented papers, keynote address, and slides will become available on the supplementary symposium page over the next week or so.

After opening remarks by symposium chairs Selmer Bringsjord and Andrew Shilliday, Doug Lenat gave an opening keynote address entitled Looking Both Ways. Doug has a long history in automated discovery, from his 1976 PhD thesis on Automated Mathematician (AM) and later Eurisko, to present-day work within Cycorp. Doug explained a great deal about the techniques behind AM and Eurisko, and also talked about some of the criticisms that these systems received. He then spoke about the development of Cyc. We learned just how much Cyc has evolved, moving from frame-based systems and description logics to much more expressive formalisms, leaving behind a theoretically desirable global consistency for more pragmatic and cognitively plausible locally consistent microtheories, and how Cyc now has enough (manually-encoded) knowledge that high-level machine learning and automated knowledge acquisition are possible. He stressed that one of the factors making such knowledge acquisition and learning possible is the widespread adoption of the World Wide Web, and I noted that some of his high-level diagrams included SQL and SPARQL. (While I knew about the OpenCyc project, I only just now became aware that a great deal of OpenCyc is Semantic Web friendly.)

Andrew Shilliday followed Doug’s keynote with a history of automated scientific discovery, particularly as it relates to his own research and upcoming thesis. He also described the Elisa system for assisting users in discovery in scientific and mathematical domains.

After lunch, Alexandre Linhares discussed Douglas Hofstadter’s notion of Fluid Concepts and an implementation thereof.

Siemion Fajtlowicz spoke about development of Graffiti, a well-known system for conjecture generation within the domain of graph theory. Siemion also discussed more recent work connecting graph theory conjectures and molecular structure conjectures.

Jean-Gabriel Ganascia presented A Reconstruction of Some of Claude Bernard’s Scientific Steps, which also documents the development of Cybernard. In a manner that I am particularly fond of, Jean-Gabriel, in order to automate scientific discovery, takes as a starting point Claude Bernard, a human who both made many scientific discoveries and documented just what he did. One of the interesting aspects of this work is that it involves developing an ontology of the scientific process of discovery, as well as of the scientific concepts with which Bernard worked. The importance of ontology evolution was also stressed, for as scientific knowledge increases, scientific conceptualization must also change.

After Jean-Gabriel, Susan Epstein discussed Knowledge Representation in Automated Scientific Discovery. She discussed how concepts in automated scientific discovery are often expressed as sets, and that as a result, the conjectures that are generated are usually those that can be expressed in set-theoretic terms. For instance (and I’m choosing an example that I remember from Doug Lenat’s talk), the conjecture that perfect squares greater than 1 have at least three divisors (every such square n has as divisors at least 1, n, and the root of n) could be made based upon the observation that the set of those perfect squares is a subset of the set of numbers with at least three divisors. She proposed a representation, different from sets, that uses testers and generators, which are, respectively, predicates for determining whether a candidate is an example of a concept and functions that produce examples of a concept.
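A rough sketch of what testers and generators could look like in Python follows. The function names are my own, not Epstein's. A tester is a predicate and a generator is a function that yields examples; the divisor conjecture is then checked against generated examples instead of being read off a subset relation. The generator starts at 4 because 1 is a perfect square with only a single divisor, so the conjecture only makes sense for squares greater than 1.

```python
from itertools import count, islice
from math import isqrt
from typing import Iterator


def is_perfect_square(n: int) -> bool:
    """Tester: is n an example of the concept 'perfect square'?"""
    return n >= 1 and isqrt(n) ** 2 == n


def perfect_squares() -> Iterator[int]:
    """Generator: produce perfect squares greater than 1 (4, 9, 16, ...)."""
    return (k * k for k in count(2))


def has_at_least_three_divisors(n: int) -> bool:
    """Tester for the concept 'number with at least three divisors'."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return len(divisors) >= 3


# Take the first 20 generated examples and test the conjecture on each one.
# This gathers supporting evidence; it does not prove the conjecture.
examples = list(islice(perfect_squares(), 20))
conjecture_holds = all(has_at_least_three_divisors(n) for n in examples)
```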

Epstein’s talk concluded with an example of student discovery and conceptual refinement for the game Pong Hau K’i (or Umulkono, 우물고노, in Korea). As AI researchers, seeing the progression of formalizations that students went through in diagramming the state space of a game was quite interesting. (A colleague and I played the game on paper during the plenary session. I lost.)

Day Two

Alan Bundy started the day with a keynote called Why Ontology Evolution is Essential in Modeling Scientific Discovery. The title is a good overview, and some of the comments made about Jean-Gabriel’s work apply here too. The importance of providing for ontologies that can change and evolve along with scientific conceptualization must not be underestimated. Examples were drawn from physics and astronomy, particularly the discovery that heat and temperature are not equivalent, and the precession of the perihelion of Mercury. Michael Chan’s later talk would also touch upon their work on ontology evolution and repair systems.

David Jensen spoke about Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge. I think that this work is important, particularly for science performed using Semantic Web technologies, where, ideally, data collection could be automated rather than planned. From his abstract, “[Quasi-Experimental Designs] are a family of methods for exploiting fortuitous situations in observational data that emulate control and randomization.” For instance, although a dataset as a whole may have significant bias or sampling issues, subsets of the data may actually suggest another design, one with better experimental method. Jensen discussed an example in which two groups of researchers reached different conclusions about links between early sexual activity and juvenile delinquency. The first group of researchers used the dataset as a whole, while the second examined twins in the dataset, a subset of the observational data, but one in which it was practically guaranteed that most variables would be identical (e.g., twins in the same household, same type of family life, &c.)
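The contrast between the two analyses can be sketched in a few lines of Python. This is only my own illustration of the twin-subset idea. It is not Jensen's system, and the records below are invented toy values, not data from either study. The `Record` fields and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import List, NamedTuple


class Record(NamedTuple):
    family_id: str   # twins share a family_id (hypothetical field)
    exposed: bool    # whether the individual has the factor under study
    outcome: float   # some measured outcome


def naive_difference(records: List[Record]) -> float:
    """Whole-dataset comparison: mean outcome of exposed minus unexposed."""
    exposed = [r.outcome for r in records if r.exposed]
    unexposed = [r.outcome for r in records if not r.exposed]
    return mean(exposed) - mean(unexposed)


def within_pair_difference(records: List[Record]) -> float:
    """Twin comparison: mean outcome difference inside discordant pairs.

    Only pairs with exactly one exposed and one unexposed twin are used,
    so shared family background is held fixed within each comparison.
    Raises statistics.StatisticsError if there is no such pair.
    """
    by_family = defaultdict(list)
    for r in records:
        by_family[r.family_id].append(r)
    diffs = []
    for members in by_family.values():
        exposed = [m.outcome for m in members if m.exposed]
        unexposed = [m.outcome for m in members if not m.exposed]
        if len(members) == 2 and len(exposed) == 1 and len(unexposed) == 1:
            diffs.append(exposed[0] - unexposed[0])
    return mean(diffs)


# Invented toy data in which family background alone drives the outcome.
records = [
    Record("f1", True, 5.0), Record("f1", False, 5.0),
    Record("f2", True, 4.0), Record("f2", False, 4.0),
    Record("f3", False, 1.0), Record("f3", False, 1.0),
    Record("f4", True, 6.0), Record("f4", True, 6.0),
]
naive = naive_difference(records)
paired = within_pair_difference(records)
```

On this toy data the whole-dataset comparison shows a gap between exposed and unexposed individuals, while the within-pair comparison shows none. That is the kind of disagreement the two research groups could arrive at.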

The posted schedule had Konstantine Arkoudas speaking next, but he and I switched places, so I gave the next presentation, Discovery Using Heterogeneous Combined Logics. The title was a bit misleading (though it matched the extended abstract that I submitted), as I actually spoke about how specialized reasoners might be invoked on decidable subproblems of an overall goal. In particular, I showed the Dreadsbury Mansion Mystery and the typical first-order logic formalization that students at RPI will generate for it. Of course, with an arbitrary FOL formalization, there aren’t many guarantees about what an automated reasoning system can do with it. I then showed that there’s a natural translation into the description logic ALBO, which has a decision procedure. I have not yet done any work on automating this process, but it does not seem impossible to recognize such a reduction automatically. This is all preliminary work, so I was grateful for references from Alan Bundy and Simon Colton to related work.

After lunch, Michael Chan presented more work on ontology repair. It seems that their research group has a framework in which ontology repair plans are components. Chan discussed one called Inconstancy, in addition to Where’s My Stuff? that Alan Bundy had mentioned earlier.

Selmer Bringsjord showed his continuing work on an automated discovery of Gödel’s first incompleteness result.

Day Three

Day three began with a keynote from Simon Colton called Joined-Up Reasoning for Automated Scientific Discovery: A Position Statement and Research Agenda. Simon discussed how HR. Doug Lenat had to leave the symposium early, which is unfortunate, as Simon’s presentation made some comparisons between AM and HR. Simon also made some connections between scientific discovery and artistic creativity. By this time, the symposium was drawing to a close, and so there was not as much time as we would have liked, but the presentation slides, and perhaps an audio recording, should be available on the supplementary symposium site relatively soon.

The symposium ended with a practical discussion of funding, publications, and possible future work and collaborations.

//JT

Author: Categories: AI, Uncategorized Tags: