Jim Hendler and Helen Margetts
I (Joshua Taylor) am now back in Troy after spending the weekend (Friday through Sunday) in Arlington at the AAAI Fall Symposium on Automated Scientific Discovery. Presented papers, keynote address, and slides will become available on the supplementary symposium page over the next week or so.
After opening remarks by symposium chairs Selmer Bringsjord and Andrew Shilliday, Doug Lenat gave an opening keynote address entitled Looking Both Ways. Doug has a long history in automated discovery, from his 1976 PhD thesis on the Automated Mathematician (AM) and the later Eurisko, to present-day work within Cycorp. Doug explained a great deal about the techniques behind AM and Eurisko, and also talked about some of the criticisms that these systems received. He then spoke about the development of Cyc. We learned just how much Cyc has evolved, moving from frame-based systems and description logics to a much more expressive formalism, leaving behind theoretically desirable global consistency for more pragmatic and cognitively plausible locally consistent microtheories, and how Cyc now has enough (manually encoded) knowledge that high-level machine learning and automated knowledge acquisition are possible. He stressed that one of the factors making such knowledge acquisition and learning possible is the widespread adoption of the World Wide Web, and I noted that some of his high-level diagrams included SQL and SPARQL. (While I knew about the OpenCyc project, I only just now became aware that a great deal of OpenCyc is Semantic Web friendly.)
Andrew Shilliday followed Doug’s keynote with a history of automated scientific discovery, particularly as it relates to his own research and upcoming thesis. He also described the Elisa system for assisting users in discovery in scientific and mathematical domains.
After lunch, Alexandre Linhares discussed Douglas Hofstadter’s notion of Fluid Concepts and an implementation thereof.
Siemion Fajtlowicz spoke about the development of Graffiti, a well-known system for conjecture generation within the domain of graph theory. Siemion also discussed more recent work connecting graph theory conjectures and molecular structure conjectures.
Jean-Gabriel Ganascia presented A Reconstruction of Some of Claude Bernard’s Scientific Steps, which also documents the development of Cybernard. In a manner that I am particularly fond of, Jean-Gabriel, in order to automate scientific discovery, takes as a starting point Claude Bernard, a human who both made many scientific discoveries and also documented just what he did. One of the interesting aspects of this work is that it involves developing ontologies of the scientific process of discovery, as well as of the scientific concepts with which Bernard worked. The importance of ontology evolution was also stressed, for as scientific knowledge increases, scientific conceptualization must also change.
After Jean-Gabriel, Susan Epstein discussed Knowledge Representation in Automated Scientific Discovery. She discussed how concepts in automated scientific discovery are often expressed as sets, and that as a result, the conjectures that are generated are usually those that can be expressed in set-theoretic terms. For instance (and I’m choosing an example that I remember from Doug Lenat’s talk), the conjecture that perfect squares have at least three divisors (every perfect square n has as divisors at least 1, n, and the square root of n) could be made based upon the observation that the set of perfect squares is a subset of the set of numbers with at least three divisors. She proposed a representation, different from sets, that uses testers and generators, which are, respectively, predicates for determining whether a candidate is an example of a concept, and functions that produce examples of a concept.
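As a minimal sketch (my own toy illustration, not the actual machinery of AM, HR, or Epstein's system), the subset observation and the tester/generator representation might look like this:

```python
# Toy illustration: a set-based (extensional) concept representation makes
# subset observations, and hence set-theoretic conjectures, easy to spot;
# testers and generators are an alternative, intensional representation.
import math

def divisors(n):
    return {d for d in range(1, n + 1) if n % d == 0}

# Extensional view of two concepts over a finite sample.
# (We start at 2: the number 1 is a perfect square with only one divisor.)
sample = range(2, 200)
perfect_squares = {n for n in sample if math.isqrt(n) ** 2 == n}
three_plus_divisors = {n for n in sample if len(divisors(n)) >= 3}

# The subset observation suggests the conjecture from the talk:
# every perfect square n > 1 has at least the divisors 1, sqrt(n), and n.
print(perfect_squares <= three_plus_divisors)  # True

# Tester/generator view of the perfect-square concept:
def is_perfect_square(n):        # tester: a predicate recognizing examples
    return math.isqrt(n) ** 2 == n

def perfect_square_examples():   # generator: a function producing examples
    k = 2
    while True:
        yield k * k
        k += 1
```

The two views are complementary: the extensional view supports cheap empirical conjecture generation over a sample, while testers and generators describe the concept itself and are not tied to any fixed sample.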
Epstein’s talk concluded with an example of student discovery and conceptual refinement for the game Pong Hau K’i (or Umulkono, 우물고노, in Korean). As AI researchers, we found it quite interesting to see the progression of formalizations that students went through in diagramming the state space of the game. (A colleague and I played the game on paper during the plenary session. I lost.)
Alan Bundy started the day with a keynote called Why Ontology Evolution is Essential in Modeling Scientific Discovery. The title is a good overview, and some of the comments made about Jean-Gabriel’s work apply here too. The importance of providing for ontologies that can change and evolve along with scientific conceptualization must not be underestimated. Examples were drawn from physics and astronomy, particularly the discovery that heat and temperature are not equivalent, and the precession of the perihelion of Mercury. Michael Chan’s later talk would also touch upon their work on ontology evolution and repair systems.
David Jensen spoke about Automatic Identification of Quasi-Experimental Designs for Discovering Causal Knowledge. I think that this work is important, particularly for science performed using Semantic Web technologies, where, ideally, data collection could be automated rather than planned. From his abstract,
[Quasi-Experimental Designs] are a family of methods for exploiting fortuitous situations in observational data that emulate control and randomization. For instance, although a dataset as a whole may have significant bias or sampling issues, subsets of the data may support a different, and better, experimental design. Jensen discussed an example in which two groups of researchers reached different conclusions about links between early sexual activity and juvenile delinquency. The first group of researchers used the dataset as a whole, while the second examined twins in the dataset, a subset of the observational data, but one in which it was practically guaranteed that most variables would be identical (e.g., twins in the same household, with the same type of family life, &c.).
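The twin-subset idea can be sketched in a few lines. This is a hedged toy with synthetic records and hypothetical field names, not Jensen's actual analysis: within a twin pair, household and genetic variables are held constant for free, so comparing pairs that are discordant on the exposure emulates experimental control:

```python
# Toy sketch of a "twin study" quasi-experimental design over
# observational data. Records and field names are made up.
from collections import defaultdict

records = [
    # (family_id, exposed, outcome) -- one record per twin
    (1, True,  1), (1, False, 0),
    (2, True,  0), (2, False, 0),
    (3, True,  1), (3, False, 1),
    (4, True,  1), (4, False, 0),
]

families = defaultdict(list)
for fam, exposed, outcome in records:
    families[fam].append((exposed, outcome))

# Keep only twin pairs discordant on the exposure; within each pair,
# compare the exposed twin's outcome to the unexposed twin's.
diffs = []
for pair in families.values():
    if len(pair) == 2 and pair[0][0] != pair[1][0]:
        exposed_outcome = next(o for e, o in pair if e)
        control_outcome = next(o for e, o in pair if not e)
        diffs.append(exposed_outcome - control_outcome)

effect = sum(diffs) / len(diffs)  # average within-pair difference
print(effect)  # 0.5 on this synthetic data
```

The whole-dataset estimate and the within-pair estimate can disagree precisely because the pairing removes confounders that bias the pooled comparison, which is the point of the twin example above.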
The posted schedule had Konstantine Arkoudas speaking next, but he and I switched places, so I gave the next presentation, Discovery Using Heterogeneous Combined Logics. The title was a bit misleading (though it matched the extended abstract that I submitted), as I actually spoke about how specialized reasoners might be invoked on decidable subproblems of an overall goal. In particular, I showed the Dreadsbury Mansion Mystery and the typical first-order logic formalization that students at RPI will generate for it. Of course, with an arbitrary FOL formalization, there aren’t many guarantees about what an automated reasoning system can do with it. I then showed that there’s a natural translation into the description logic ALBO, which has a decision procedure. I have not yet done any work in automating this process, but neither does it seem impossible to automatically recognize such a reduction. This is all preliminary work, so I was grateful for references from Alan Bundy and Simon Colton to related work.
After lunch, Michael Chan presented more work on ontology repair. It seems that their research group has a framework in which ontology repair plans are components. Chan discussed one called Inconstancy, in addition to the Where’s My Stuff? plan that Alan Bundy had mentioned earlier.
Selmer Bringsjord showed his continuing work on the automated discovery of Gödel’s first incompleteness result.
Day three began with a keynote from Simon Colton called Joined-Up Reasoning for Automated Scientific Discovery: A Position Statement and Research Agenda. Simon discussed his discovery system HR. Doug Lenat had to leave the symposium early, which is unfortunate, as Simon’s presentation made some comparisons between AM and HR. Simon also made some connections between scientific discovery and artistic creativity. By this time, the symposium was drawing to a close, and so there was not as much time as we would have liked, but the presentation slides, and perhaps an audio recording, should be available on the supplementary symposium site relatively soon.
The symposium ended with a practical discussion of funding, publications, and possible future work and collaborations.
From Talis: “Jim Hendler at the INSEMTIVE 2008 Workshop”
“that people will (and do) create metadata when there are obvious and immediate benefits in them doing so. No-one really consciously sits down to share or create metadata: they sit down to do a specific task and metadata drops out as a side-effect.”
I could not agree more. Once upon a time I tried to tag all my blog posts; after a few weeks, I found myself bored, because there were no clear, immediate benefits in doing so. I would only tag things that I had to, such as to give my friends a list of posts on the same topic.
The only tagging system that has been consistently successful for me is Gmail labeling: I organize mails related to the same task (like writing a paper) on a daily basis, because it is very useful, and immediately useful. Even so, I only label a tiny fraction of all my emails.
I have seen too many people whose desktops are full of files they are too lazy to organize – and I am one of them. Every year I have to set aside a day or two to reorganize my hard disk and dig out the hidden treasures of my “Downloads” folder. I believe that for the Semantic Web to be successful, creating an ontology should be at least as easy, and as useful, as organizing files on a hard disk.
In fact, people are creating metadata, or even ontologies, every day: every email sorted, every contact added to the cell phone, every folder created, every calendar item, every wiki post, … We just need to make them explicit, and most of all, without bothering the user to click even one more button.
Clark and Parsia’s suggestion to periodically write up ideas and share them seems like a good one, so I thought I’d try as well. These days I seem to be involved in the creation of new academic fields – such as Web Science (cf. http://webscience.org and the Web Science conference) – and this one sort of popped into my head in response to a call for “innovative engineering” areas for NSF to consider. Have at it…
Information Systems Engineering
Subdisciplines of engineering have developed as new materials or techniques have been discovered that had potential revolutionary impact on society. For example, in the late 1800s, the transition from electricity as a curiosity to a commodity was made possible by the emergence of electrical engineering; radiation went from a parlor trick to a world-changing force thanks to nuclear engineering; and the advent of computing machines in the mid 1900s led to the need for the modern computer engineering department.
In the late 1900s and early 2000s, the analogous revolutionary medium is the information that powers modern computer applications, engineering simulations, sensor networks, and the World Wide Web. In the 1980s, there was a move to create a field of “information engineering” [1], but it primarily led to the development of business processes around databases, as at the time information was seen as locked into a single application or process. With the growth of the Web, however, there has been an increasing awareness that the information that arises from the interaction of the more than 1 billion Web users needs to be understood at a new level [2,3].
As with electricity or radiation, this new substance has mostly been studied “in the small,” and there is a clear need for a much more significant understanding if we are to determine the principles of its use. For example, as science becomes increasingly internationalized and large scale (cf. the Large Hadron Collider), we must begin to understand how the data produced can be processed into usable information, and how that information can be stored, scaled, combined, and exchanged between systems. Just as we needed to move from ad hoc voltaic cells to the electrical engineering principles that drive our modern world, we must learn how to engineer tomorrow’s large-scale, collaborative information systems.
The principles of information systems engineering will be those that will drive key parts of our future society. Intelligent vehicle systems will be crucially dependent on the ability of individual vehicles to communicate with each other and with central controllers; sensor networks will become far more useful as we learn to make them dynamically reconfigurable to the needs of applications; educational systems must be able to derive their power from cutting-edge, real-world systems, like the LHC, rather than from toy simulations that miss the key properties. As the Web grows, the information needs of society will increase exponentially, and techniques that make the modern search engine look like a toy will be needed if it is to be of use. Mobile computing, large scale information design, cloud-based software, and many other applications also will need a more principled understanding of information and its flows – the challenges are incredible, and the potential amazing.
Currently, information has been primarily viewed as the province of database systems, and while these are an important part of the story, they are by no means sufficient. The III program at NSF (CISE/IIS) is exploring some parts of information use, but primarily from an algorithmic, rather than engineering, perspective. Much as computer design moved from computer science to computer engineering, and much as current ECSE departments study vision, robotics, and other topics jointly under sponsorship between parts of ENG and parts of CISE, so too must a modern engineering discipline be developed for information flows and large-scale information-based systems.
ENG and CISE, working together, have the potential to create the joint engineering teams needed to develop a principled approach to the design of large-scale information systems. This is an important area, as a firm and fundamental engineering approach is needed for the development of the large-scale applications that have become so critical to our modern world.
[1] Finkelstein, Clive. An Introduction to Information Engineering: From Strategic Planning to Information Systems. Sydney: Addison-Wesley, 1989.
[2] Berners-Lee, T., Hall, W., Hendler, J., Shadbolt, N., and Weitzner, D. Creating a Science of the Web. Science, 311, 2006.
[3] Hendler, J., Hall, W., Shadbolt, N., Berners-Lee, T., and Weitzner, D. Web Science: An Interdisciplinary Approach to Understanding the World Wide Web. Communications of the ACM, July 2008.
For more information, please see http://www.websci09.org
WebSci’09: Society On-Line
18th–20th March, 2009
Web Science focuses on understanding, designing, and developing the technologies and applications that make up the World Wide Web. But the WWW does not exist without the participation of people and organizations. Now that a significant proportion of everyday life is spent on-line in many countries, it makes sense for the first Web Science conference, organised by the Web Science Research Initiative (WSRI) and the Foundation of the Hellenic World (FHW), to be dedicated to the presentation of research into society on the Web. How do people and organisations behave on-line – what motivates them to shop, date, make friends, participate in political life, or manage their health or tax on-line? Which Web-based designs will they trust? To which on-line agents will they delegate? How can the dark side of the Web – such as cybercrime, pornography, and terrorist networks – be both understood and held in check without compromising the experience of others? What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, and the linking of data – on on-line behaviour, both criminal and non-criminal? And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depend remains ‘stable and pro-human’?
Such a challenge requires understanding of both human behaviour and technological design. So the science – including the social science – of the Web is a field that requires the attention of both computer scientists and social scientists. The aim of this conference is to bring these two groups together across the disciplinary divide for perhaps the first time, exploring the development of the Web across different areas of everyday life and technological development. We welcome papers from a wide range of disciplinary perspectives, including computer science, physics, economics, political science, sociology, geography, management, and health. Papers which incorporate more than one discipline will be particularly welcomed. We have identified the following areas of on-line society and Web development for particular attention:
- Government and Political Life
- Social Relationships
- Cybercrime and/or the Prevention Thereof
- Culture On-Line
In addition, we are interested in papers that concern the cross-cutting infrastructure issues on which these areas depend including, but not limited to:
- Linked Data and the Semantic Web
- Trust and Reputation
- Security and Privacy
- Networking (Social and Technical)
Submissions should take the form of extended abstracts, no more than two pages in length, which will be peer-reviewed by the Conference Committee of Computer Scientists and Social Scientists. Please suggest which of the ‘society’ or ‘technology’ themes apply best to your work. The submission deadline is 31st October 2008 (details to follow). Successful applicants will be asked to produce a short paper of 5,000 words to be presented at the conference in a plenary session, panel or poster. These papers will automatically be considered for publication as full papers by a number of journals whose editors have agreed to participate.