Changing Scientific Communication: On Stories, that Persuade with Data - Anita de Waard, PhD

About the Talk

Tuesday, April 5th, 2011, 4-5 PM
Winslow Building (105 8th Street), Room 1140

To improve the identification of knowledge components within scholarly discourse, we can consider scientific papers to be, essentially, stories, that persuade with data [1]. In my talk, I’ll discuss some recent work in computer science and linguistics that addresses these three components:

  • identifying narrative structure using story grammar (and fairy-tale) models, through initiatives such as AMICUS [2] and the W3C [3];
  • exploring the linguistic embodiment of scientific persuasion, by looking at analogies with sentiment detection; and
  • improving the relationship between discourse and research data through online workflow systems, such as [4] or [5]; see also [6].

In allowing access to smaller-grained knowledge entities, new forms of science publishing become possible, such as hypothesis-evidence networks, which allow readers to trace the heritage of a scientific claim [7]. The technology for these new models is increasingly available: at Elsevier, we are developing a Linked Data format for all full-text content, and semantic standards are enabling ever more intimate modes of content integration. What, if anything, is stopping us from creating a wholesale revolution in science publishing?


[1] de Waard, A. “From Proteins to Fairytales: Directions in Semantic Publishing,” IEEE Intelligent Systems 25(2): 83-88 (2010).





[6] de Waard, A. “The Future of the Journal? Integrating Research Data with Scientific Discourse,” LOGOS: The Journal of the World Book Community 21(1-2): 7-11 (2010).

[7] de Waard, A., Buckingham Shum, S., Carusi, A., Park, J., Samwald, M., and Sándor, Á. (2009). “Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims,” Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009), co-located with the 8th Int’l Semantic Web Conference (ISWC-2009).


About Anita de Waard

Disruptive Technologies Director, Elsevier Labs, Burlington, VT


Anita de Waard is Disruptive Technologies Director at Elsevier Labs. She has a degree in experimental low-temperature physics from Leiden University, and worked at the Kapitza Institute in Moscow before joining Elsevier as a physics publisher in 1988. Since 1997 she has worked on bridging the gap between science publishing and computational and information technologies, collaborating with academic groups in Europe and the US. Her past work includes the application of Semantic Web technologies to scientific communication in the DOPE project, and the development of an Entity Identification database in the EU-funded OKKAM project. She developed and led the Elsevier Grand Challenge for Life Sciences and the ISMB 2010 Killer App Award, both of which rewarded researchers for ideas pertaining to novel forms of science publishing. In 2006 and 2009, she was awarded the Reed-Elsevier award for Excellence in Innovation. Other projects include co-leading the W3C HCLS group on Scientific Discourse Structure, and co-organising a series of workshops aimed at articulating the key possibilities for, and main impediments to, changing scientific communication, including ‘Beyond the PDF’ in January 2011. Since January 2006, de Waard has been working part-time as a researcher at the University of Utrecht, funded by a Casimir project grant from the Netherlands Organisation for Scientific Research. Her research focuses on discourse analysis of biological text, with an emphasis on finding key rhetorical components, offering possible applications in hypothesis detection and automated copy-editing tools.

TitanPad Notes

Anita's Talk

Research on the discourse structure of scientific text (journal articles) - discourse analysis of argumentation - jumps out of the text (referring to figures). Bio knowledge occurs on two planes, data and discourse. In the sharing of the data, most of the data gets destroyed (e.g. PDF).

Clause - text with verb in it

Interesting, I was more interested in the connection between the document and the data, not so much the content of the document to the data. So linking a part of the document, where a claim is being made, to the data that backs up that claim, or disputes the claim. -- so does this make it more interesting or less interesting to you? Not clear from your statement. It was already interesting. This just makes it more interesting. How do you match a part of a document in an abstract way?

Stemming from the "Beyond the PDF" workshop: SMA - Maryann Martone from the Neuroinformation Network is now in NY working on the SMA use case. I'm thinking about bringing in the pathway modeling. Rendered in CORAAL.

Harvard's Annotation Ontology and Annotation Framework (a browser in a browser) (Tim Clark's group) - related to the work that James Michaelis is working on. It would be interesting to look into that. An annotation can be rejected, discussed, accepted. Information about the annotation, etc...

Phil Bourne - can't keep track of data.

Data driven papers....

This is the idea of having an organization-wide mechanism for creating, storing, keeping track of, discovering, etc... any data that is generated by the organization. Instead of storing data on a personal computer, on a lab server, etc., there would be a mechanism for storing and referencing this data campus-wide. eLibrary.

Executable Paper Grand Challenge, ICC, June 1,1
Security nightmare

Beyond the PDF, Jan 2011 - connecting developers, librarians, publishers, scholars
All the presentations are available online

August 15-18, 2011 - Germany (more English)
Future of Scientific Communication

Where's the terminology going? Bio example - really concise stories, refer to figures 3 pages back. Where are publishers in their thinking about levels of granularity?

Evidence is different in different disciplines.

In the UK you are required to publish the negative results of clinical trials. Not so in the US.

Journal for the road not taken, and why

Microlinguistic protocol ("suggest" is not good enough for Nature, for example; need to use "show").

ASW Week 10


  • Peter (from Jersey)

Jin - converting lots of regulation data sets: USGS data about water, EPA data about facilities.
Ping - enhancements to the converter
Converting on the fly took time; now using a triple store, so it is quicker

Will meet with Ping tomorrow, and discuss implementation and usage of provenance data.

Eric - will use widgets for faceted browsing - has structure for faceted search capabilities

Peter, Scott & Ping -

  • Peter out sick, but goals: put together drug data for accurate visualizations. Utilize drug databases to correlate/group similar drugs or drugs that treat the same illness together, and plot changes in dosages over time.