A Modular Framework for Transforming Structured Data into HTML with Machine Readable Annotations

Presented at the AGU Fall Meeting 2010

Abstract:

Many web-based Content Management Systems (CMS) are available for maintaining projects and data, among other uses. However, these systems vary in their capabilities, and content is often stored separately and accessed via non-uniform web interfaces. Moving from one CMS to another (e.g., MediaWiki to Drupal) can be cumbersome, especially if a large quantity of data must be adapted to the new system. To standardize the creation, display, management, and sharing of project information, we have assembled a framework that uses existing web technologies to transform data provided by any service that supports SPARQL Protocol and RDF Query Language (SPARQL) queries into HTML fragments that can be embedded in any existing website. The framework uses a two-tier Extensible Stylesheet Language Transformations (XSLT) process that relies on existing ontologies (e.g., Friend-of-a-Friend, Dublin Core) to interpret query results and render them as HTML documents. These ontologies can be used in conjunction with custom ontologies suited to individual needs (e.g., domain-specific ontologies for describing data records). Furthermore, this transformation process encodes machine-readable annotations, namely the Resource Description Framework in Attributes (RDFa), into the resulting HTML, so that capable parsers and search engines can extract the relationships between entities (e.g., people, organizations, datasets). To facilitate editing of content, the framework provides a web-based form system that maps each query to a dynamically generated form, which can be used to create and modify entities while keeping the native data store up to date. This open framework makes it easy to duplicate data across many different sites, allowing researchers to distribute their data in many different online forums.
In this presentation we will outline the structure of the queries and the stylesheets used to transform them, followed by a brief walkthrough that traces the data from storage to a human- and machine-accessible web page. We conclude with a discussion of content caching and steps toward performing queries across multiple domains.
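
The query-to-HTML step described in the abstract can be illustrated with a small sketch. This is not the framework's own code: the framework uses XSLT stylesheets, whereas the example below substitutes plain Python from the standard library to show the same idea. The query text, the sample result set, the example.org URIs, and the function name render_people are all hypothetical and chosen only for illustration. The sketch parses a result set in the SPARQL Query Results XML Format and emits an HTML fragment whose RDFa attributes (about, typeof, property) use FOAF terms.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical query that an endpoint might answer (shown for context, not executed here).
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name . }
"""

# Hypothetical result set, as an endpoint could return it for QUERY.
SAMPLE_RESULTS = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="person"/><variable name="name"/></head>
  <results>
    <result>
      <binding name="person"><uri>http://example.org/people/alice</uri></binding>
      <binding name="name"><literal>Alice Example</literal></binding>
    </result>
    <result>
      <binding name="person"><uri>http://example.org/people/bob</uri></binding>
      <binding name="name"><literal>Bob Example</literal></binding>
    </result>
  </results>
</sparql>
"""


def render_people(results_xml: str) -> str:
    """Turn SPARQL XML results into an HTML list annotated with RDFa."""
    root = ET.fromstring(results_xml)
    items = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        # Map each variable name to the text of its <uri> or <literal> child.
        bindings = {
            b.get("name"): (b[0].text or "")
            for b in result.findall("sr:binding", SPARQL_NS)
        }
        uri = escape(bindings["person"], quote=True)
        name = escape(bindings["name"])
        items.append(
            f'  <li about="{uri}" typeof="foaf:Person">'
            f'<span property="foaf:name">{name}</span></li>'
        )
    # Declare the foaf prefix once on the enclosing element.
    return f'<ul prefix="foaf: {FOAF}">\n' + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_people(SAMPLE_RESULTS))
```

Because the output is a self-contained fragment, it could be embedded in a page served by any CMS, while an RDFa-aware parser could still recover which resources are people and what their names are.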

History

Date: December 11, 2011, 22:21:59
Created By: Patrick West
Link: Download

Related Projects:

TW Website Project
Description: A semantically powered Tetherless World website running in the Drupal CMS. It combines many standard web technologies, including RDF, SPARQL, XSLT, and XHTML.

Related Research Areas:

Future Web
Lead Professor: Jim Hendler
Description: Since its inception, the World Wide Web has changed the ways people work, play, communicate, collaborate, and educate. There is, however, a growing realization among researchers across a number of disciplines that without new research aimed at understanding the current, evolving, and potential Web, we may be missing or delaying opportunities for new and revolutionary capabilities. To model the Web, it is necessary to understand the architectural principles that have provided for its growth. Looking into the future, to be sure that the Web supports the basic social values of trustworthiness, personal control over information, and respect for social boundaries, a research agenda must be pursued that targets the Web and its use as a primary focus of attention. This research requires powerful scientific and mathematical techniques from many disciplines to explore the modeling of the Web from network- and information-centric views.
Concepts: Semantic Web

Web Science
Lead Professor: Jim Hendler, Deborah L. McGuinness
Description: Web Science is the study of the World Wide Web and its impact on both society and technology, positioning the Web as an object of scientific study unto itself. Web Science recognizes the Web as a transformational, disruptive technology; its practitioners study the Web, its components, facets and characteristics. Ultimately, Web Science is about understanding the Web and anticipating how it might evolve in the future.
Concepts: Semantic Web