Converting governmental datasets into linked data

Linked Data provide many benefits to data consumers, but many publicly available datasets are still released in the Comma-Separated Values (CSV) format, a ubiquitous common denominator. We introduce a methodology to transform such datasets into Linked Data. Our design is based on requirements identified while surveying existing governmental datasets released by data.gov. We present an implementation-independent RDF vocabulary to describe how a CSV dataset should be promoted into Linked Data, and use a Java-based converter to produce 5.3 billion RDF triples from 312 data.gov datasets.
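To make the idea of promoting CSV rows to RDF concrete, the following is a minimal sketch of a naive, verbatim conversion: one subject URI per row, one predicate per column header, and every cell emitted as a plain string literal in N-Triples syntax. It is not the Java-based converter or the RDF vocabulary described in the abstract. The base URI `http://example.org/dataset/`, the `row/` and `vocab/` URI patterns, and the function names are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI; a real conversion would use the publisher's namespace.
BASE = "http://example.org/dataset/"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, base=BASE):
    """Return a list of N-Triples lines, one per non-empty CSV cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        # Assumed pattern: each row becomes a subject named by its position.
        subject = f"<{base}row/{row_number}>"
        for column, cell in row.items():
            # Skip overflow fields (key None) and empty or missing cells.
            if column is None or not cell:
                continue
            # Assumed pattern: each column header becomes a predicate.
            local_name = quote(column.strip().lower().replace(" ", "_"))
            predicate = f"<{base}vocab/{local_name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(cell)}" .')
    return triples


if __name__ == "__main__":
    sample = "Agency,Budget\nEPA,100\nNASA,\n"
    for line in csv_to_ntriples(sample):
        print(line)
```

A sketch like this treats every value as an untyped string and every row as an anonymous record. Describing how columns should instead be typed, linked to other resources, or given meaningful predicates is the kind of decision a declarative conversion vocabulary is meant to capture separately from the converter itself.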

Associated Projects

The LOGD project investigates the role of Semantic Web technologies, especially Linked Data, in producing, enhancing and utilizing government data published on Data.gov and other websites. A large portion of the government data published on the Web is not necessarily ready for mashups. The Tetherless World Constellation (TWC) is now publishing over 8 billion RDF triples converted from hundreds of government-related datasets from Data.gov and other sources (e.g.

Citation