TWC data-gov corpus: incrementally generating linked government data from data.gov

The Open Government Directive is making US government data available for public access via websites such as Data.gov. In this paper, we present a Semantic Web-based approach that incrementally generates Linked Government Data (LGD) for the US government. In addressing the tradeoff between high-quality LGD generation (which requires non-trivial input from human experts) and massive LGD generation (which requires low human processing cost), our work offers the following features: (i) supporting low-cost and extensible LGD publishing for massive government data; (ii) using Social Semantic Web (Web 3.0) technologies to incrementally enhance published LGD via crowdsourcing; and (iii) facilitating mashups by declaratively reusing cross-dataset mappings, which are usually hardcoded in applications.
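As a rough illustration of what low-cost conversion of tabular government data into RDF might look like, the Python sketch below turns each CSV row into a subject and each non-empty cell into one N-Triples statement. It is not the converter used in this work: the base URI, the function names, and the sample rows are all assumptions invented for the example.

```python
import csv
import io

# Hypothetical base URI for illustration; not an actual TWC or Data.gov namespace.
BASE = "http://example.org/dataset/demo"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text, base=BASE):
    """Map each CSV row to one subject and each non-empty cell to one triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}/row/{row_number}>"
        for column, cell in row.items():
            # Skip overflow cells (no header) and empty or missing values.
            if column is None or not cell:
                continue
            # Column headers become predicates; spaces are replaced to keep the URI valid.
            name = column.strip().lower().replace(" ", "_")
            predicate = f"<{base}/vocab/{name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(cell)}" .')
    return triples


# Invented sample rows, used only to exercise the function.
sample = "Agency,Budget\nAgency A,100\nAgency B,250\n"
for line in csv_to_ntriples(sample):
    print(line)
```

A conversion this minimal needs no expert input, which is the low-cost end of the tradeoff described above; the resulting triples carry only string literals and dataset-local predicates, so richer typing and links to other datasets would have to be added incrementally afterwards.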


Associated Projects

The LOGD project investigates the role of Semantic Web technologies, especially Linked Data, in producing, enhancing, and utilizing government data published on Data.gov and other websites. A large portion of the government data published on the Web is not necessarily ready for mashups. The Tetherless World Constellation (TWC) is now publishing over 8 billion RDF triples converted from hundreds of government-related datasets from Data.gov and other sources (e.g.

Citation