
Current Issues in data.gov

July 31st, 2009

While translating data.gov data into RDF, we have discovered some issues with the published datasets. These issues can be roughly categorized as follows:

  • Duplicated Datasets – Some datasets are part of another dataset, e.g. Dataset 140 (2005 Toxics Release Inventory data for the state of California (Environmental Protection Agency)) is a subset of Dataset 191 (2005 Toxics Release Inventory National data file of all US States and Territories (Environmental Protection Agency)).
  • Formatting Issues – The format of some datasets is not friendly to machine processing. Not all datasets offer data in CSV format, and parsing tabular data from the others requires non-trivial effort. Example: Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)). Some linked websites, meanwhile, have no data at all: Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)), for example, instead tells you how to order the data from the government.
    Screenshot of the text file from Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases) by the US Bureau of Reclamation

  • Access Point Issues – The access points for some datasets do not point to pages friendly to machine access. Instead of pointing to a downloadable file covering the entire dataset, some lead to an interactive website where only partial data can be returned by a web-based query. Example: Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)) and Dataset 96 (National Water Information System (NWIS) (US Geological Survey)).

    Screenshot of the query interface for accessing Dataset 330 (Local Area Unemployment Statistics) by the US Bureau of Labor Statistics
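One way to confirm a subset relationship like the one between Dataset 140 and Dataset 191 is to check that every record of the smaller file also appears in the larger one. The sketch below assumes both datasets are available as CSV files with a header row, compares only the columns that the two headers share, and treats each record as a tuple of whitespace-trimmed values. The function names and the toy rows in the usage example are hypothetical and are not taken from the actual Toxics Release Inventory files.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def load(f: TextIO) -> tuple[list[str], list[dict[str, str]]]:
    """Read a CSV file with a header row; return (column names, rows)."""
    reader = csv.DictReader(f)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def is_subset(part: TextIO, whole: TextIO) -> bool:
    """Return True if every record in `part` also occurs in `whole`.

    Assumption: only columns present in both headers are compared, since a
    state-level file and a national file may not share an identical schema.
    """
    part_cols, part_rows = load(part)
    whole_cols, whole_rows = load(whole)
    shared = [c for c in part_cols if c in whole_cols]
    if not shared:
        return False

    def key(row: dict[str, str]) -> tuple[str, ...]:
        # Missing cells (None) become empty strings; surrounding spaces are ignored.
        return tuple((row.get(c) or "").strip() for c in shared)

    # Multiset difference: anything left over is in `part` but not in `whole`.
    missing = Counter(map(key, part_rows)) - Counter(map(key, whole_rows))
    return not missing


# Hypothetical toy data standing in for a national file and a state file.
national = io.StringIO(
    "FACILITY,STATE,CHEMICAL\n"
    "Plant A,CA,Lead\n"
    "Plant B,NY,Toluene\n"
    "Plant C,CA,Benzene\n"
)
california = io.StringIO(
    "FACILITY,STATE,CHEMICAL\n"
    "Plant A,CA,Lead\n"
    "Plant C,CA,Benzene\n"
)
print(is_subset(california, national))  # True
```

Counting records with `Counter` rather than a plain set means a record that appears twice in the state file must also appear at least twice in the national file before the smaller dataset is reported as a subset.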

For more details, please visit http://data-gov.tw.rpi.edu/wiki/Current_Issues_in_data.gov .

Sarah Magidson, Li Ding, Dominic DiFranzo, and Jim Hendler

Categories: linked data
  1. August 2nd, 2009 at 12:47 | #1

    Yesterday, August 1, I declared “DataIndependenceDay”, following the Swiss some 700 years ago.
    We published ontologies for oegov at http://www.oegov.us/blog. These are OWL ontologies for the organizational structure of government, for FEA (coming soon), and for QUDT (Quantities, Units, Dimensions and Data Types; coming soon). Data quality is a huge issue, and QUDT is needed to help address it.

    I have created some properties for connecting to open data (who publishes what, on what), which is something you are also doing. I would like to understand how to connect to your work and where our efforts can align, in the interest of making semantic web technologies better known, appreciated, and adopted.

    In appreciation of your work, Ralph

  1. March 16th, 2010 at 23:44 | #1