Data.gov Datasets Translated into RDF!
We have created 16 RDF datasets covering 187 of the datasets published at data.gov (171 EPA datasets are subsets of three larger EPA datasets). The original datasets were published by the EPA, the US Census Bureau, USGS, and the Office of Management and Budget in CSV-compatible formats, and they contributed 13,532,250 table entries. The translated RDF datasets include a total of 2,927,398,352 triples involving 2,526 properties.
We publish the RDF data in two alternative ways: (i) a collection of linked partition files in RDF/XML, so that users can browse the dataset and dereference the URIs with semantic web browsers, and (ii) a single N-Triples file (data.nt) that concatenates the partition files, so that machines, especially triple stores, can download and import it. The largest dataset is Dataset_91, which contributed 2.11 billion triples.
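For readers who want to check the size of a downloaded N-Triples file before loading it into a triple store, the following is a minimal sketch in Python. It is not the tool we used to produce the statistics above; the function names are hypothetical, and it assumes one triple per line, as the N-Triples format prescribes, and does no further validation.

```python
import gzip


def summarize_ntriples(lines):
    """Return (triple_count, distinct_predicate_count) for N-Triples lines.

    Assumption: each non-blank, non-comment line holds exactly one triple.
    """
    triples = 0
    predicates = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subjects and predicates never contain whitespace in N-Triples,
        # so splitting twice on whitespace isolates both of them.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple; ignore it in this sketch
        triples += 1
        predicates.add(parts[1])
    return triples, len(predicates)


def summarize_gzipped_ntriples(path):
    """Stream a gzipped N-Triples file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return summarize_ntriples(handle)
```

Because the file is read line by line, the same approach should work on the larger datasets, although a pure-Python pass over billions of triples will be slow.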
To access the RDF datasets, go to Data.gov_Catalog and choose one of the following options:
- follow links in the “rdf(index file)” column to access the index file in RDF/XML, which contains the property list, statistics, and links for the RDF dataset, e.g. http://data-gov.tw.rpi.edu/raw/401/index.rdf
- follow links in the “rdf(partition files)” column to start an RDF browser (e.g. Tabulator) and browse the RDF/XML partition files, e.g. http://data-gov.tw.rpi.edu/raw/401/link00001.rdf
- follow links in the “rdf(complete file)” column to download the complete RDF dataset in N-Triples format (gzipped), e.g. http://data-gov.tw.rpi.edu/raw/401/data-401.nt.gz
- follow links in the “url(data.gov)” column to see the original metadata at data.gov
- follow links in the “wiki page” column to see enhanced metadata about data.gov datasets
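The three example URLs above suggest a simple naming pattern per dataset. The sketch below builds the corresponding URLs for a given dataset number; note that the pattern is inferred only from the dataset 401 examples, so it is an assumption that other datasets follow it, and the catalog links remain the authoritative source. The function name and dictionary keys are hypothetical.

```python
BASE_URL = "http://data-gov.tw.rpi.edu/raw"


def dataset_urls(dataset_id):
    """Build the RDF URLs for one dataset.

    Assumption: every dataset follows the layout seen for dataset 401.
    """
    dataset_id = int(dataset_id)  # reject non-numeric identifiers early
    prefix = f"{BASE_URL}/{dataset_id}"
    return {
        # index file in RDF/XML: property list, statistics, links
        "index": f"{prefix}/index.rdf",
        # first RDF/XML partition file, the entry point for browsing
        "first_partition": f"{prefix}/link00001.rdf",
        # complete dataset as gzipped N-Triples
        "complete": f"{prefix}/data-{dataset_id}.nt.gz",
    }


if __name__ == "__main__":
    for name, url in dataset_urls(401).items():
        print(name, url)
```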
More datasets are coming, so please stay tuned and come back to http://data-gov.tw.rpi.edu/.
Further reading:
- To learn how we managed the translation, please go to Generating RDF from data.gov
- To learn more about the translation statistics, please go to What’s in data.gov
Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler