
AAAI 2011 Fall Symposium on Open Government Knowledge, This weekend (Nov 4-6), Washington DC

November 1st, 2011

————————————————————————————————
Title:  Open Government Knowledge: AI Opportunities and Challenges
When:  4-6 November 2011
Where:  Westin Arlington Gateway in Arlington, Virginia, USA
Homepage: http://tw.rpi.edu/ogk2011
Program (PDF): http://tw.rpi.edu/media/latest/ogk2011.pdf
————————————————————————————————

Please join us to meet government and business thought leaders in
US open government data activities and to discuss the challenges. The
symposium features Friday (Nov 4) as government day, with speakers on
Data.gov, openEi.org, and open government data activities at NIH/NCI and NASA,
and Saturday (Nov 5) as R&D day, with speakers from industry such as Google
and Microsoft, as well as international researchers.

This symposium will explore how AI technologies such as the Semantic Web,
information extraction, statistical analysis and machine learning, can be used
to make the valuable knowledge embedded in open government data more
explicit, accessible and reusable.

Co-Chairs
* Li Ding, Qualcomm (Previously RPI)
* Tim Finin, UMBC
* Lalana Kagal, MIT
* Deborah McGuinness, RPI


Data.gov – it’s useful, but also could be better.

April 5th, 2011

The “Nerd Collider” Web site invited me to be a “power nerd” and respond to the question “What would you change about Data.gov to get more people to care?”  The whole discussion including my response can be found here.  However, I hope people won’t mind my reprinting my response here, as the TWC blog gets aggregated by some important Linked Data/Semantic Web sites.

My response:

I was puzzling over how I wanted to respond until I saw the blog post in the Guardian – http://www.guardian.co.uk/news/datablog/2011/apr/05/data-gov-crisis-obama – which also portrays this flat line as a failure and contrasts it with the number of hits the Guardian.com website gets. This is such a massive apples vs. oranges error that I figure I should start there.

So, primarily, let’s think about what visits to a web page are about. For the Guardian, they are lots of people coming to read the different articles each day. For data.gov, however, there isn’t a lot of repeat traffic: the data feeds are updated on a relatively slow basis, and once you’ve downloaded some, you don’t have to go back for weeks or months until the next update. Further, for some of the rapidly changing data, like the earthquake data, there are RSS feeds, so once they are set up, one doesn’t return to the site. So my question is, are we looking at the right number?

In fact, the answer is no. If you want to see the real use of data.gov, take a look at the chart at http://www.data.gov/metric/visitorstats/monthlyredirecttrend: the total number of dataset downloads since 2009 is well over 1,000,000, and in February of this year (the most recent data available) there were over 100,000 downloads. So the 10k number appears to be tracking the wrong thing. The data is being downloaded, and that implies it is being used!!

Could we do better? Yes, very much so. Here are some things I’m interested in seeing (and in working with the data.gov team to make available):

1 – Searching for data on the site is tough. Keyword search is not a good way to look for data (for lots of reasons), and thus we need better ways. Doing this really well is a research task I’ve got some PhD students working on, but even doing better than what is there now requires better metadata and a better approach. There is already work afoot at data.gov (assuming funding continues) to improve this significantly.

2 – Tools for using the data, and particularly for mashing it up, need to be more easily used and more widely available. My group makes a lot of info and tools available at http://logd.tw.rpi.edu – but a lot more is needed. This is where the developer community could really help.

3 – Tools to support community efforts (see the comment by Danielle Gould to this effect) are crucial – she says it better than I can so go read that.

4 – There are efforts by data.gov to create communities. These are hard to get going, but could be of great value in the long run. I suggest people look at these on the data.gov communities site and think about how they could be improved to bring more use; I know the data.gov leadership team would love to get some good comments about that.

5 – We need to find ways to turn the data release into a “conversation” between government and users. I have discussed this with Vivek Kundra numerous times and he is a strong proponent (we have thought about writing a paper on the subject if time ever allows). The British data.gov.uk site has some interesting ideas along this line, based on OpenStreetMap and similar projects, but I think one could do better. This is the real opportunity for “government 2.0”: a chance for citizens not just to comment on legislation, but to help make sure the data that informs policy decisions is the best it can be.

So, to summarize, there are things we can do to improve data.gov, many of which are getting done. However, the numbers in the graph above are misleading and don’t really reflect the true usage of data.gov per se, let alone of other sites, like the LOGD site I mention above, that are powered by data.gov.


RPI Hackathon: Linking government data

December 9th, 2009

This is an invitation to participate in the RPI Hackathon 2009 for linking government data. For more detailed information, check our wiki.

Part of the work done here in the Tetherless World Constellation consists of translating the government datasets available from data.gov into RDF. This effort has produced billions of triples from more than 130 datasets (at the time of writing this post). This data can be used in multiple ways: it can be queried from a SPARQL endpoint, used in visualizations such as maps, or combined with other datasets (whether from data.gov or other sources) to find correlations, clusters, or other patterns.
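To make the idea of “translating a dataset into RDF” a little more concrete, here is a minimal sketch that turns rows of a CSV table into N-Triples: each row becomes a resource and each non-empty cell becomes one triple. The base namespace `http://example.org/dataset/`, the `entryN` row naming, and the `vocab/` predicate pattern are all hypothetical choices for illustration, not the URI scheme actually used by the TWC conversion.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; not the real TWC/LOGD URIs.
BASE = "http://example.org/dataset/"


def literal(value: str) -> str:
    """Render a plain string literal in N-Triples syntax, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str, dataset_id: str) -> list[str]:
    """Turn each CSV row into a resource and each non-empty cell into one triple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for i, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset_id}/entry{i}>"
        for column, value in row.items():
            # Skip cells without a header (extra columns) and empty/missing cells.
            if column is None or not value:
                continue
            prop = quote(column.strip().replace(" ", "_"))
            predicate = f"<{BASE}{dataset_id}/vocab/{prop}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


sample = "state,year,amount\nVirginia,2008,1200\nMaryland,2008,950\n"
for t in csv_to_ntriples(sample, "dataset_1"):
    print(t)
```

Keeping every value as a plain string literal is a simplification; a fuller conversion would also assign datatypes (e.g. integers for `year`) and reuse shared vocabularies where possible.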

However, we think that the data is more interesting and useful when it is linked: for example, a system can answer a specific query and also suggest other sources of information that may be relevant to the user. So while we keep translating datasets, we would also like to link them to the Linked Data cloud, and to do that, we are asking for your help.
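One common way to link a local resource to the Linked Data cloud is to assert an `owl:sameAs` triple between it and an existing URI for the same entity. The sketch below does this with a deliberately naive exact-label match against a small hand-written lookup table; the table, the `example.org` URIs, and the matching rule are assumptions for illustration, and real linking work usually needs more robust matching (and human review) than this.

```python
OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"

# Illustrative lookup table standing in for a real entity-matching step;
# these entries are assumptions, not output of any actual linking run.
KNOWN_ENTITIES = {
    "virginia": "http://dbpedia.org/resource/Virginia",
    "maryland": "http://dbpedia.org/resource/Maryland",
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse surrounding/internal whitespace."""
    return " ".join(label.lower().split())


def link_entities(local: dict[str, str]) -> list[str]:
    """Map local URI -> label into owl:sameAs triples for labels we can match."""
    links = []
    for uri, label in local.items():
        target = KNOWN_ENTITIES.get(normalize(label))
        if target is not None:
            links.append(f"<{uri}> {OWL_SAME_AS} <{target}> .")
    return links


# Hypothetical local resources produced by a dataset conversion.
local_states = {
    "http://example.org/dataset/state/VA": "Virginia",
    "http://example.org/dataset/state/MD": " Maryland ",
    "http://example.org/dataset/state/XX": "Atlantis",
}
for triple in link_entities(local_states):
    print(triple)
```

Unmatched labels (like “Atlantis” here) are simply left unlinked rather than guessed, which is the safer default when a wrong `owl:sameAs` link would propagate incorrect facts.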

On December 12th and 13th we will host a Hackathon (i.e., an event where people gather to work on a specific computational problem). This event is part of the Great American Hackathon promoted by Sunlight Labs. We will host it in the Winslow Building at RPI in Troy, NY. It will run from 10AM to 5PM, but if you have only a few spare hours, you are also welcome! As I mentioned above, our main goal is to link the available data to the Linked Data cloud, but if you have other ideas to develop using one or more of the datasets, please join us too! The only requirements are to bring your computer and to register by email at gravea3[@]rpi.edu or difrad[@]rpi.edu. Because we know big brains need energy, food and beverages will be provided. Even if you can’t attend in person, you can help us by working online.

Everyone is invited to participate. If you have any comments, questions, etc. please don’t hesitate to contact me at gravea3[@]rpi.edu or check the announcement in data-gov.

Alvaro Graves and the Data-gov team.
