Archive for the ‘net neutrality’ Category

Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the AGU fall meeting 2013 was all about the development of Open Source Software to enable the reproducibility of scientific products, with both a Poster and an Oral presentation. The AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my 2nd time at AGU, but my first time with an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors: I had decided to generate the slideshow in realtime as HTML from an online IPython Notebook. I thought it would be cool to show this functionality, as well as the work itself. Unfortunately, that made me dependent on an internet connection at the time of the presentation, and at AGU the presenter computer doesn’t have an internet connection! Definitely not the best conditions for a web-based slideshow generated “on-the-fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only 2 days before my presentation. I must have misunderstood the AGU oral presentation guidelines, but when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that that wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I was so excited about the idea of showing my work running in realtime instead of showing a static (somewhat boring) ppt presentation!!!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stands in the way of an interactive, realtime demo, there should be no problem in running a static HTML slideshow instead; at least that is what I thought …

I used the IPython “nbconvert” utility and its “convert to slide” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshow, yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.

Happy with the final HTML presentation, I went to the AGU “Speaker Ready Room” to upload and test it. Unfortunately, my HTML presentation would not run offline. The lack of internet caused trouble with missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc. It was not as easy as I thought.

It took more than 3 hours to fix all the bugs, on account of a really slow internet connection running from my phone, but finally I got my presentation running perfectly offline on the AGU computers!
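For readers who want to check a converted slideshow before walking into the room, here is a minimal sketch (not the procedure I actually followed) that lists the resources an HTML file still loads from the network. The class name, the function name and the sample markup are all hypothetical; it only uses Python's standard `html.parser` module and only looks at `src` and `href` attributes, so fonts or images referenced from inside CSS would need a separate check.

```python
from html.parser import HTMLParser


class RemoteResourceFinder(HTMLParser):
    """Collect src/href attribute values that point at a remote host."""

    REMOTE_PREFIXES = ("http://", "https://", "//")

    def __init__(self):
        super().__init__()
        self.remote = []  # list of (tag, attribute, url) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(self.REMOTE_PREFIXES):
                self.remote.append((tag, name, value))


def find_remote_resources(html_text):
    """Return every (tag, attribute, url) that would need a network connection."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    return finder.remote


# Hypothetical slideshow fragment: one remote script, one remote image,
# one local stylesheet and one in-page link.
SAMPLE_HTML = """
<html><head>
<script src="https://cdn.example.org/reveal.js"></script>
<link rel="stylesheet" href="css/local.css">
</head><body>
<img src="http://example.org/figure.png">
<a href="#slide-2">next</a>
</body></html>
"""

if __name__ == "__main__":
    for tag, attribute, url in find_remote_resources(SAMPLE_HTML):
        print(f"<{tag}> {attribute} -> {url}")
```

Anything this reports has to be downloaded and re-linked to a local path (or dropped) before the slideshow can run without a connection.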

In the end, my talk ran very smoothly. A complete workflow for “catchments characterization” using exclusively open source software, running online and fully reproducible thanks to the use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations during my session had mentioned the use of the IPython Notebook as an open source software tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!!!

 


What is ontology?

December 19th, 2013

After five days at the American Geophysical Union 2013 Fall Meeting discussing Earth and space science informatics, the blog topic on my mind is an introduction to ontology for researchers in Earth and environmental sciences and beyond.

To attract your interest, I would say that ontology is the invisible hand behind anything. (It took me a few minutes to think about whether I should add an ‘an’ before the ‘ontology’ here. For reasons see below.)

First let’s see the etymology of the word ‘ontology’. According to Wiktionary (http://en.wiktionary.org/wiki/ontology), ontology is ‘originally Latin ontologia (1606, Ogdoas Scholastica, by Jacob Lorhard (Lorhardus)), from Ancient Greek ὤν (ōn, “on”), present participle of εἰμί (eimi, “being, existing, essence”) + λόγος (logos, “account”).’

Second, let’s look at the definition of the word. It is interesting that Wiktionary claims that in philosophy the word ‘ontology’ can be either uncountable or countable. For the former, ontology is defined by Wiktionary as ‘The branch of metaphysics that addresses the nature or essential characteristics of being and of things that exist; the study of being.’ This definition is more or less the same as the one given by the Oxford English Dictionary, ‘The science or study of being; that branch of metaphysics concerned with the nature or essence of being or existence.’ That Oxford definition was used in my PhD defense (http://www.slideshare.net/MarshallXMa/ontology-spectrum-for-geological-data-interoperability-phddefence). For the countable ‘ontology’, Wiktionary defines it as ‘The theory of a particular philosopher or school of thought concerning the fundamental types of entity in the universe.’ I have not yet done any work relevant to that definition, but I just found that Oxford also has a similar definition: ‘As a count noun: a theory or conception relating to the nature of being.’

The word metaphysics is mentioned in the definition of ontology as an uncountable noun. Nowadays, when people talk about metaphysics they often refer to Aristotle (384 – 322 BCE). If you (especially those who are working for a Doctor of PHILOSOPHY ;-)) are interested in his work, you can read his two most famous books, 1) Politics: A Treatise on Government and 2) The Ethics of Aristotle, on the Gutenberg website (http://www.gutenberg.org/ebooks/author/2747). The story does not stop here. In a famous Chinese book, I Ching (or the Book of Changes, c. 450 – 250 BCE), there are also topics about metaphysics, such as a sentence which is my personal favorite: ‘What is above form is called Tao; what is within form is called tool.’

The philosophical meaning of the word ontology is the background; in most cases in the domain of Earth and space science informatics we care more about another meaning of the word: ontology as a countable noun in computer science. Before discussing the definition of ontology as a computer science word, let’s first see how hot this word has been in recent years. I did a few searches with the topic ‘ontology’ in isiknowledge.com (on Dec 19, 2013), which showed that there are about 44884 publications for all years, and publication numbers for separate periods are 1470/1945–1995, 1498/1995–2000, ~7901/2000–2005, ~24528/2005–2010, and ~16891/2010–2013. If I refined the results by limiting to the research area ‘Computer Science’, the results are: ~22251/all years, 114/1945–1995, 673/1995–2000, ~5095/2000–2005, ~14316/2005–2010, and ~5971/2010–2013. There is also a large number of publications that applied informatics but were filtered out by the keyword ‘Computer Science’. Those results can be read in several ways; one is that work with the computer science ‘ontology’ has been increasing significantly since 2000.
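To make the comparison between the two searches easier to read, here is a small sketch that computes, for each period, the share of ‘ontology’ publications that fall under the ‘Computer Science’ research area. The counts are the ones quoted above (several of them approximate, and the periods share boundary years); the variable and function names are my own and purely illustrative.

```python
# Publication counts for the topic 'ontology', as quoted from the Dec 19, 2013 searches.
PERIODS = ["1945-1995", "1995-2000", "2000-2005", "2005-2010", "2010-2013"]
ALL_AREAS = [1470, 1498, 7901, 24528, 16891]
COMPUTER_SCIENCE = [114, 673, 5095, 14316, 5971]


def computer_science_share(cs_count, total_count):
    """Percentage of publications in a period that are tagged 'Computer Science'."""
    return 100.0 * cs_count / total_count


SHARES = {
    period: computer_science_share(cs, total)
    for period, cs, total in zip(PERIODS, COMPUTER_SCIENCE, ALL_AREAS)
}

if __name__ == "__main__":
    for period, share in SHARES.items():
        print(f"{period}: {share:.1f}% of 'ontology' publications are in Computer Science")
```

On these numbers the Computer Science share is small before 1995 and reaches more than half of all ‘ontology’ publications in 2000–2005 and 2005–2010.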

For the definition of the computer science word ‘ontology’, many people have cited the publications of T.R. Gruber (1993, 1995, see: http://dx.doi.org/10.1006/knac.1993.1008 and http://dx.doi.org/10.1006/ijhc.1995.1081): ‘An ontology is an explicit specification of a conceptualization’. The mid-1990s were the golden age for discussing the definition of ontology. N. Guarino (1997, see: http://dx.doi.org/10.1006/ijhc.1996.0091) wrote a nice review of the definition of ‘ontology’, in which I think one key point he discussed was the ‘shared conceptualization’ feature of an ontology. So in my PhD dissertation (Ma, 2011, see: http://www.itc.nl/library/papers_2011/phd/ma.pdf) I tried to re-address the definition of the computer science ‘ontology’: ‘Ontologies in computer science are defined as shared conceptualizations of domain knowledge (Gruber, 1995; Guarino, 1997b)…’

Third, after seeing the definition of ontology, let’s focus on how to put a computer science ‘ontology’ into practice, especially in the domain of Earth and space science informatics. The early 2000s were the golden age for that work. McGuinness (2003, see: http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-%28with-citation%29.htm) gave a wonderful discussion of the ontology spectrum. McGuinness also added a footnote to that spectrum figure: ‘This spectrum arose out of a conversation in preparation for an ontology panel at AAAI ’99. The panelists (Gruninger, Lehman, McGuinness, Ushold, and Welty), chosen because of their years of experience in ontologies found that they encountered many forms of specifications that different people termed ontologies. McGuinness refined the picture to the one included here.’ When I was doing my PhD I read this note and looked for other publications by the panelists listed by McGuinness, and I did find a few that also discussed the ontology spectrum, for example:
Welty, C., 2002. Ontology-driven conceptual modeling. In: Pidduck, A.B., Mylopoulos, J., Woo, C.C., Ozsu, M.T. (Eds.), Advanced Information Systems Engineering, Lecture Notes in Computer Science, vol. 2348. Springer-Verlag, Berlin & Heidelberg, Germany, pp. 3-3. Lecture slides available at: http://www.cs.toronto.edu/caise02/cwelty.pdf
Obrst, L., 2003. Ontologies for semantically interoperable systems. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, LA, USA, 366-369.
Uschold, M., Gruninger, M., 2004. Ontologies and semantics for seamless connectivity. SIGMOD Record 33 (4), 58–64.
Borgo, S., Guarino, N., Vieu, L., 2005. Formal ontology for semanticists. In: Lecture notes of the 17th European Summer School in Logic, Language and Information (ESSLLI 2005), Edinburgh, Scotland, 12pp. http://www.loa-cnr.it/Tutorials/ESSLLI1.pdf

An ontology spectrum (from McGuinness 2003)

To help myself understand the ontology spectrum better, I redrew the diagram (see below) in my PhD dissertation. Very recently (Dec 03, 2013) Jim McCusker, a PhD student with McGuinness, gave a thorough explanation of the spectrum in his blog (see: http://info.5amsolutions.com/blog/bid/154967/6-Points-Along-the-Ontology-Spectrum).

Ontology spectrum (adapted from Borgo et al., 2005; McGuinness, 2003; Obrst, 2003; Uschold and Gruninger, 2004; Welty, 2002). Texts in italics explain a typical relationship in each ontology type (from Ma 2011)

Finally, I would like to share a few examples of different types of ontologies, following the spectrum:

Catalog/Glossary:
Neuendorf, K.K.E., Mehl, J.J.P., Jackson, J.A., 2005. Glossary of Geology, 5th edition. American Geological Institute: Alexandria, VA, USA, p. 800. See latest version at: http://www.agiweb.org/pubs/glossary/

Taxonomy:
BGS Rock Classification Scheme, see: https://www.bgs.ac.uk/bgsrcs/

Thesaurus:
AQSIQ, 1988. GB/T 9649-1988 The Terminology Classification Codes of Geology and Mineral Resources. General Administration of Quality Supervision, Inspection and Quarantine of P.R. China (AQSIQ). Standards Press of China, Beijing, China. 1937 pp.

Conceptual Schema:
NADM Steering Committee, 2004. NADM Conceptual Model 1.0—A conceptual model for geologic map information: U.S. Geological Survey Open-File Report 2004-1334, North American Geologic Map Data Model (NADM) Steering Committee, Reston, VA, USA, 58 pp. See: http://pubs.usgs.gov/of/2004/1334

Ontologies encoded in RDF format:
Semantic Web for Earth and Environmental Terminology (SWEET). See: http://sweet.jpl.nasa.gov/

Now a short wrap-up of what ontology is:
For fun: the invisible hand behind anything;
In philosophy: (uncountable) the science or study of being; that branch of metaphysics concerned with the nature or essence of being or existence; (countable) a theory or conception relating to the nature of being;
In computer science: shared conceptualization of domain knowledge.

To put ontologies (computer science) into practice, keep in mind an ontology spectrum of increasingly rich meaning: catalog/glossary -> taxonomy -> thesaurus -> conceptual schema -> formal constraints.
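As a rough illustration only, the ordering of the spectrum can be written down as an ordered enumeration, with the examples listed earlier attached to their levels. The class and variable names below are hypothetical, and I leave SWEET out because placing an RDF-encoded ontology at a particular point of the spectrum would be my own assumption.

```python
from enum import IntEnum


class SpectrumLevel(IntEnum):
    """Points along the ontology spectrum, ordered by increasing richness."""

    CATALOG_GLOSSARY = 1
    TAXONOMY = 2
    THESAURUS = 3
    CONCEPTUAL_SCHEMA = 4
    FORMAL_CONSTRAINTS = 5


# Examples taken from the list in this post, keyed by a short label.
EXAMPLES = {
    "Glossary of Geology": SpectrumLevel.CATALOG_GLOSSARY,
    "BGS Rock Classification Scheme": SpectrumLevel.TAXONOMY,
    "GB/T 9649-1988": SpectrumLevel.THESAURUS,
    "NADM Conceptual Model 1.0": SpectrumLevel.CONCEPTUAL_SCHEMA,
}


def at_least(level):
    """Return the example labels at or beyond the given level, in spectrum order."""
    matching = [(lvl, name) for name, lvl in EXAMPLES.items() if lvl >= level]
    return [name for lvl, name in sorted(matching)]


if __name__ == "__main__":
    print(at_least(SpectrumLevel.THESAURUS))
```

Using an ordered type makes it possible to ask simple questions such as "which of my vocabularies is at least a thesaurus?" without hard-coding the order in several places.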


My report on Open Government Data camp 2011

November 2nd, 2011

A few days ago I (Alvaro Graves) participated in the Open Government Data Camp 2011 in Warsaw, Poland, where people from different groups, organizations and governments met to discuss issues related to Open Data at the government level. Here are, in my opinion, some of the most important issues raised in these talks.

The current state of OGD

David Eaves, an activist who advises the city of Vancouver, Canada on Open Data issues, gave a keynote in which he described his views on the current state of the Open Data movement. First, it is striking that the success stories are not just a few anymore (such as Data.gov or Data.gov.uk) but there are dozens (perhaps hundreds), at national, regional and local levels. Similarly, the term Open Government Data is becoming increasingly popular, which is good because it makes it easier to stop explaining the ‘what’ and start focusing on the ‘how’.

Another interesting point is how the Open Government Data movement has already passed an inflection point: its members are no longer seen as people demanding from the outside, but are increasingly being invited to help work on these initiatives from within the government. For many, this change in perspective can be confusing and may create some concerns about Open Data being absorbed into a bureaucratic system that makes it impossible to implement Open Data initiatives. However, it is clear that for these changes to occur, the movement cannot refuse to collaborate with governments.

Local initiatives, by locals

A talk that I really liked was by Ton Zijlstra, who lives in the city of Enschede, the Netherlands. This city has only 150,000 inhabitants. He wanted an Open Data initiative there, but it was difficult to convince the authorities, so he and a group of people decided to start working on their own. Inviting a handful of hackers to a bar, they created their first application, which used data from Twitter, Foursquare, and the venues of a local festival. Eventually they convinced the municipal government that the default option for local data ought to be open.

From this experience, Ton drew several important lessons. You have to create something concrete, no matter how small: something that requires little funding (the first beers at the bar were free) and is short-term (no more than a couple of weeks). It does not matter whether it is original or not; there are some great ideas out there that deserve to be copied and are very useful for the local community.

How the Open Data died

Another very interesting keynote was by Chris Taggart, founder of OpenCorporates, who warned of the risks that the Open Data movement is facing today. His main concern is the lack of relevance in terms of the impact Open Data has on society. For example, he mentioned that so far no one’s business depends on Open Data (although this is not true, there are a few out there, but I have to concede they are rare examples). In general, making data available is not enough; it needs to be used, whether in applications, by data journalists, or otherwise. Also, it is fundamental to link different sites with Open Data (something quite uncommon in the movement), so that people can find out more information. Finally, I liked his idea that if Open Data does not cause problems for incumbents, then it is not working.

Redefining what is public

Finally, another talk that I found interesting presented the idea of Andrew Rasiej, founder of Personal Democracy, and Nigel Shadbolt, professor at the University of Southampton, to redefine “the public” in terms of data that “is available on the Web in machine-processable formats.” That is, uploading a bunch of PDFs with scanned tables does not make that information public, because it is not easily accessible. This initiative raises the bar of what public data is, especially when compared to the FOIA (Freedom of Information Act), which allows you to request information from government. Note that this applies to all information, as Rasiej so vehemently described it.

So… what did you talk about at OGDCamp?
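To illustrate what “machine-processable” buys you, here is a minimal sketch with made-up data (the column names and figures are hypothetical, not from any real dataset). A table published as CSV can be read and summed with a few lines of standard-library Python, whereas the same table inside a scanned PDF would first need OCR and manual checking.

```python
import csv
import io

# Hypothetical open-data table published as CSV text.
SAMPLE_CSV = (
    "agency,year,budget_usd\n"
    "Parks,2011,1200000\n"
    "Libraries,2011,800000\n"
)


def total_budget(csv_text):
    """Sum the budget_usd column of a CSV document given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["budget_usd"]) for row in reader)


if __name__ == "__main__":
    print(total_budget(SAMPLE_CSV))
```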

In my case, I presented a system for publishing Linked Data called LODSPeaKr, which can be used for the rapid publication of government data and to create applications based on Linked Data. In the near future I will be writing more about this framework, but for now you can see my presentation here.


AAAI 2011 Fall Symposium on Open Government Knowledge, This weekend (Nov 4-6), Washington DC

November 1st, 2011

————————————————————————————————
Title:  Open Government Knowledge: AI Opportunities and Challenges
When:  4-6 November 2011
Where:  Westin Arlington Gateway in Arlington, Virginia, USA
Homepage: http://tw.rpi.edu/ogk2011
Program (PDF): http://tw.rpi.edu/media/latest/ogk2011.pdf
————————————————————————————————

Please join us to meet the governmental and business thought leaders in
US open government data activities, and discuss the challenges. The
symposium features Friday (Nov 4) as governmental day with speakers on
Data.gov, openEi.org, and open gov data activities in NIH/NCI and NASA, and
Saturday (Nov 5) as R&D day with speakers from industry such as Google
and Microsoft, as well as international researchers.

This symposium will explore how AI technologies such as the Semantic Web,
information extraction, statistical analysis and machine learning, can be used
to make the valuable knowledge embedded in open government data more
explicit, accessible and reusable.

Co-Chairs
* Li Ding, Qualcomm (Previously RPI)
* Tim Finin, UMBC
* Lalana Kagal, MIT
* Deborah McGuinness, RPI


Unanticipated consequences: Saving data.gov

April 14th, 2011

I had a bizarre dream last night, one of those surreal shockers. The details aren’t important, but I realized on waking up that the dream’s theme was all about unanticipated consequences.  I realized I needed to write this post.

To set some context: I went to bed upset last night. I was upset at two things: one is an article on TechCrunch entitled “Five Open Questions For Data.gov Before We #SaveTheData,” the other was my response to the article. I hope I can respond to the first and apologize for the second. I want to make one thing clear, however, before I start: I am a strong supporter of http://data.gov, I think it is a great experiment in democracy resulting from bold leadership, and if it dies in the current budget cutting it will be an enduring embarrassment for the USA and a major loss to government transparency.

The article I was upset about was written by Kate Ray (@kraykray), an amazingly bright and articulate young woman who has made several very impressive videos and online articles that I am a fan of. She recently was one of the co-founders of “NerdCollider,” a website designed to bring intelligent discussion to interesting issues, an idea I support. I was proud to be an early contributor to one of their discussions, which asked “What would you change about Data.gov to get more people to care?”

In the TechCrunch blog post I mentioned above, Kate takes several quotes from this discussion and reflects on their import: is data.gov taking some of the key issues into account? Kate is a good reporter, and her OpEd is actually quite objective; she reports on several comments made by people, including me, about issues the site has in its effort to share government data. TechCrunch is a very influential site, and the article title has been tweeted and retweeted hundreds of times to hundreds of thousands of potential readers (congrats to Kate on this viral takeup), raising awareness of Congress’ narrow-minded goal of killing the project, which I guess is a good thing. Unfortunately, the choice of the word “Before” in “… Before we #savethedata” has a negative implication, and I’m hoping that doesn’t kill off the positive efforts that the #savethedata meme was designed to promote.

In her article, Kate brings up important issues, but what she doesn’t make clear is that most of the people she quotes are indeed strong supporters of the Open Government movement and fans of Data.gov.  The seeming criticisms were actually constructive responses to the question of how we could get more people to care (a positive), and not meant to say what was wrong with the site that must be fixed before the site was useful.  It’s already very useful, but like any new effort, there’s always room for improvement. However, those changes will never happen if the site is forced to go dark!

As I said, Kate’s article has been phenomenally well tweeted. In fact, if you look at #savethedata, the stream is so filled with pointers to this article that one can no longer easily find the link to the Petition created by the Sunlight Foundation to help stop the budget cuts; that petition is where the #savethedata meme started (thanks @EllnMiller). Kate also doesn’t point to the great HuffPost article by @bethnoveck explaining why cutting the funding to this and other egovernment sites will threaten American jobs, which was also being retweeted around the #savethedata meme.

So I hope this article does not have the unanticipated consequence of helping cause the death of data.gov by killing off the awareness of its importance or losing the momentum on the petition that could save it.

But, as Arlo Guthrie used to say, “that’s not what I came here to talk about tonight…”

In my response to Kate’s article, I referred to her making factual errors. This is a horrible thing to accuse a young journalist of, and I was being unfair. The errors I wanted to point out were not in Kate’s piece, but in the chart chosen to go along with it. It appears to show a flatline in the interest in data.gov, using figures from (as Kate told me later in a separate tweet) compete.com on “unique visits.” I don’t know where compete.com gets the data, but the visitor counts tracked by data.gov itself, which are reported on the site on a daily basis, seem to show a much larger number with a more positive trend (over 180,000 visits in March). It’s unclear why there is this discrepancy (I suspect it’s in how compete.com figures uniqueness for sites they don’t control), but it is clear it isn’t Kate’s fault. She also cites the number of downloads in her article as 1.5M since Oct 2010, which is the number reported on data.gov, but as of last week, the site broke 2M downloads, and the number is trending up.

Anyway, I’m digressing again (occupational hazard of a college professor) — the key point is the errors are not Kate’s and that she was reflecting on what she found.

I also was upset that she quoted me out of context. In my nerdcollider response I made it clear I was supporting data.gov, and offering some constructive solutions to the question of how we could make the site better. As the quote appears in her piece, it looks like I’m saying the data is poorly organized on the site, but what I was actually saying is that, given the incredible richness of data sets available (data.gov hosted over 300,000 datasets at last count!), we have to explore new ways to search for data. It’s a wonderful problem to have! But I did say what she quoted, and as she pointed out to me, correctly, one of the good things about nerdcollider is that the full context of the quotes is there to be cited. She’s right.

So just as I hope Kate’s piece doesn’t have the unanticipated consequence of hurting data.gov, I hope my admittedly intemperate response doesn’t have the unanticipated consequence of hurting the reputation of this young potential online media star.

@kraykray – I apologize.

Categories: linked data, open data, personal ramblings, twitter