
Archive for the ‘Web Science’ Category

Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the AGU fall meeting 2013 was all about the development of Open Source Software to enable the reproducibility of scientific products, with both a Poster and an Oral presentation. The AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my second time at AGU, but my first time giving an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors. I had decided to generate the slideshow in real time as HTML from an online IPython Notebook, because I thought it would be cool to show this functionality as well as the work itself. That made me dependent on an internet connection at the time of the presentation, and, alas, at AGU the presenter computer doesn’t have an internet connection! Definitely not the best conditions for a web-based slideshow generated “on the fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only two days before my presentation. I must have misunderstood the AGU oral presentation guidelines: when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that it wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I had been so excited about the idea of showing my work running in real time instead of showing a static (somewhat boring) PowerPoint presentation!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stood in the way of an interactive, real-time demo, there should be no problem in running a static HTML slideshow instead; at least that is what I thought…

I used the IPython “nbconvert” utility and its “convert to slide” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshow, yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.

Happy with the final HTML presentation, I finally went to AGU’s “Speaker Ready Room” to upload and test it. Unfortunately, my HTML presentation would not run offline. The lack of internet was giving me trouble: missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc. It was not as easy as I thought.
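For anyone facing the same problem, a small check like the one below can list the resources that an exported slideshow still loads from the network before you reach the podium. This is only a minimal sketch using the Python standard library, not the procedure used for the talk; the `sample` HTML and its `example.org` URLs are made up for illustration, and a real export may reference remote files in other places too (CSS `url(...)` rules, fonts, inline scripts).

```python
from html.parser import HTMLParser

# Attribute that usually points at an external file, per tag (illustrative subset).
RESOURCE_ATTRS = {"script": "src", "link": "href", "img": "src"}


class RemoteResourceFinder(HTMLParser):
    """Collect (tag, url) pairs whose URL needs a network connection."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        url = dict(attrs).get(attr_name)
        # Absolute and protocol-relative URLs will fail without internet.
        if url and url.startswith(("http://", "https://", "//")):
            self.remote.append((tag, url))


def find_remote_resources(html_text):
    """Return the remote resources referenced by an HTML document."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    return finder.remote


# Hypothetical snippet standing in for an exported slideshow.
sample = """
<html><head>
<script src="https://cdn.example.org/reveal.js"></script>
<link rel="stylesheet" href="css/local.css">
</head><body>
<img src="//images.example.org/map.png">
<img src="figures/catchment.png">
</body></html>
"""

for tag, url in find_remote_resources(sample):
    print(tag, url)
```

Every entry it reports is a file that has to be downloaded and re-linked with a local path for the slides to work offline.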

It took more than three hours to fix all the bugs, on account of a really slow internet connection running from my phone, but finally I got my presentation running perfectly offline on the AGU computers!

In the end, my talk ran very smoothly. A complete workflow for “catchments characterization” using exclusively open source software, running online and fully reproducible thanks to the use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations during my session had mentioned the use of the IPython Notebook as an open source software tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!

 


What is ontology?

December 19th, 2013

The topic I have in mind for this blog post, after five days at the American Geophysical Union 2013 Fall Meeting discussing Earth and space science informatics, is an introduction to ontology for researchers in Earth and environmental sciences and beyond.

To attract your interest, I would say that ontology is the invisible hand behind anything. (It took me a few minutes to think about whether I should add an ‘an’ before the ‘ontology’ here. For reasons see below.)

First, let’s look at the etymology of the word ‘ontology’. According to Wiktionary (http://en.wiktionary.org/wiki/ontology), ontology is ‘originally Latin ontologia (1606, Ogdoas Scholastica, by Jacob Lorhard (Lorhardus)), from Ancient Greek ὤν (ōn, “on”), present participle of εἰμί (eimi, “being, existing, essence”) + λόγος (logos, “account”).’

Second, let’s look at the definition of the word. It is interesting that Wiktionary says that in philosophy the word ‘ontology’ can be either uncountable or countable. For the former, ontology is defined by Wiktionary as ‘The branch of metaphysics that addresses the nature or essential characteristics of being and of things that exist; the study of being.’ This definition is more or less the same as the one given by the Oxford English Dictionary, ‘The science or study of being; that branch of metaphysics concerned with the nature or essence of being or existence.’ That Oxford definition was used in my PhD defense (http://www.slideshare.net/MarshallXMa/ontology-spectrum-for-geological-data-interoperability-phddefence). For the countable ‘ontology’, Wiktionary defines it as ‘The theory of a particular philosopher or school of thought concerning the fundamental types of entity in the universe.’ I have not yet done any work relevant to that definition, but I just found that Oxford has a similar one: ‘As a count noun: a theory or conception relating to the nature of being.’

The word metaphysics is mentioned in the definition of ontology as an uncountable noun. Nowadays, when people talk about metaphysics they often refer to Aristotle (384 – 322 BCE). If you (especially those who are working for a Doctor of PHILOSOPHY ;-)) are interested in his work, you can read two of his most famous books, 1) Politics: A Treatise on Government and 2) The Ethics of Aristotle, on the Gutenberg website (http://www.gutenberg.org/ebooks/author/2747). The story does not stop here. A famous Chinese book, the I Ching (or the Book of Changes, c. 450 – 250 BCE), also touches on metaphysics, for example in a sentence which is my personal favorite: ‘What is above form is called Tao; what is within form is called tool.’

The philosophical meaning of the word ontology is the background; in most cases in the domain of Earth and space science informatics we care more about another meaning of the word: ontology as a countable noun in computer science. Before discussing the definition of ontology as a computer science term, let’s first see how hot this word has been in recent years. I did a few searches on the topic ‘ontology’ in isiknowledge.com (on Dec 19, 2013), which showed that there are about 44884 publications for all years, and the publication numbers for separate periods are 1470/1945–1995, 1498/1995–2000, ~7901/2000–2005, ~24528/2005–2010, and ~16891/2010–2013. When I refined the results by limiting them to the research area ‘Computer Science’, the results were: ~22251/all years, 114/1945–1995, 673/1995–2000, ~5095/2000–2005, ~14316/2005–2010, and ~5971/2010–2013. Note that a large number of publications that applied informatics were filtered out by the ‘Computer Science’ restriction. These results can be read in many ways; one is that work on the computer science ‘ontology’ has been increasing significantly since 2000.
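To make those counts a little easier to compare, the short sketch below computes, for each period, what share of the ‘ontology’ publications fell in the ‘Computer Science’ research area. The numbers are simply the (partly approximate) search results quoted above; the variable and function names are my own, and nothing here is a new measurement.

```python
# Publication counts quoted above (isiknowledge.com, searched on Dec 19, 2013).
# Several of them are approximate ("~") in the original search results.
all_areas = {
    "1945-1995": 1470,
    "1995-2000": 1498,
    "2000-2005": 7901,
    "2005-2010": 24528,
    "2010-2013": 16891,
}
computer_science = {
    "1945-1995": 114,
    "1995-2000": 673,
    "2000-2005": 5095,
    "2005-2010": 14316,
    "2010-2013": 5971,
}


def cs_share(period):
    """Fraction of 'ontology' publications in a period tagged 'Computer Science'."""
    return computer_science[period] / all_areas[period]


for period in all_areas:
    print(f"{period}: {computer_science[period]} of {all_areas[period]} "
          f"({cs_share(period):.1%})")
```

Keep in mind the caveat above: applied-informatics publications are not counted under ‘Computer Science’, so these shares understate how much of the work uses the computer science sense of the word.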

For the definition of the computer science word ‘ontology’, many people have cited the publications of T.R. Gruber (1993, 1995, see: http://dx.doi.org/10.1006/knac.1993.1008 and http://dx.doi.org/10.1006/ijhc.1995.1081): ‘An ontology is an explicit specification of a conceptualization’. The mid-1990s were the golden age for discussing the definition of ontology. N. Guarino (1997, see: http://dx.doi.org/10.1006/ijhc.1996.0091) wrote a nice review of the definition of ‘ontology’, in which I think one key point he discussed was the ‘shared conceptualization’ feature of an ontology. So in my PhD dissertation (Ma, 2011, see: http://www.itc.nl/library/papers_2011/phd/ma.pdf) I tried to re-address the definition of the computer science ‘ontology’: ‘Ontologies in computer science are defined as shared conceptualizations of domain knowledge (Gruber, 1995; Guarino, 1997b)…’

Third, after seeing the definition of ontology, let’s focus on how to put a computer science ‘ontology’ into practice, especially in the domain of Earth and space science informatics. The early 2000s were the golden age for that work. McGuinness (2003, see: http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-%28with-citation%29.htm) gave a wonderful discussion of the ontology spectrum and added a footnote to the spectrum figure: ‘This spectrum arose out of a conversation in preparation for an ontology panel at AAAI ’99. The panelists (Gruninger, Lehman, McGuinness, Ushold, and Welty), chosen because of their years of experience in ontologies found that they encountered many forms of specifications that different people termed ontologies. McGuinness refined the picture to the one included here.’ When I was doing my PhD I read this note and looked for other publications by the panelists listed by McGuinness, and I did find a few that also discussed the ontology spectrum, for example:
Welty, C., 2002. Ontology-driven conceptual modeling. In: Pidduck, A.B., Mylopoulos, J., Woo, C.C., Ozsu, M.T. (Eds.), Advanced Information Systems Engineering, Lecture Notes in Computer Science, vol. 2348. Springer-Verlag, Berlin & Heidelberg, Germany, pp. 3-3. Lecture slides available at: http://www.cs.toronto.edu/caise02/cwelty.pdf
Obrst, L., 2003. Ontologies for semantically interoperable systems. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, LA, USA, 366-369.
Uschold, M., Gruninger, M., 2004. Ontologies and semantics for seamless connectivity. SIGMOD Record 33 (4), 58–64.
Borgo, S., Guarino, N., Vieu, L., 2005. Formal ontology for semanticists. In: Lecture notes of the 17th European Summer School in Logic, Language and Information (ESSLLI 2005), Edinburgh, Scotland, 12pp. http://www.loa-cnr.it/Tutorials/ESSLLI1.pdf

[Figure: An ontology spectrum (from McGuinness 2003)]

To help myself understand the ontology spectrum better, I redrew the diagram (see below) in my PhD dissertation. Very recently (Dec 03, 2013) Jim McCusker, a PhD student with McGuinness, gave a thorough explanation of the spectrum in his blog (see: http://info.5amsolutions.com/blog/bid/154967/6-Points-Along-the-Ontology-Spectrum).

[Figure: Ontology spectrum (adapted from Borgo et al., 2005; McGuinness, 2003; Obrst, 2003; Uschold and Gruninger, 2004; Welty, 2002). Texts in italics explain a typical relationship in each ontology type (from Ma 2011)]

Finally, I would like to share a few examples of different types of ontologies along the spectrum:

Catalog/Glossary:
Neuendorf, K.K.E., Mehl, J.J.P., Jackson, J.A., 2005. Glossary of Geology, 5th edition. American Geological Institute: Alexandria, VA, USA, p. 800. See latest version at: http://www.agiweb.org/pubs/glossary/

Taxonomy:
BGS Rock Classification Scheme, see: https://www.bgs.ac.uk/bgsrcs/

Thesaurus:
AQSIQ, 1988. GB/T 9649-1988 The Terminology Classification Codes of Geology and Mineral Resources. General Administration of Quality Supervision, Inspection and Quarantine of P.R. China (AQSIQ). Standards Press of China, Beijing, China. 1937 pp.

Conceptual Schema:
NADM Steering Committee, 2004. NADM Conceptual Model 1.0—A conceptual model for geologic map information: U.S. Geological Survey Open-File Report 2004-1334, North American Geologic Map Data Model (NADM) Steering Committee, Reston, VA, USA, 58 pp. See: http://pubs.usgs.gov/of/2004/1334

Ontologies encoded in RDF format:
Semantic Web for Earth and Environmental Terminology (SWEET). See: http://sweet.jpl.nasa.gov/

Now a short wrap-up of what ontology is:
For fun: the invisible hand behind anything;
In philosophy: (uncountable) the science or study of being; that branch of metaphysics concerned with the nature or essence of being or existence; (countable) a theory or conception relating to the nature of being;
In computer science: shared conceptualization of domain knowledge.

To put ontologies (computer science) into practice, keep in mind an ontology spectrum with increasingly rich meanings: catalog/glossary -> taxonomy -> thesaurus -> conceptual schema -> formal constraints.
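The first steps of that spectrum can be sketched with plain data structures. The toy example below is my own illustration under simple assumptions, not taken from the dissertation or from any of the cited vocabularies: the terms and definitions are simplified, and the names `glossary`, `broader`, `related`, `ancestors` and `is_a` are hypothetical. Each level keeps what the previous one had and adds one more kind of relationship.

```python
# Catalog/glossary: terms with plain-text definitions and nothing else.
glossary = {
    "rock": "a naturally occurring solid aggregate of minerals",
    "igneous rock": "rock formed by the cooling of magma or lava",
    "granite": "a coarse-grained igneous rock rich in quartz and feldspar",
    "basalt": "a fine-grained, dark igneous rock",
}

# Taxonomy: adds a hierarchy ("is a kind of") on top of the terms.
broader = {
    "igneous rock": "rock",
    "granite": "igneous rock",
    "basalt": "igneous rock",
}

# Thesaurus: adds non-hierarchical links, here simple "related term" pairs.
related = {
    "granite": {"basalt"},
    "basalt": {"granite"},
}


def ancestors(term):
    """Walk the taxonomy upward and return all broader terms, nearest first."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


def is_a(term, candidate):
    """True if `candidate` is a broader term of `term` in the taxonomy."""
    return candidate in ancestors(term)


print(ancestors("granite"))
print(is_a("basalt", "rock"))
```

A glossary alone cannot answer the question "is basalt a rock?"; the taxonomy can. Conceptual schemas and formal constraints go further (properties, relations between classes, logical rules) and are usually written in dedicated languages such as RDF-based formats rather than in ad hoc dictionaries like these.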


FAKE open access publications nowadays and my suggestion

September 24th, 2013

Open Access nowadays is such a *FAKE* idea. It is the author’s paper, not the publisher’s. What a reader currently pays for is the typesetting in the publisher’s format. An author can already put his own manuscript (not the PDF from the publisher) anywhere online for access, yet authors now pay hundreds to a publisher for Open Access to their own papers. I URGE publishers to provide a *FREE* function that allows an author to register a link to his author-made version of a paper on the landing page of the published paper’s DOI. This is the *TRUE* Open Access. What most readers need is the meaning of a paper, not the typesetting. If one does care about the typesetting, he can pay a subscription to get the publisher’s version. University and institutional libraries should build facilities and functionalities that help employees register and upload author-made versions of publications, to improve the visibility and accessibility of the academic work of the institution itself.


Three reasons to attend ESIP winter meeting

January 15th, 2013

Erin Robinson has posted 10 reasons to attend the ESIP winter meeting. I want to add some feedback from my own point of view.

Interaction: ESIP meetings are different from a normal conference with sessions of presentations and Q&A. Their sessions are more like break-outs and workshops and require interaction from the audience.

Topics: ESIP meetings cover various topics at the forefront of geoinformatics, cyberinfrastructure and the semantic web. It is easy to find a session or poster that could be of interest to you.

Location: Normally the ESIP winter meeting is held in Washington, DC. It’s a city full of museums, good food and other interesting stuff. Take a short visit during the meeting!

I want to share an image which combines a part of my ESIP 2013 winter meeting poster and a photo taken at the National Museum of Natural History (they share the common topic of the geologic time scale).

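[Image: part of my ESIP 2013 winter meeting poster combined with a photo taken at the National Museum of Natural History]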

One additional piece of information for students in the field of semantics and/or geoinformatics: you may apply for the Rob Raskin scholarship.


Survival and thriving at AGU Fall Meeting 2012

December 11th, 2012

The AGU Fall Meeting, if I am correct, is the biggest conference in the field of geosciences. In 2012 over 22000 people participated in the event. Yet a conference is more than its number of attendees. AGU is not simply a collection of academic sessions; there are also various workshops, seminars, town halls, exhibitions and social activities alongside them.
I once read an article written by the president of UNESCO (two years ago?), which mentioned that the number of earth scientists across the world is about 440000. This is a tiny number compared with the global population. While approaching San Francisco and the Moscone Center, the city and venue of AGU, I could feel the number of earth scientists around me increasing sharply. Especially along 4th Street toward the Moscone Center, what one can see during the AGU week should be called a deluge of earth scientists. Personally, I had an interesting feeling: am I driven by the deluge, or am I a part of it?
Back to the conference itself: it is a big conference, so I (1) focused on sessions in the Earth and Space Science Informatics (ESSI) division and prepared a personal schedule of the posters and presentations I was interested in. I also (2) set up in-person meetings with people with whom we wanted to discuss issues related to the research projects DCO-DS and GCIS-IMSAP at TW. There were (3) a number of other workshops and activities alongside AGU, such as the Data Management 101 workshop for Early Career Scientists, the workshop about the NSF system, the ESSI reception, the Ignite Talk, etc. Some of them cannot easily be found on the AGU web site, but are announced through other channels. Many thanks to the people on the email lists I joined (e.g., AGU-ESSI, ESIP-SW) for sending me the messages.
I gave two presentations on Friday: a poster on the modeling work in the GCIS-IMSAP project (Jin is first author), and an oral presentation on the exploratory visualization of earth science data with semantic web technologies. For the first one, David Arctur suggested that we might bring some geospatial components into the model framework. Stephan pointed out that if we use GCMD keywords for GCIS, a part of those keywords is for geospatial descriptions. While I was introducing the search function we plan for the GCIS-IMSAP project, Deana Pennington suggested we also consider user tag functions, that is, letting a reader create tags in the NCA report for further use; this may also be supported by some Semantic Web technologies. I also discussed the GCMD keywords with Tyler Stevens, a researcher working on the GCMD keywords, on how to make them more open for use. He liked our feedback and has already provided some information.
My oral presentation was based on work originating from my PhD study, which used datasets on the server of the British Geological Survey. I got some updates from Timothy McCormick, the Information Sector Manager (Geology) at BGS, on their Linked Data work on lithology. He suggested that I do some further work using their datasets and services. Luis Bermudez and David Arctur from the Open Geospatial Consortium (OGC) suggested that I do more work on the semantic web, the Web Feature Service (WFS) and the Sensor Web, and they suggested that TW obtain an OGC membership to get fresh, first-hand news of OGC progress.
AGU is a big event, so a schedule like the one described above is necessary. There are also many other interesting side events. Almost every day I ran into some old friends, some of whom I had lost contact with for more than seven years! The exhibit was great, and I collected a bag of earth and space science cards, posters and toys for my son. Is he going to be an earth scientist?
There is more to say about a seven-day conference with over 22000 participants, but I have to stop here. For those issues related to specific research topics and projects at TW, we will have further discussion in the separate groups soon.
