Archive for the ‘Blog’ Category

Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the AGU Fall Meeting 2013 was all about the development of Open Source Software to enable the reproducibility of scientific products, with both a poster and an oral presentation. The AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my second time at AGU, but my first time giving an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors. I had decided to generate the slideshow in realtime as HTML from an online IPython Notebook, because I thought it would be cool to show this functionality as well as the work itself. That made me dependent on an internet connection at the time of the presentation, but alas, at AGU the presenter computer doesn’t have an internet connection! Definitely not the best conditions for a web-based slideshow generated “on-the-fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only 2 days before my presentation. I must have misunderstood the AGU oral presentation guidelines, but when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that that wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I had been so excited about the idea of showing my work running in realtime instead of showing a static (somewhat boring) PowerPoint presentation!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stood in the way of an interactive, realtime demo, there should be no problem in running a static HTML slideshow instead; at least that is what I thought…

I used the IPython “nbconvert” utility and its “convert to slide” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshow, yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.

Happy with the final HTML presentation, I went to the “AGU’s Speaker Ready Room” to upload and test it. Unfortunately, my HTML presentation would not run offline. The lack of internet caused trouble with missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc. It was not as easy as I had thought.
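For anyone facing the same problem, a quick way to see what will break offline is to list every remote reference in the exported HTML before heading to the venue. The snippet below is only a minimal sketch using Python's standard library, not the procedure I followed at AGU; the function name, the class name and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ExternalResourceFinder(HTMLParser):
    """Collect src/href attribute values that point at a remote host."""

    # Prefixes treated as "needs a network connection" (an assumption of this sketch).
    REMOTE_PREFIXES = ("http://", "https://", "//")

    def __init__(self):
        super().__init__()
        self.remote = []  # (tag, attribute, url) tuples in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(self.REMOTE_PREFIXES):
                self.remote.append((tag, name, value))


def find_remote_resources(html_text):
    """Return every remote src/href reference found in html_text."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    return finder.remote


# Illustrative markup only; a real nbconvert export will look different.
sample = """
<html><head>
<script src="https://cdn.example.org/reveal.js"></script>
<link rel="stylesheet" href="css/local.css">
</head><body>
<img src="http://example.org/figure.png">
<a href="#slide-2">next</a>
</body></html>
"""

for tag, attr, url in find_remote_resources(sample):
    print(f"{tag} {attr} -> {url}")
```

Each reported URL is something to download and point at a local path (scripts, fonts, images) or to accept as a dead link while offline.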

It took more than 3 hours to fix all the bugs, on account of a really slow internet connection running from my phone, but finally I got my presentation running perfectly offline on the AGU computers!

In the end, my talk ran very smoothly. A complete workflow for “catchments characterization” using exclusively open source software, running online and fully reproducible thanks to the use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations during my session had mentioned the use of the IPython Notebook as an open source software tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!

 


ESIP Winter Meeting 2013

January 24th, 2013

I presented a poster on how we can use Semantic Web technology to help build an information model and information system for the National Climate Assessment (NCA). The NCA is a report that integrates, evaluates, and interprets the findings on climate change and its impacts on affected industries such as agriculture, the natural environment, energy production and use, etc. One of the problems in building an information system for the NCA is that it draws on a wide range of information sources and covers many climate-related topics, which makes it difficult for users to find and identify the information they need. Using Semantic Web technology, we created a well-structured ontology in which the relationships between NCA-related entities and concepts are well defined. We also use other semantic technologies, such as PROV-O, SPARQL endpoints, and ontology-based facet browsers, to help solve the problem.

Overall, the presentation went well. A few people were particularly interested in how we leverage GCMD keywords and Clean Vocabulary when building the information system for the NCA, and in what benefits they can bring to the NCA report information system. Others found the facet search system interesting, especially when applied to geo-related data.

I also attended Semantic Web-related sessions. One of them discussed how we can use Semantic Web technology to help solve the “Tool Match” problem: an OWL ontology encodes the rules and concepts about tools and datasets, and description logic reasoners then perform the Tool Match. Another gave a tutorial on Semantic Web technology to the ESIP community. It is nice to see that Semantic Web technology really helps different communities solve various problems, and that people are becoming more and more interested in it.

Author: Categories: Blog, Data Science, Semantic Web, tetherless world Tags:

Fall 2010 TWC Undergraduate Research Summary

December 20th, 2010

The Fall 2010 semester marked the beginning of the Tetherless World Constellation’s undergraduate research program at Rensselaer Polytechnic Institute (RPI). Although TWC has enjoyed significant contributions from RPI undergrads since its inception, this term we stepped up our game by more “formally” incorporating a group of undergrads into TWC’s research programs, establishing regular meetings for the group, and, with input from the students, beginning to outfit their own space in RPI’s Winslow Building.

Patrick West, my fellow TWC undergrad research coordinator, and I asked the students to blog about their work throughout the semester; at the end of term, we asked them to post summary descriptions of their work and their thoughts about the fledgling TWC undergrad research program itself. We’ve provided short summaries and links to those blogs below…

  • Cameron Helm began the term coming up to speed on SPARQL and RDF, experimented with several of the public TWC endpoints, and then worked with Phillip on basic visualizations. He then slashed his way through the tutorials on TWC’s LOGD Portal, eventually creating impressive visualizations such as this earthquake map. Cameron is very interested in the subject of data visualization and looks to do more work in this area in the future.
  • After a short TWC learning period, Dan Souza began helping doctoral candidate Evan Patton create an Android version of the Mobile Wine Agent application, with all the amazing visualization and data integration required, including Twitter and Facebook integration. Mid-semester Dan also responded to the call to help with the “crash” development of the Android/iPhone TalkTracker app, in time for ISWC 2010 in early November. Dan continues to work with Evan and others for early 2011 releases of Android, iPhone/iPod Touch and iPad versions of the Mobile Wine Agent.
  • David Molik reports that he learned web coding skills, ontology creation, and server installation and administration. David contributed to the development and operation of a test site for the new, semantic web savvy website for the Biological and Chemical Oceanography Data Management Office (BCO-DMO) of the Woods Hole Oceanographic Institution.
  • Jay Chamberlin spent much of his time working on the OPeNDAP Project, an open source server to distribute scientific data that is stored in various formats. His involvement included everything from learning his way around the OPeNDAP server, to working with infrastructure such as TWC’s LDAP services, to helping migrate documentation from the previous Wiki to the new Drupal site, to actually implementing required changes to the OPeNDAP code base.
  • Phillip Ng worked on a wide variety of projects this fall, starting with basic visualizations, helping with ISWC applications, and including iPad development for the Mobile Wine Agent. Phillip’s blog is fascinating to read as he works his way through the challenges of creating applications, including his multi-part series on implementing the social media features.
  • Alexei Bulazel began working with Dominic DiFranzo on a health-related mashup using Data.gov datasets and is now working on a research paper with David on “human flesh search engine” techniques, a topic that top thinkers including Tetherless World Senior Constellation Professor Jim Hendler have explored in recent talks. Note: For more background on this phenomenon, see e.g. China’s Cyberposse, NY Times (03 Mar 2010).

Many of these students will be continuing on with these or other projects at TWC in 2011; we also expect several new students to be joining the group. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions this fall, and looks forward to being amazed by their continued great work in the coming year!

John S. Erickson, Ph.D.


Timeline of ISWC 2010 Main Conference Talks

November 1st, 2010

This is another visualization using Datapress. It shows talks at the main conference of ISWC 2010.

Author: Categories: Blog, iswc, tetherless world, visualization Tags:

Visualizing Data using Datapress

November 1st, 2010

I attended a seminar at MIT last Friday (for this year, I’m a part-time member of the DIG group there). Edward Benson gave an impressive demo on Datapress, an extension of WordPress that can enable non-geeks to import and visualize data in their blogs.

Since our TW blog is based on WordPress, I installed the extension and tried it out. The installation was surprisingly smooth: just a few clicks and it was done in 30 seconds!

The first thing I wanted to try was visualizing the ISWC 2010 dataset I recently built. Since Datapress does not yet support importing from RDF, I created a spreadsheet using a SPARQL query in TopBraid Composer:

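# Prefixes (swc:, iswc2010:, foaf:, swrc:, rdfs:, geo:) are assumed to be
# declared in the environment running the query (TopBraid Composer here).
SELECT DISTINCT ?l ?lat ?long
WHERE {
  ?s swc:isSubEventOf iswc2010:research-track .  # sub-events of the research track
  ?s swc:isSuperEventOf ?p .                     # events nested inside each of them
  ?p swc:hasRelatedDocument ?d .                 # the document related to each event
  ?d foaf:maker ?m .                             # the makers (authors) of the document
  ?m swrc:affiliation ?o .                       # each author's affiliation
  ?o rdfs:label ?l .                             # affiliation label
  ?o foaf:based_near ?b .                        # affiliation location
  ?b geo:lat ?lat .
  ?b geo:long ?long .
}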

There are some minor format requirements (I didn’t get them right on the first try; Ted helped me identify the problem):
* The first line of the spreadsheet should contain headers, and the “key” line should have “{{label}}”.
* To show on a map, coordinates should be given as Lat,Lng. Hence, I needed to combine the last two columns into one, separated by a comma.
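The two requirements above can also be handled with a small script rather than by hand in the spreadsheet. What follows is a rough sketch under a few assumptions: the query results are already available as (label, lat, long) tuples, the coordinate column header `latlng` is a hypothetical name (only “{{label}}” comes from the requirements above), and the sample rows are invented for illustration.

```python
import csv
import io

KEY_HEADER = "{{label}}"  # key column header, as quoted in the requirements above
COORD_HEADER = "latlng"   # hypothetical header name for the combined coordinates


def to_spreadsheet_rows(results):
    """Turn (label, lat, long) tuples into rows with one combined Lat,Lng column."""
    rows = [[KEY_HEADER, COORD_HEADER]]  # the first line holds the headers
    for label, lat, lng in results:
        rows.append([label, f"{lat},{lng}"])
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text; cells containing a comma get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample results, not taken from the ISWC 2010 dataset.
sample_results = [
    ("Example University", "42.73", "-73.68"),
    ("Sample Institute", "31.2", "121.5"),
]

print(to_csv(to_spreadsheet_rows(sample_results)))
```

The CSV writer quotes the combined coordinate cell because it contains a comma, so the value should stay in a single column when the file is imported into a spreadsheet.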

The next step is to upload it to Google Docs and share it as a public document (can be viewed here).

Then, I can go back to the blog post that I’m writing, click a button on the Datapress toolbar in the editing interface, add the data by giving it the URL of the Google Docs spreadsheet, and select the Map visualization. The process is very user friendly.

You can add multiple visualizations to one post. This is a very handy way to generate visualizations using Exhibit. Actually, I had thought about visualizing ISWC data using Exhibit, but didn’t have the time (or was too lazy) to program it. Datapress saved me.

Ted will give the presentation about Datapress at ISWC next week. Don’t miss it if you will also be in Shanghai!

(The map shows locations of research track authors at ISWC 2010.)

Author: Categories: Blog, iswc, Semantic Web, visualization Tags: