Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the AGU fall meeting 2013 was all about the development of Open Source Software to enable the reproducibility of scientific products, with both a Poster and an Oral presentation. The AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my second time at AGU, but my first time giving an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors. First, I had decided to generate the slideshow in real time as HTML from an online IPython Notebook. I thought it would be cool to show this functionality as well as the work itself. That made me dependent on an internet connection at the time of the presentation. Second, at AGU the presenter computer doesn’t have an internet connection! Definitely not the best conditions for a web-based slideshow generated “on-the-fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only 2 days before my presentation. I must have misunderstood the AGU oral presentation guidelines, but when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that that wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I had been so excited about the idea of showing my work running in real time instead of showing a static (somewhat boring) ppt presentation!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stood in the way of an interactive, real-time demo, there should be no problem in running a static HTML slideshow instead; at least that is what I thought …

I used the IPython “nbconvert” utility and its “convert to slide” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshow, yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.
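For readers who want to try the same conversion, here is a minimal sketch of how it can be scripted. It assumes the nbconvert command-line tool is installed; the notebook name `talk.ipynb` is a made-up placeholder, and the `launcher` argument is my own addition because the command was started through `ipython` at the time of this post, while current installations use `jupyter`.

```python
import subprocess


def build_slides_command(notebook_path, launcher="jupyter"):
    """Return the argument list that converts a notebook to static slides."""
    return [launcher, "nbconvert", "--to", "slides", notebook_path]


def convert_to_slides(notebook_path, launcher="jupyter"):
    """Run the conversion; requires nbconvert to be installed."""
    # check=True raises CalledProcessError if nbconvert exits with an error.
    subprocess.run(build_slides_command(notebook_path, launcher), check=True)


# Only print the command here, so the sketch runs without a notebook on disk.
# "talk.ipynb" is a hypothetical file name.
print(" ".join(build_slides_command("talk.ipynb")))
```

Calling `convert_to_slides("talk.ipynb")` would run the conversion itself.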

Happy with the final HTML presentation, I finally went to AGU’s “Speaker Ready Room” to upload and test it. Unfortunately, my HTML presentation would not run offline. The lack of internet was giving me trouble with missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc … it was not as easy as I had thought.
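In hindsight, a quick check for remote references before going offline would have saved time. The following is a rough sketch of such a check, not what I actually used: it parses an HTML document with Python’s standard library and lists every `src`, `href` or `poster` attribute that points to a remote address. The sample HTML and its example.org URLs are invented for illustration, and the check does not look inside CSS files, so fonts loaded from a stylesheet would still slip through.

```python
from html.parser import HTMLParser

# Attributes that commonly point at resources or links a page depends on.
RESOURCE_ATTRS = {"src", "href", "poster"}
REMOTE_PREFIXES = ("http://", "https://", "//")


class RemoteReferenceFinder(HTMLParser):
    """Collect (tag, attribute, url) triples that point at remote addresses."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in RESOURCE_ATTRS and value and value.startswith(REMOTE_PREFIXES):
                self.remote.append((tag, name, value))


def find_remote_references(html_text):
    """Return all remote references found in the given HTML text."""
    finder = RemoteReferenceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote


# Hypothetical snippet standing in for an exported slideshow.
sample = """
<html><head>
<script src="https://cdn.example.org/reveal.js"></script>
<link rel="stylesheet" href="css/local.css">
</head><body>
<img src="http://example.org/figure.png">
<a href="https://example.org/paper">paper</a>
</body></html>
"""

for tag, attr, url in find_remote_references(sample):
    print(f"<{tag}> {attr} -> {url}")
```

Every line it prints is something that needs a local copy, or that will simply not work without a connection.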

It took more than 3 hours to fix all the bugs, on account of a really slow internet connection running from my phone, but finally I got my presentation running perfectly offline on the AGU computers!

In the end, my talk ran very smoothly. I showed a complete workflow for “catchments characterization”, running online and fully reproducible thanks to the exclusive use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations during my session had mentioned the use of the IPython Notebook as an open source software tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!


AGU fall meeting notes – part 1 – two levels of conversations

January 5th, 2014

In my last post I mentioned that I talked with two kinds of people, researchers and managers, during ODIP workshop #2 and the AGU fall meeting 2013. Here are the definitions I left out.

By researchers, I mean people who are workers in the community — those who deal with technical details, think of novel approaches to problems, and fulfill ideas proposed by the managers.

Managers, on the other hand, are those who deal with people and make proposals to find social and financial support for workers. They may also do some marketing to find customers for the products created.

Conversations with researchers (I probably didn’t pick the best word, because people called researchers usually do both research and management) are usually very technical: approaches, architectures, frameworks, software, parameters, scalability, maintainability, etc. I am currently very biased towards the technical side of research and feel very comfortable in this kind of conversation.

Quite often, however, I find myself totally lost in conversations with managers, when they talk about funding, grants, recruiting, outreach, and names of people and institutions I’ve never heard of. I realize I haven’t paid enough attention to the managing side of research, which is as important as, if not more important than, the technical side. Managers’ work of building and maintaining the supply chains and sales channels for research products is indispensable for research institutes and the academic community as a whole to function properly.

As a PhD student I haven’t had many opportunities to get involved in management work. But if people are to use the products that we are developing and believe to be useful, I think I need to learn beyond the worker’s part and understand how the whole academic business works. Talking with managers at the ODIP workshop and the AGU fall meeting was certainly a good starting point!


