Open Source Software & Science Reproducibility

January 14th, 2014

My contribution to the AGU Fall Meeting 2013 was all about developing open source software to enable the reproducibility of scientific products, with both a poster and an oral presentation. AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my second time at AGU, but my first time giving an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors. First, I had decided to generate the slideshow in real time as HTML from an online IPython Notebook; I thought it would be cool to show this functionality as well as the work itself. Second, that made me dependent on an internet connection at the time of the presentation, and at AGU the presenter computer doesn’t have one! Definitely not the best conditions for a web-based slideshow generated “on the fly” by executing an IPython Notebook.

I found out about the lack of connectivity only two days before my presentation. I must have misunderstood the AGU oral presentation guidelines: when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that it wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I had been so excited about the idea of showing my work running in real time instead of showing a static (somewhat boring) PowerPoint presentation!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stood in the way of an interactive, real-time demo, there should be no problem running a static HTML slideshow instead; at least that is what I thought…

I used the IPython “nbconvert” utility with its “convert to slides” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshow. Yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.
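
The conversion step can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the helper names and the file name talk.ipynb are hypothetical, the default launcher mirrors the 2013-era “ipython nbconvert” command (newer installations usually ship it as “jupyter nbconvert”), and the optional --reveal-prefix flag is assumed to be available in the installed nbconvert version for pointing the slides at a local copy of reveal.js.

```python
import subprocess


def build_slides_command(notebook, reveal_prefix=None,
                         launcher=("ipython", "nbconvert")):
    """Return the argument list that converts a notebook to HTML slides.

    All names here are illustrative. `launcher` defaults to the 2013-era
    entry point; pass ("jupyter", "nbconvert") on newer installations.
    """
    cmd = list(launcher) + [notebook, "--to", "slides"]
    if reveal_prefix is not None:
        # Assumption: the installed nbconvert accepts --reveal-prefix,
        # so the generated HTML loads reveal.js from this location.
        cmd += ["--reveal-prefix", reveal_prefix]
    return cmd


def convert_to_slides(notebook, reveal_prefix=None,
                      launcher=("ipython", "nbconvert")):
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    cmd = build_slides_command(notebook, reveal_prefix, launcher)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command; nothing is executed here.
    print(" ".join(build_slides_command("talk.ipynb")))
```

Calling convert_to_slides("talk.ipynb", reveal_prefix="reveal.js") would then run the conversion against a local reveal.js folder, provided that folder is shipped alongside the generated HTML file.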

Happy with the final HTML presentation, I went to the AGU Speaker Ready Room to upload and test it. Unfortunately, my HTML presentation would not run offline. Without internet I ran into missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc. It was not as easy as I had thought.
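
One way to spot this kind of problem before reaching the presenter computer is to scan the generated HTML for references to remote hosts. The sketch below is not what was done at AGU; it is a hedged illustration using only the Python standard library, and the class and function names are hypothetical. It lists every src or href attribute that points at an http, https, or protocol-relative URL. Fonts or images referenced from inside CSS files are not covered.

```python
from html.parser import HTMLParser

# URL prefixes that require a network connection to resolve.
REMOTE_PREFIXES = ("http://", "https://", "//")


class RemoteResourceFinder(HTMLParser):
    """Collect (tag, url) pairs for src/href attributes that are remote."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(REMOTE_PREFIXES):
                self.remote.append((tag, value))


def find_remote_resources(html_text):
    """Return remote references in document order."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    return finder.remote


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the exported slides file.
    sample = (
        '<script src="https://cdn.example.org/reveal.js"></script>'
        '<link rel="stylesheet" href="css/local.css">'
    )
    for tag, url in find_remote_resources(sample):
        print(tag, url)
```

Each reported entry is a candidate for replacement with a path to a local copy; plain hyperlinks are reported too, since they will also be dead without a connection.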

It took more than three hours to fix all the bugs, on account of a really slow internet connection running from my phone, but finally I got my presentation running perfectly offline on the AGU computers!

In the end, my talk ran very smoothly. I presented a complete workflow for “catchments characterization”, running online and fully reproducible thanks to the exclusive use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came at just the right time. Before me, two other presentations in my session had mentioned the IPython Notebook as an open source software tool to enable the reproducibility of scientific work. They had highlighted that it shows great potential and deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!

 

  1. January 14th, 2014 at 18:08 | #1

    On the one hand, I can understand their requirement for static presentations. There are a lot of sessions and a lot of presentations, all on a tight schedule, so having every presentation static and ready to go, without having to worry about connectivity, having the right client software, etc., is ideal.

    At the same time, in an interactive, real-time environment such as earth science informatics and the geosciences, one would hope that we could be more open and flexible, using open source solutions for presentations such as OpenOffice or NeoOffice and online presentations based on standard web technologies (HTML5, JavaScript, style sheets), in keeping with the push toward open data and transparency.

    Another interesting issue to consider is that of the provenance of presentations. If you give me a URL for a presentation for a given event, how can I be sure that this is actually the presentation that you gave at the event, without modification? I’ve seen cases where someone gives a presentation and then makes some pretty major modifications to it, adding slides, adding information, and covering additional topics. It’s one thing to add a reference that you may have forgotten, correct an acknowledgement, or fix a link, those sorts of things; major changes are another matter.

    The original link on your presentation page referred to a particular version of an IPython notebook that includes an expressive representation of the notebook, including the date and time it was uploaded. So you could query our triple store for information on that manifestation of the document. Unfortunately, the slide view link no longer works, so you had to switch to a different manifestation of your presentation.

    Anyway, thanks for posting this blog about your AGU experience.

  2. Massimo Di Stefano
    January 15th, 2014 at 01:23 | #2

    @Patrick West
    As I stated in the post, the assumption I made about a working internet connection was “a big mistake”…
    As for the “provenance of presentations” issue, both the old [1] and the new [2] links work in the slide viewer.
    Note that [2] and [3] are the same file, the main difference being that [3] is a TWC resource document.
    This should address the “lack of provenance” highlighted in your reply. (Thanks for your comment.)
    Also, [1] is a version we made before the presentation day, not the final version, hence I had to update it.
    The final version presented at AGU and linked in the blog post is the HTML generated by
    “ipython nbconvert AGU-2013-H52E02-MDS.ipynb --to slides”
    This command is mentioned in the slides themselves, rather than a link to the slide viewer + url.json, because of the lack of internet.

    [1] http://tw.rpi.edu/media/latest/AGU-fall-2013-V5.ipynb

    [2] http://tw.rpi.edu/media/2014/01/14/f909/AGU-2013-H52E02-MDS.ipynb

    [3] http://orion.tw.rpi.edu/~epifanio/AGU-2013/AGU-2013-H52E02-MDS.ipynb

    Thanks.

  3. January 16th, 2014 at 09:10 | #3

    The question of the provenance of presentations, especially “dynamic” presentations that take advantage of modern technologies shown “live,” is interesting. Our semantic web toolkit gives us the capability of providing later “viewers” with evidence not only of the version of the presentation that was shown and how it has been modified (and by whom) since the presentation date, but also of the presentation as given, in the form of a log. Further, using online community ontologies such as SIOC, one can easily imagine decorating the presentation log with live commentary, including comments from monitored Twitter hashtags.

    The point is that our modern web toolkit should not make the demonstration of provenance more difficult, but radically easier and much richer in content.

  4. Matthias Bussonnier
    January 24th, 2014 at 04:36 | #4

    Note that when presenting with your own computer (and without internet), you can configure IPython to use local resources (MathJax and/or reveal.js) if you have installed them locally. There should be a utility function in IPython to help with installing them.

    • Massimo Di Stefano
      January 24th, 2014 at 05:13 | #5

      Hi Matthias, thanks for pointing this out; I’ll have a look at the configuration options.
      On my laptop everything also works fine offline; I used IPython functionality to install MathJax and use it offline.
      Maybe we need the same for reveal.js once the slide mode lands in IPython master.
      This was not the case at AGU, where the computer used for the presentation was a remote PC to which I had no direct access.
