Archive for the ‘cloud computing’ Category

Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the AGU fall meeting 2013 was all about the development of Open Source Software to enable the reproducibility of scientific products, with both a Poster and an Oral presentation. The AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my 2nd time at AGU, but my first time with an oral presentation which turned in a real challenge!

The main issue was a combination of 2 factors : I had decided to generate the slideshow in realtime as HTML from an online IPython Notebook. I thought it would be cool to show this functionality, as well as the work itself. Unfortunately, I was dependent on an internet connection at the time of the presentation, but alas, at AGU the presenter computer doesn’t have internet connection! Definitely not the best conditions for a web based slideshow generated “on-the-fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only 2 days before my presentation. I must have misunderstood the AGU oral presentation guidelines, but when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that that wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a power-point presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment !

I was so excited about the idea of showing my work running in realtime instead of showing a static (somewhat boring) ppt  presentation!!!

I kept thinking about alternative solutions, though, and an idea quickly came to me. If the lack of internet stands in the way of an interactive, realtime demo there should be no problem in running a static HTML slideshows instead; at least that is what I thought …

I used the IPython “nbconvert” utility and its “convert to slide” option, and I successfully converted my workflow from an interactive IPython notebook running in slideshow mode to a static HTML5 slideshows, yeah! The audience wouldn’t get to see how this was done, but at least they would get to see the result.

Happy with the final HTML presentation I finally went to the “AGU’s Speaker Ready Room” to upload and test my presentation. Unfortunately, my HTML presentation would not run offline. The lack of internet was giving me troubles with missing JavaScript files, missing fonts, images-urls to be replaced with path to static files, broken hyperlinks etc … it was not as easy as I thought.

It took more than 3 hours to fix all the bugs on account of a really slow internet connection running from my phone, but finally i got my presentation perfectly  running off line on the AGU computers !

In the end, my talk ran very smoothly. A complete workflow for “catchments characterization” using exclusively open source software, running online and fully reproducible thanks to the use of open source software and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations during my session had mentioned the use of the IPython Notebook as open source software tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!!!


VN:F [1.9.22_1171]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)

Biomedical Semantics and the Cloud

November 18th, 2011

I’ve been asked to give a 30 minute talk on biomedical semantics in the cloud at the Molecular Med Tri Con in the symposium on cloud computing. Here’s what I know about what’s going on in this area at the moment:

So that’s on the “semantics using the cloud” side, but I really think that there’s a lot of potential going the other way: using semantics to discover data and services in the cloud. SADI has the ability to discover and link services through ontologies. It’s similar to SAWSDL (in fact, they wrap SAWSDL services), but they don’t bother with the extra layer, and just let the service process RDF directly. When SADI services are deployed to the cloud, it’ll solve a big problem for people who want others to use their services/algorithms without the overhead of maintaining those servers themselves. In fact, with the Amazon DevPay structure, it’s possible for small labs to release datasets, databases, and algorithms to the world and not have to pay to support it.

I say when, not if, because my implementation of SADI in Python is almost ready for deployment through Google App Engine (which can be deployed in AWS or other systems using AppScale), and from what I hear, it won’t take much work to do the same with the Java implementation. Between this and the extreme portability of python SADI services (it’s just a script), use in the cloud and redeployment to private clouds is going to be trivial.

So I’m asking folks, am I full of it? Also, what else is there out there? Please help me out so that we all get some good exposure!

VN:F [1.9.22_1171]
Rating: 8.3/10 (3 votes cast)
VN:F [1.9.22_1171]
Rating: +2 (from 2 votes)

Multi-Word TagCloud on Web N-gram Now

April 29th, 2010

Check out the tagCloud below, can you see why it is interesting? Please compare the two tag clouds generated from the same text (a text corpus from the title of about 2000 datasets), and see why they are different.

A Multi-word TagCloud produced from 2000 US gov dataset titles

Novel Multi-word TagCloud

Conventional Single-word Tag Cloud

Conventional Single-word TagCloud


  • Meaningful Visualization. As you may see from the caption, the first one a “MultiWord TagCloud” while the other is the conventional single-word  TagCloud. The former joints individual words into popular multi-word phrases. With the former tag cloud, I can have a better overview on what data was published at
  • Automated Process. The MultiWord TagCloud was not created by human users, but automatically generated by computer program, powered by Microsoft Web N-gram service. We can generate such tag cloud for all existing text document
  • Cloud+Crowd. Broadly, this demo shows the value of the crowd and the cloud I mentioned in my earlier blog, now big data can be tackled by the crowd (text from the entire Web) and the cloud (the high performance computational Web N-gram service).

Behind the Scene

The WWW2010 is really inspiring – making me a productive “engineer” although I came as a researcher. Today I picked up Microsoft Visual Studio and write my first C# program. I was an excellent C++ programmer back to my college time (I wrote ton of code using Visual C++ 4.0 10+ years ago). However, today is not about me being a programmer, but rather announce something that is really cool!  I would also like to thank researchers, Evelyne and Paul from Microsoft Research for their great support. My demo on data is powered by Microsoft Web N-gram Service.


Li Ding @ RPI,  April 29, 2010

VN:F [1.9.22_1171]
Rating: 6.5/10 (2 votes cast)
VN:F [1.9.22_1171]
Rating: 0 (from 0 votes)
Author: Categories: cloud computing, linked data Tags: