Multi-Word TagCloud on Web N-gram Now

April 29th, 2010

Check out the tag clouds below. Can you see why they are interesting? Both were generated from the same text (a corpus made of the titles of about 2000 datasets), so compare them and see how they differ.

A Multi-word TagCloud produced from 2000 US gov dataset titles

Novel Multi-word TagCloud

Conventional Single-word TagCloud


  • Meaningful Visualization. As the captions indicate, the first is a “MultiWord TagCloud”, while the second is a conventional single-word TagCloud. The former joins individual words into popular multi-word phrases. With the former tag cloud, I get a better overview of what data was published at
  • Automated Process. The MultiWord TagCloud was not created by human users; it was generated automatically by a computer program, powered by the Microsoft Web N-gram service. We can generate such a tag cloud for any existing text document.
  • Cloud+Crowd. More broadly, this demo shows the value of the crowd and the cloud that I mentioned in my earlier blog post: big data can now be tackled by the crowd (text from the entire Web) and the cloud (the high-performance computational Web N-gram service).
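
To make the idea of joining words into phrases concrete, here is a minimal Python sketch. It is not the program behind the demo: the post does not describe the actual algorithm or show the Web N-gram service calls, so this sketch assumes a simple greedy segmentation and substitutes n-gram counts from the titles themselves for the service's web-scale statistics. The function names and the `max_n` and `min_count` parameters are hypothetical.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def ngram_counts(token_lists, max_n):
    """Count every n-gram (n = 1..max_n) across all titles.

    Assumption: local counts stand in for the scores a web-scale
    n-gram service would return.
    """
    counts = Counter()
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def segment(tokens, counts, max_n, min_count):
    """Greedily split one title into phrases, longest candidate first."""
    phrases = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            # A single word is always accepted; a longer phrase must
            # be frequent enough to count as "popular".
            if n == 1 or counts[gram] >= min_count:
                phrases.append(" ".join(gram))
                i += n
                break
    return phrases


def multiword_tag_weights(titles, max_n=3, min_count=2):
    """Return phrase -> frequency, usable as tag cloud weights."""
    token_lists = [tokenize(t) for t in titles]
    counts = ngram_counts(token_lists, max_n)
    weights = Counter()
    for tokens in token_lists:
        weights.update(segment(tokens, counts, max_n, min_count))
    return weights
```

Calling `multiword_tag_weights` on a list of dataset titles returns a counter whose keys are single words or multi-word phrases; the counts can be mapped to font sizes. A single-word tag cloud corresponds to the special case `max_n=1`.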

Behind the Scenes

WWW2010 is really inspiring: it made me a productive “engineer” even though I came as a researcher. Today I picked up Microsoft Visual Studio and wrote my first C# program. I was an excellent C++ programmer back in my college days (I wrote tons of code using Visual C++ 4.0 10+ years ago). However, today is not about me being a programmer, but about announcing something really cool! I would also like to thank Evelyne and Paul, researchers from Microsoft Research, for their great support. My demo on data is powered by the Microsoft Web N-gram Service.


Li Ding @ RPI,  April 29, 2010
