Check out the tag clouds below. Can you see why they are interesting? Compare the two tag clouds generated from the same text (a corpus built from the titles of about 2000 data.gov datasets) and see how they differ.
- Meaningful Visualization. As you may see from the captions, the first one is a “MultiWord TagCloud” while the other is a conventional single-word TagCloud. The former joins individual words into popular multi-word phrases. With the former tag cloud, I get a better overview of what data has been published at data.gov.
- Automated Process. The MultiWord TagCloud was not created by human users; it was generated automatically by a computer program, powered by the Microsoft Web N-gram service. We can generate such a tag cloud for any existing text document.
- Cloud+Crowd. More broadly, this demo shows the value of the crowd and the cloud that I mentioned in my earlier blog post: big data can now be tackled with the crowd (text from the entire Web) and the cloud (the high-performance computational Web N-gram service).
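To make the phrase-joining idea concrete, here is a minimal Python sketch. It is not the code behind the demo, which relies on the Microsoft Web N-gram service. As a stand-in assumption, it counts word pairs inside the corpus itself and greedily joins adjacent words whose pair occurs at least `min_count` times. The function name `multiword_tags`, the `min_count` threshold, and the sample titles are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def multiword_tags(titles, min_count=2):
    """Count tags, joining adjacent words whose bigram is frequent in the corpus.

    Assumption: bigram counts come from the titles themselves, as a local
    stand-in for the Web-scale statistics an n-gram service would provide.
    """
    tokenized = [title.lower().split() for title in titles]

    # Count every adjacent word pair across all titles.
    bigrams = Counter()
    for words in tokenized:
        bigrams.update(zip(words, words[1:]))

    tags = Counter()
    for words in tokenized:
        i = 0
        while i < len(words):
            phrase = [words[i]]
            # Extend the phrase while the next word pair is frequent enough.
            while i + 1 < len(words) and bigrams[(words[i], words[i + 1])] >= min_count:
                phrase.append(words[i + 1])
                i += 1
            tags[" ".join(phrase)] += 1
            i += 1
    return tags


# Made-up titles, for illustration only.
example_titles = [
    "Toxics Release Inventory 2005",
    "Toxics Release Inventory 2006",
    "Federal Inventory Data",
]
example_tags = multiword_tags(example_titles)
```

With these sample titles, the two repeated pairs are merged into the single tag "toxics release inventory", while "inventory" in the third title stays a single-word tag. Setting `min_count` higher than any pair count falls back to a conventional single-word tag cloud.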
Behind the Scenes
WWW2010 is really inspiring: it has made me a productive “engineer” even though I came as a researcher. Today I picked up Microsoft Visual Studio and wrote my first C# program. I was an excellent C++ programmer back in my college days (I wrote tons of code using Visual C++ 4.0 more than 10 years ago). However, today is not about me being a programmer, but about announcing something that is really cool! I would also like to thank Evelyne and Paul, researchers from Microsoft Research, for their great support. My demo on data.gov data is powered by the Microsoft Web N-gram service.
Li Ding @ RPI, April 29, 2010