TWeD Talk: Text Analysis of Large Metadata Catalogs

November 18, 2013
There's always something happening on Wednesday evenings in the Tetherless World!

TWeD Talk, Wednesday, November 20, 2013, 7pm ET, Winslow Building on the RPI Campus

Please join us as TWC Ph.D. student Amar Viswanathan leads us through a discussion and demo of tools and methods used for analyzing large dataset catalogs!

ABSTRACT: We will demonstrate the application of traditional IR methods, including entity extraction, tf-idf, and (if time permits) topic modelling, to large collections of metadata such as the International Open Government Dataset Catalog (IOGDS), which covers over 1M datasets, and show how to visualize the results. The focus of this talk will be on demonstrating how to use certain simple tools to generate results and produce quick visualizations, including word clouds and graphs. We will also discuss how the results of analyzing IOGDS metadata, including languages, categories, and keywords, may be used as source data for a question answering system like IBM's Watson.
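
For readers who have not met tf-idf before, here is a minimal sketch of the idea in plain Python. It is not the speaker's tooling: the three dataset descriptions are invented for illustration, and the helper names (tokenize, tf_idf, top_terms) are hypothetical. It uses one common weighting (relative term frequency times the log of inverse document frequency); other variants exist.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty descriptions
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(weights, key=lambda term: (-weights[term], term))[:k]


# Invented sample metadata; a real catalog would supply these strings.
descriptions = [
    "Annual budget data for city departments",
    "City crime incident data by district",
    "Water quality data for city reservoirs",
]

scores = tf_idf(descriptions)
for description, weights in zip(descriptions, scores):
    print(description, "->", top_terms(weights))
```

Terms that occur in every description ("data" and "city" here) receive a weight of zero, so the terms that remain are the ones that distinguish one dataset from another. Those are natural candidates for a word cloud.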