Big Data

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets stems from the additional information derivable from analysis of a single large set of related data, as compared with separate smaller sets of the same total size, allowing correlations to be found.

Reasoning and querying over data streams rely on the ability to deliver a sequence of stream snapshots to the processing algorithms. These snapshots are typically provided using windows as views into streams, together with associated window management strategies.
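The window-as-view idea above can be sketched minimally: a fixed-size sliding window retains the most recent items of a stream, and each snapshot is a point-in-time copy handed to query or reasoning code. The class and method names below are illustrative assumptions, not part of any cited system.

```python
from collections import deque


class SlidingWindow:
    """Fixed-size sliding window over a data stream (illustrative sketch).

    Keeps only the most recent `size` items; snapshot() yields a stable
    view of the window that downstream processing can safely consume.
    """

    def __init__(self, size):
        self.items = deque(maxlen=size)  # old items fall off automatically

    def push(self, item):
        self.items.append(item)

    def snapshot(self):
        # Copy out the current contents so the caller's view does not
        # change as the stream continues to advance.
        return list(self.items)


# Feed a stream of readings and take a snapshot after each arrival.
window = SlidingWindow(size=3)
snapshots = []
for reading in [10, 20, 30, 40]:
    window.push(reading)
    snapshots.append(window.snapshot())
# The final snapshot holds only the three most recent readings: [20, 30, 40]
```

Other window management strategies (tumbling, time-based, landmark windows) differ mainly in when items enter and leave the view, but all deliver the same kind of snapshot sequence to the processing algorithms.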

In the context of Smart Cities, indicator definitions have been used to calculate values that enable comparison across different cities.


Earth Science Informatics, Special Issue - Semantic e-Science

Guest Editors:

We study the allocation problem of investor attention, a scarce and important economic resource in the information era, through practitioners' tweeting behaviors.

In complex data curation activities involving proper data access, data use optimization, and data rescue, there are opportunities for underlying skills in semantics to play a crucial role for data curation professionals, ranging from data scientists to informaticists to librarians.

In December 2010, the International Open Government Dataset Search (IOGDS) team at the Tetherless World Constellation at Rensselaer Polytechnic Institute embarked on a project to discover, document, and analyze open data catalogs published by governments at various levels around the world.

Combining statistical techniques with semantic data representations holds the potential to enhance understandability of scientific results.

Learn about:

- preparation of informative, manageable datasets;
- accessing "big data" quickly and reliably during and after analysis;
- data pre-processing, selection and testing of analytics methods, workflow design, and bulk data processing;
- exploratory data analysis, including interpretation, generation of hypotheses, and building intuition about the data;
- prediction, using statistical tools such as regression, classification, and clustering;
- communication of results through visualization, stories, and interpretable summaries.
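As a small illustration of one of the prediction tools named above, the sketch below fits an ordinary least-squares line to a handful of points in pure Python. It is a teaching sketch, not a production implementation; the function name and data are invented for this example.

```python
def linear_regression(xs, ys):
    """Fit y = a + b*x by ordinary least squares (pure-Python sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y over variance of x.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the line passes through the mean point.
    a = mean_y - b * mean_x
    return a, b


# Perfectly linear toy data generated from y = 1 + 2x.
a, b = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
# Recovers intercept a = 1.0 and slope b = 2.0
```

In practice a course like this would use library implementations (e.g., regression, classification, and clustering routines from a statistics package) rather than hand-rolled formulas, but the underlying fitting step is the one shown here.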