Big Data

Big data refers to collections of data sets so large and complex that they become difficult to process with on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets is driven by the additional information that can be derived from analyzing a single large set of related data, compared with separate smaller sets holding the same total amount of data; the combined set allows correlations to be found.
See Also

EAGER: Semantic Search (EAGER)
Principal Investigator: Jim Hendler
Description: NSF EAGER project to explore advanced semantic technology for data search.
SemantEco Annotator
Principal Investigator: Deborah L. McGuinness
Co Investigator: Patrice Seyed
Description: Generating useful RDF linked data is not a straightforward process for scientists using today's tools. In this project we introduce the SemantEco Annotator, a semantic web application that leverages community-based vocabularies and ontologies during the translation process itself to ease the process of drawing out implicit relationships in tabular data so that they may be immediately available for use within the LOD cloud. Our goal for the SemantEco Annotator is to make advanced RDF translation techniques available to the layperson.
Semantic Data Dictionaries (SDD)
Principal Investigator: Deborah L. McGuinness
Co Investigator: James McCusker
Description: A methodology building on existing data dictionaries in order to describe entities, attributes, and relationships in data sets through the Semanticscience Integrated Ontology (SIO) and relevant domain ontologies. Semantic Data Dictionaries are being developed in support of other projects, including CHEAR.
Social Practices (SPP)
Principal Investigator: Jim Hendler
Description: The overall goal of this project is to explore and establish a better understanding of privacy in this highly networked world. This page features the tools and workflow needed to accomplish such a task. We argue that while much has been written and discussed about privacy in various domains (e.g., law, psychology, economic behavior, security), it remains unclear what exactly the privacy problem is. Our aim is to reframe our own understanding of privacy by moving away from these traditional, disjointed compartments of knowledge. Moreover, given the complexity, we advocate this research question as an exemplar for the value of combining human and machine efforts. This project features tools, workflow(s), and best practices we've developed and implemented to accomplish such a task. This is and will be a work in progress. Any comments or feedback are welcome. Please email Kristine Gloria for more information.
TWC Web Observatory (WebObservatory)
Principal Investigator: Deborah L. McGuinness
Co Investigator: Jim Hendler
Description: The Web Science Research Center at TWC RPI is working with other members of the Web Science Trust to create a global "Web Observatory". The global movement toward Open Data and transparency has successfully motivated the release of very large institutional and commercial data sets describing social phenomena, economic indicators, and geographic trends. This proliferation of data represents a great opportunity for researchers and industry, but the abundance also threatens to make it ever more difficult to locate, analyse, compare, and interpret useful information in a consistent and reliable way. The situation can only get worse unless we help stakeholders perform useful analysis rather than drown in a sea of data. A global Web Observatory will offer an institutional framework to promote the use of W3C and other standards in the development of Semantic Catalogues to globally locate existing data sets, Collection Systems to gather new global data sets, and Analytics Tools and methodologies to analyse these data sets.
The Human-Aware Data Acquisition Framework (HADatAc)
Principal Investigator: Paulo Pinheiro
Co Investigator: Deborah L. McGuinness