Data Science Concept

Description:

Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science, using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature on which progress is presently limited by the lack of available tools and of a fully trained and agile workforce.

Projects:
Deep Carbon Observatory Data Science (DCO-DS)
Principal Investigator: Peter Fox
Co Investigator: John S. Erickson and Jim Hendler
Description: Given the increasing data deluge, it is clear that each of the Directorates in the Deep Carbon Observatory faces diverse data science and data management needs in fulfilling both its decadal strategic objectives and its day-to-day tasks. This project will assess in detail the data science and data management needs of each DCO directorate and of the DCO as a whole, using a combination of informatics methods: use case development, requirements analysis, inventories and interviews.
Deep Time Data Infrastructure (DTDI)
Principal Investigator: Peter Fox
Description: Earth’s living and non-living components have co-evolved for 4 billion years through numerous positive and negative feedbacks. Yet our ability to document, model, and explore these complex intertwined changes has been hampered by a lack of data synthesis and integration from many complementary disciplines—mineralogy, petrology, paleobiology, geochronology, proteomics, geochemistry, and more. The rise of oxygen exemplifies the co-evolution of rocks and life, and underscores both the tantalizing opportunities and technical challenges of deciphering transient characteristics of Earth’s storied past.
EAGER: Semantic Search (EAGER)
Principal Investigator: Jim Hendler
Description: NSF EAGER project to explore advanced semantic technology for data search.
Health Data Challenge (HealthData)
Principal Investigator: Jim Hendler and Deborah L. McGuinness
Co Investigator: Kristine Gloria, Alvaro Graves, Tim Lebo, and James McCusker
Description: An infrastructure for large-scale collaboration around aggregation, generation, and publication of health-related Linked Data.
Marine Biodiversity Virtual Laboratory (MBVL)
Principal Investigator: Heidi Sosik, Stace Beaulieu, David Mark Welch, and Peter Fox
Description: This research effort brings together computational and information scientists, oceanographers and microbiologists to develop a Marine Biodiversity Virtual Laboratory (MBVL). In addition to research investigations of marine ecosystems, the Virtual Laboratory provides a platform for education via student diversity programs at the three institutions. The important learning opportunities will be two-fold for students: (1) to learn about, model, and make predictions for biodiversity in natural systems, and (2) to be exposed to working in an interdisciplinary team that includes both natural scientists and computer scientists.
National Ocean Council Vocabulary (NOCV)
Description: The objective of the NOCV project is to demonstrate technical capabilities that are available and can be deployed to implement solutions to key needs identified in the National Ocean Policy in regard to data and the decision-support requirements that arise from data-oriented questions.
Repurposing Drugs with Semantics (ReDrugS)
Principal Investigator: Jonathan Dordick and Deborah L. McGuinness
Description: We aim to find new effective treatments for disease using existing drugs. Our approach is to gather and integrate existing data using semantic technologies to help discover promising opportunities for drug repurposing.
Semantic Sea Ice Interoperability Initiative (SSIII)
Principal Investigator: Mark Parsons, Siri Jodha Singh Khalsa, and Ruth Duerr
Co Investigator: Peter Fox and Deborah L. McGuinness
Description: SSIII is a National Science Foundation (NSF) funded effort to enhance the interoperability of sea ice data and to establish a network of practitioners working to enhance semantic interoperability of all Arctic data. SSIII is a collaborative project between NSIDC and the Rensselaer Polytechnic Institute (RPI) Tetherless World Constellation project. We seek to build on the work initiated under the International Polar Year (IPY) and create a community of practice working to improve interoperability within the Polar Information Commons (PIC), the Sustained Arctic Observing Network (SAON), and broader global systems.
Semantically Enabled Faceted Browser (S2S)
Principal Investigator: Peter Fox
Co Investigator: Stephan Zednik
Description: S2S is a user interface framework that leverages the machine-readable semantics of data, services, and user interface components, or "widgets". S2S automates various tasks in UI development for search interfaces.
Semantically Enabled Modeling of Major Depressive Disorder (SEMMDD)
Principal Investigator: Joanne S. Luciano
Description: In this project, we study how different antidepressant treatments, including non-pharmacological treatments, affect the underlying brain regions, clinical symptoms, and behaviors. We use mathematical modeling and computer simulation to combine clinical research with neuroscience research.
TWC Schema.org Vocabulary Development (TWC_Schemas)
Principal Investigator: Jim Hendler
Co Investigator:
Description: schema.org provides a collection of schemas (HTML tags) that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Since early 2012, researchers at TWC RPI have been working with government and research data providers to define vocabularies for expressing the structured data that powers their web sites, using on-page markup based on schema.org vocabularies. In particular, we developed the schema.org/Dataset extension, a concise vocabulary that extends schema.org for describing datasets and data catalogs. Current work includes applying Dataset to scientific datasets and developing new extensions for use by Web Observatories.
TWC Web Observatory (WebObservatory)
Principal Investigator: Deborah L. McGuinness
Co Investigator: Jim Hendler
Description: The Web Science Research Center at TWC RPI is working with other members of the Web Science Trust to create a global "Web Observatory". The global movement toward Open Data and transparency has successfully motivated the release of very large institutional and commercial data sets describing social phenomena, economic indicators and geographic trends. This proliferation of data represents a great opportunity for researchers and industry, but the abundance also threatens to make it ever more difficult to locate, analyse, compare and interpret useful information in a consistent and reliable way. The situation can only get worse unless we help stakeholders perform useful analysis rather than drown in a sea of data. A global Web Observatory will offer an institutional framework to promote the use of W3C and other standards in the development of Semantic Catalogues to globally locate existing data sets, Collection Systems to gather new global data sets, and Analytics Tools and methodologies to analyse these data sets.
The Human-Aware Data Acquisition Framework (HADatAc)
Principal Investigator: Paulo Pinheiro
Co Investigator: Deborah L. McGuinness
Description:
ToolMatch (ToolMatch)
Description: For a given dataset, it is difficult to find the tools that can be used to work with it. In many cases, the information that Tool A works with Dataset B is somewhere on the Web, but not in a readily identifiable or discoverable form. In other cases, particularly for more generalized tools, the information does not exist at all until somebody tries to use the tool on a given dataset. Thus, the simplest, most prevalent use case is for a user to search for the tools that can be used with a given dataset. A further refinement would be to specify what the tool can do with the dataset, e.g., read, visualize, map, analyze, reformat.
People:
Stephan Zednik

Stephan Zednik is a Senior Software Engineer with the Tetherless World Constellation at Rensselaer Polytechnic Institute. His research interests include researcher collaboration networks, quality representation and semantics, and provenance representation from data science tools. Stephan partici [...]