Deep Carbon Observatory Data Science

Research Areas: Data Science, X-informatics
Principal Investigator: Peter Fox
Co-Investigators: John S. Erickson and Jim Hendler
Concepts: eScience, Data Management, Data Visualization, Data Science, Geophysical Science
Recent advances in data generation techniques, whether by experiment, measurement or computer simulation, rapidly produce complex data characterized by source heterogeneity, multiple modalities, often high volume, high dimensionality, and multiple scales (temporal, spatial, and functional). In turn, science and engineering disciplines are rapidly becoming more data driven, motivated by a variety of goals (the Deep Carbon Observatory is an exemplar): higher sample throughput, higher resolution, additional physics, chemistry and biology, new instrumentation, and new integrated databases, all with the ultimate aim of better understanding and modeling the complex systems, and their dynamics, that underlie the processes being studied. However, analyzing libraries of complex data requires managing their inherent complexity so that information and knowledge can be integrated across multiple scales and across traditional disciplinary boundaries. Significant advances in methods, tools and applications for data science and informatics over the last five years can now be applied to multi- and inter-disciplinary problem areas. Virtual Observatories, Virtual Organizations, complex networks, linked data across systems, full life cycle data management, data integration, citation and attribution are increasingly becoming integral parts of projects, whether small (few people, one organization, modest data needs) or very large (many investigators and organizations, diverse data needs).

Given this increasing data deluge, it is clear that each of the directorates in the Deep Carbon Observatory (DCO) faces diverse data science and data management needs in fulfilling both its decadal strategic objectives and its day-to-day tasks. This project will assess in detail the data science and data management needs of each DCO directorate and of the DCO as a whole, using a combination of informatics methods: use case development, requirements analysis, inventories and interviews.