Bringing Data Science, Xinformatics and Semantic eScience into the Graduate Curriculum


Authors: Peter Fox


Recent advances in acquisition techniques quickly provide massive amounts of complex data characterized by source heterogeneity, multiple modalities, high volume, high dimensionality, and multiple scales (temporal, spatial, and functional). In turn, science and engineering disciplines are rapidly becoming more data driven, with goals of higher sample throughput, better understanding and modeling of complex systems and their dynamics, and ultimately engineering products for practical applications. However, analyzing libraries of complex data requires managing that complexity and integrating information and knowledge across multiple scales and different disciplines.

Attention to Data Science is now ubiquitous: The Fourth Paradigm publication, the Nature and Science special issues on Data, and an explicit emphasis on Data in the programs of national and international agencies, foundations (Keck, Moore) and corporations (IBM, GE, Microsoft, etc.). Surrounding this attention is a proliferation of studies, reports, conferences and workshops on Data, Data Science and the workforce. Examples include: “Train a new generation of data scientists, and broaden public understanding” from an EU Expert Group, “…the nation faces a critical need for a competent and creative workforce in science, technology, engineering and mathematics (STEM)...”, “We note two possible approaches to addressing the challenge of this transformation: revolutionary (paradigmatic shifts and systemic structural reform) and evolutionary (such as adding data mining courses to computational science education or simply transferring textbook organized content into digital textbooks).”, and “The training programs that NSF establishes around such a data infrastructure initiative will create a new generation of data scientists, data curators, and data archivists that is equipped to meet the challenges and jobs of the future.”

Further, the interim report of the International Council for Science's (ICSU) Strategic Coordinating Committee on Information and Data (SCCID) features this excerpt from section 4.2.4, Data scientists and professionals: "An unfortunate state in the recognition of data science, is that there is a lack of appreciation of the need for a set of professional knowledge in skill in key areas, many of which have not been emphasized to date, e.g. professional approaches to the management of data over its lifecycle. As such, the effort required to be a data scientists is not valued sufficiently by the remainder of the scientific community." SCCID Recommendation 6 reads: “We recommend the development of education at university level in the new and vital field of data science. The curriculum included in appendix D can be used as a starting point for curriculum development.” Appendix D is entitled “Example curriculum for data science” and explicitly uses the “Curriculum for Data Science taught at Rensselaer Polytechnic Institute, USA”.

This contribution will present relevant curriculum offerings at the Rensselaer Polytechnic Institute.


Date: April 25, 2012
Created By: Peter Fox

Related Research Areas:

Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such, it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by the lack of available tools and of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in e-Science collaborations. The need is to teach key methodologies in application areas based on real research experience and to build a skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

Semantic eScience
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or on general-purpose semantic application development, with inadequate consideration of the requirements of specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring emerging trends in semantic web technologies. This situation urgently requires the development of a multi-disciplinary field to foster the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Xinformatics
Lead Professor: Peter Fox
Description: In the last 2-3 years, Informatics has attained greater visibility across a broad range of disciplines, especially in light of great successes in bio- and biomedical-informatics and significant challenges in the explosion of data and information resources. Xinformatics is intended to provide both the common informatics knowledge and an understanding of how it is implemented in specific disciplines, e.g. X=astro, geo, chem, etc. The theoretical basis of Informatics arises from information science, cognitive science, social science, and library science as well as computer science. As such, it aggregates these studies and adds both the practice of information processing and the engineering of information systems.