A Justification for Semantic Training in Data Curation Frameworks Development

In the complex data curation activities involving proper data access, data use optimization and data rescue, opportunities exist where underlying skills in semantics may play a crucial role in data curation professionals ranging from data scientists, to informaticists, to librarians. Here, We provide a conceptualization of semantics use in the education data curation framework (EDCF) [1] under development by Purdue University and endorsed by the GLOBE program [2] for further development and application. Our work shows that a comprehensive data science training includes both spatial and non-spatial data, where both categories are promoted by standard efforts of organizations such as the Open Geospatial Consortium (OGC) and the World Wide Web Consortium (W3C), as well as organizations such as the Federation of Earth Science Information Partners (ESIP) that share knowledge and propagate best practices in applications. Outside the context of EDCF, semantics training may be same critical to such data scientists, informaticists or librarians in other types of data curation activity. Past works by the authors have suggested that such data science should augment an ontological literacy where data science may become sustainable as a discipline. As more datasets are being published as open data [3] and made linked to each other, i.e., in the Resource Description Framework (RDF) format, or at least their metadata are being published in such a way, vocabularies and ontologies of various domains are being created and used in the data management, such as the AGROVOC [4] for agriculture and the GCMD keywords [5] and CLEAN vocabulary [6] for climate sciences. The new generation of data scientist should be aware of those technologies and receive training where appropriate to incorporate those technologies into their reforming daily works. References [1] Branch, B.D., Fosmire, M., 2012. The role of interdisciplinary GIS and data curation librarians in enhancing authentic scientific research in the classroom. American Geophysical Union 2013 Fall Meeting, San Francisco, CA, USA. Abstract# ED43A-0727 [2] http://www.globe.gov [3] http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf [4] http://aims.fao.org/standards/agrovoc [5] http://gcmd.nasa.gov/learn/keyword_list.html [6] http://cleanet.org/clean/about/climate_energy_.html

View Publication

Associated Projects

Recent advances in data generation techniques, whether by experiments, measurements or computer simulation, quickly provide complex data characterized by source heterogeneity, multiple modalities, often high volume, high dimensionality, and multiple scales (temporal, spatial, and function).

Citation