Open Science in an Open World
I began to think about a blog post on this topic after reading a few papers about Open Code and Open Data published in Nature and Nature Geoscience in November 2014. Later I noticed that the editorial office of Nature Geoscience had put together a cluster of articles on the theme of Transparency in Science (http://www.nature.com/ngeo/focus/transparency-in-science/index.html), which created an excellent context for further discussion of Open Science.
A few weeks later I attended the American Geophysical Union (AGU) Fall Meeting in San Francisco, CA. It is usually a giant meeting, with more than 20,000 attendees. My personal focus was the presentations, workshops and social activities of the Earth and Space Science Informatics group. If I had to summarize the seven-day meeting experience in a few keywords, I would choose: Data Rescue, Open Access, Gap between Geo and Info, Semantics, Community of Practice, Bottom-up, and Linking. Putting my AGU meeting experience together with my thoughts after reading the Nature and Nature Geoscience papers, it is now time for me to finish this post.
Beyond incentives for data sharing and the open source policies of scholarly journals, we can extend the discussion of software and data publication, reuse, citation and attribution by shedding more light on both the technological and the social aspects of an environment for open science.
Open science can be considered a socio-technical system. One part of the system is a way to track where everything goes; another is the design of appropriate incentives. The emerging technological infrastructure for data publication adopts an approach analogous to paper publication and has been facilitated by community standards for dataset description and exchange, such as DataCite (http://www.datacite.org), Open Archives Initiative-Object Reuse and Exchange (http://www.openarchives.org/ore) and the Data Catalog Vocabulary (http://www.w3.org/TR/vocab-dcat). Software publication, in its simplest form, may use a similar approach, which calls for community efforts on standards for code curation, description and exchange, such as Working towards Sustainable Software for Science (http://wssspe.researchcomputing.org.uk). Simply minting Digital Object Identifiers for code in a repository would make software publication no different from data publication (see also: http://www.sciforge-project.org/2014/05/19/10-non-trivial-things-github-friends-can-do-for-science/). Software needs more than that: attention is required for code quality, metadata, license, version and derivation, as well as for metrics to evaluate the value and/or impact of a software publication.
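To make the list of things that need attention a little more concrete, here is a minimal sketch in Python of what a description of one software release might carry. It is only an illustration: the class name, the field names and the identifiers are all made up for this post, and they do not follow DataCite, DCAT or any other actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareRecord:
    """Hypothetical description of one published software release."""
    identifier: str                      # e.g. a DOI minted for this release
    title: str
    creators: List[str]
    version: str
    license: str
    derived_from: Optional[str] = None   # identifier of the release this one builds on
    related_datasets: List[str] = field(default_factory=list)


def missing_fields(record: SoftwareRecord) -> List[str]:
    """Return the names of required descriptive fields that are empty."""
    required = ["identifier", "title", "creators", "version", "license"]
    return [name for name in required if not getattr(record, name)]


# Placeholder values only; these identifiers do not point at real objects.
release = SoftwareRecord(
    identifier="doi:10.5555/example-model.v2",
    title="Example climate model post-processor",
    creators=["A. Researcher"],
    version="2.0.0",
    license="",  # left empty on purpose to show the check
    derived_from="doi:10.5555/example-model.v1",
)

print(missing_fields(release))
```

The point of the sketch is that version and derivation are part of the record itself, so a later release can point back to the one it builds on, which a bare identifier does not capture.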
Metrics underpin the design of incentives for open science. An extended set of metrics, called altmetrics, was developed for evaluating research impact and has already been adopted by leading publishers such as Nature Publishing Group (http://www.nature.com/press_releases/article-metrics.html). Factors counted in altmetrics include how many times a publication has been viewed, discussed, saved and cited. It was very interesting to read some news about funders’ attention to altmetrics (http://www.nature.com/news/funders-drawn-to-alternative-metrics-1.16524) on my flight back from the AGU meeting, in the 12/11/2014 issue of Nature, which I had picked up at the NPG booth in the AGU meeting exhibition hall. For a software publication the metrics might also count how often the code is run, how often fragments of the code are used, and how many derivations are made from the code. A software citation indexing service, similar to the Data Citation Index (http://wokinfo.com//products_tools/multidisciplinary/dci/) of Thomson Reuters, could be developed to track citations among software, datasets and literature and to facilitate software search and access.
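As a rough illustration of what tracking citations among software, datasets and literature could mean in practice, here is a small Python sketch. It is a toy under my own assumptions, not a description of how the Data Citation Index or any existing service works; the class, its methods and the identifiers are invented for this post.

```python
from collections import defaultdict

KINDS = {"software", "dataset", "literature"}


class CitationIndex:
    """Toy index of citations among software, datasets and literature."""

    def __init__(self):
        self._kinds = {}                    # identifier -> kind of object
        self._cited_by = defaultdict(set)   # cited identifier -> citing identifiers

    def register(self, identifier, kind):
        """Record that an identifier exists and what kind of object it is."""
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self._kinds[identifier] = kind

    def add_citation(self, citing, cited):
        """Record that one registered object cites another."""
        for identifier in (citing, cited):
            if identifier not in self._kinds:
                raise KeyError(identifier)
        self._cited_by[cited].add(citing)

    def citation_counts(self, identifier):
        """Count citations to an object, broken down by kind of citing object."""
        counts = {kind: 0 for kind in KINDS}
        for citing in self._cited_by.get(identifier, ()):
            counts[self._kinds[citing]] += 1
        return counts


# Placeholder identifiers, used only to exercise the sketch.
index = CitationIndex()
index.register("code:A", "software")
index.register("code:B", "software")
index.register("data:X", "dataset")
index.register("paper:P", "literature")

index.add_citation("paper:P", "code:A")   # a paper cites the code
index.add_citation("code:B", "code:A")    # a derived code cites the original
index.add_citation("code:A", "data:X")    # the code cites the dataset it uses

print(index.citation_counts("code:A"))
```

Usage-based measures such as how often the code is run would need a different kind of data collection and are left out of this sketch.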
Open science would help everyone, including the authors, but it can be laborious and boring to give all the fiddly details. Fortunately, fiddly details are what computers are good at. Advances in technology are enabling the categorization, identification and annotation of various entities, processes and agents in research, as well as the linking and tracing among them. In our June 2014 Nature Climate Change article we discussed the issue of provenance in global change research (http://www.nature.com/nclimate/journal/v4/n6/full/nclimate2141.html). Such work on provenance capture and tracing further extends the scope of metrics development. Yet incorporating those metrics into incentive design requires the science community to find an appropriate way to use them in research assessment. One recent sign of progress is that NSF renamed the Publications section of the biographical sketch of funding applicants to Products and allowed datasets and software to be listed there (http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp). To fully establish the technological infrastructure and incentive metrics for open science, more community efforts are still needed.