Data Science Day Symposium
This was the first Data Science workshop I have attended since I began working on the DCO project. I must say that it was a great experience to mingle with scientists and scholars from different domains and from all around the world. I felt very proud to see researchers from far away gathering on our beautiful RPI campus to talk about Data Science.
We have done a lot of wonderful work building the Data Science and Management infrastructure for the DCO. Now it is time to show our DCO colleagues why sharing “linked data” and “open data” is so important for the whole DCO community.
During the breakout session in the afternoon, I joined the EPC group. Dr. Mark Ghiorso led our discussion. Many DCO members have not yet realized how important it is to make a management plan for preserving the data generated in their research. One of the questions raised was: what types and formats of data do you produce or use in your work, and how do you archive them? This question will lead to one of our DCO-DS boundary activities. Some research publications are so old that no electronic versions exist at all. If we need to reuse the data from this published literature, we have to figure out a way to recover the data from the paper copies. After the paper copies are scanned, OCR (optical character recognition) could be used to convert the page images into machine-readable text. The key problem we then need to solve is how to automatically extract the metadata and data we need from that text while maintaining quality control. We still have a long way to go on this.
Meanwhile, the fact that a lot of valuable data has already been lost over the years clearly shows why it is so crucial to make a good data management plan before conducting any future research. The Data Science and Management infrastructure that our data science team built for the DCO plays a very significant role in this, and we have also managed to make the portals user-friendly.
On the second day of the workshop, we were very glad to meet many DCO researchers who are willing to share “linked data” via our Drupal-VIVO-CKAN Data Science and Management infrastructure. We believe that after this workshop more and more researchers will make use of our data science platforms, which could benefit the whole DCO community!
Congrui Li, DCO-DS team member