The Day of Data Science
The DCO Data Science team hosted a two-day workshop at RPI to promote the data management and infrastructure we have been working very hard on for DCO. As part of the team, I attended the whole event and met many scientists and scholars from the DCO communities.
This was a very good opportunity for our team to understand how the DCO scientists deal with their data and what their data needs are. In the breakout session on the afternoon of the first day, I joined the DL-DE group and had a good chat with them about data science. There was some discussion of data management, and apparently most of them have only a very loose “plan” for their data, especially the raw data. They produce spreadsheets from the raw data and leave them poorly archived and hard to access. Worst of all, they are not in the habit of writing good data management plans, at least not ones that they would stick to. But the good news is that they have already started to realize the benefit of sound data management, and they are seeking help from us.
One piece of feedback we got from the scientists about the DCO portal was that it was “rather low priority for people”. They said it did contain helpful information, but people just didn’t go there very often. They need some sort of pop-up reminder that shows up on their desktops so that they can see updates without logging in to the portal.
Another feature the scientists would very much like to have is cross-dataset search. For example, they were very interested in pulling up all the data about a certain concentration of hydrogen from multiple datasets and databases, but they weren’t able to do this because the datasets lack compatible schemas. This is definitely interesting, but it hasn’t been on our radar in our DCO-DS work. Digging into the domain data might not be easy, but once we establish the model, the power of linked data will reveal itself.
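To make the schema problem concrete, here is a minimal sketch of the idea in Python. Everything in it is an assumption for illustration: the two datasets, their column names, the ppm unit, and the shared field names (`sample_id`, `hydrogen_ppm`) are invented and do not come from any real DCO data. It only shows how mapping each dataset's local columns onto one shared model lets a single query run across both.

```python
# Hypothetical datasets: names, columns, units, and values are invented.
DATASET_A = [
    {"sample": "A-1", "H2_ppm": 120.0},
    {"sample": "A-2", "H2_ppm": 15.0},
]
DATASET_B = [
    {"id": "B-7", "hydrogen_conc_ppm": 98.5},
    {"id": "B-9", "hydrogen_conc_ppm": 310.0},
]

# Per-dataset mapping from local column names to the shared field names.
# This hand-written table stands in for a real shared model or vocabulary.
FIELD_MAPS = {
    "dataset_a": {"sample": "sample_id", "H2_ppm": "hydrogen_ppm"},
    "dataset_b": {"id": "sample_id", "hydrogen_conc_ppm": "hydrogen_ppm"},
}


def normalize(source, records):
    """Yield records renamed to the shared fields, tagged with their source."""
    mapping = FIELD_MAPS[source]
    for rec in records:
        row = {mapping[key]: value for key, value in rec.items() if key in mapping}
        row["source"] = source
        yield row


def search_hydrogen(min_ppm, max_ppm):
    """Return normalized rows from all datasets within the given ppm range."""
    sources = {"dataset_a": DATASET_A, "dataset_b": DATASET_B}
    hits = []
    for name, records in sources.items():
        for row in normalize(name, records):
            if min_ppm <= row["hydrogen_ppm"] <= max_ppm:
                hits.append(row)
    return hits


if __name__ == "__main__":
    for hit in search_hydrogen(90, 200):
        print(hit)
```

Calling `search_hydrogen(90, 200)` returns the matching samples from both datasets, each tagged with its source. In a linked data setting, the hand-written mapping table would presumably give way to shared vocabulary terms, but the principle of querying against one common model stays the same.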
These are just some of the things we talked about in the workshop. All in all, as more and more data get collected in research activities, scientists have become aware that they really need to get these data in order so that they can make the most of them. On the other hand, we, the data science crew, still have a lot of work to do to make this world a better data place :-)