GeoData 2014 in sunny “People’s Republic of Boulder”

June 25th, 2014

I was excited to see so many domain scientists and data specialists come together to talk about one common topic: data science. It gave me a great chance to communicate with and learn from others. We had people from NOAA, NASA, USGS, RDA, UCAR, universities, and industry, as well as many other organizations and agencies.

I have to say that Boulder is such a wonderful small town to stay in. A large number of government scientific agencies are located there. Local people are very friendly, and from any spot in town you can have a wonderful view of the beautiful mountains to the west. Moreover, it is said that there are around 300 sunny days per year over there! No wonder Patrick and Stephan love this place so much and prefer not to stay in Troy all the time, lol~

The first important thing I learned from the workshop was about data policies. I have to admit that as a student I never cared about data policies in the real world, which in fact always exist and should be obeyed. Regulations have been made about data citation, making scientific data much easier to access and reuse. This really broadened my horizons and gave me a chance to prepare for real-world challenges. However, there is still much more we need to do about data policies. For example, both federal and university scientists work on similar scientific data, and it is not always easy to make the policies work for these different groups. Coordinating all entities is difficult. Some places are really running with it while others are left behind.

During one break, Prof. Fox reminded me that I should walk around and mingle with people instead of sitting there alone. This was indeed great advice for me. Research sometimes needs a great amount of communication. You never know how much others could help with your own work and how much others could learn from your research achievements. This is also a main purpose of the whole workshop: to stimulate academic and agency collaboration in geoinformatics and in retrieving, integrating, reusing, and citing geodata.

During one lunch break I happened to sit with a guy who worked on visualizing time-series hydrologic data. He used a simple 2-D grid to visualize the data (x-axis as the day of the year, y-axis as the year, each cell as the data point for one particular day in history, different colors for different values). Compared to a whole bunch of line plots or hard-to-read 3-D visualizations, I was very surprised to see that such a simple idea could reveal many more conclusions from the data. This is just one simple example of how important communication and collaboration are.
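To make the grid idea concrete, here is a minimal sketch of how daily readings could be arranged into that layout. This is my own reconstruction, not his actual code; the function name `to_year_day_grid` and the sample readings are hypothetical.

```python
from datetime import date

def to_year_day_grid(records):
    """Arrange (date, value) pairs into a grid: one row per year, one column per day of year."""
    years = sorted({d.year for d, _ in records})
    row_of = {year: i for i, year in enumerate(years)}
    # 366 columns so that leap years fit; days without a reading stay None.
    grid = [[None] * 366 for _ in years]
    for d, value in records:
        day_of_year = d.timetuple().tm_yday  # 1..366
        grid[row_of[d.year]][day_of_year - 1] = value
    return years, grid

# Hypothetical daily readings, made up for illustration only.
records = [
    (date(2012, 1, 1), 3.2),
    (date(2012, 12, 31), 2.9),
    (date(2013, 1, 1), 4.1),
    (date(2013, 6, 25), 7.8),
]
years, grid = to_year_day_grid(records)
```

Each row of `grid` is one year and each column is one day of that year, so a plotting library can color the cells by value (for example with matplotlib's `imshow`, after converting the `None` gaps to NaN). Seasonal patterns then show up as vertical bands and unusual years as rows that look different from their neighbors.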

Additionally, we still have a long way to go in moving from the age of the relational database to the semantic triple store. Nobody could deny that a triple store does a much better job than a relational database on the “linked data” aspect. However, it also costs much more on the maintenance side. So in the real world, especially in industry, people still prefer not to use it, since the main goal of business is to make profits. However, I believe this will change in the near future, starting with the domain experts who participated in this workshop.
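For readers who have not seen a triple store, here is a toy sketch of the idea in plain Python. It is only an illustration under my own assumptions: the identifiers and the `match` helper are hypothetical, and a real triple store would be queried with SPARQL rather than a function like this.

```python
# Toy in-memory triple store: every fact is a (subject, predicate, object) tuple.
# All identifiers below are hypothetical.
triples = {
    ("dataset:streamflow", "measuredAt", "site:creek-gauge"),
    ("dataset:snowpack", "measuredAt", "site:creek-gauge"),
    ("site:creek-gauge", "locatedIn", "place:Boulder"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which datasets were measured at the gauge?
at_gauge = match(triples, p="measuredAt", o="site:creek-gauge")

# Follow the link one hop further: where is that gauge?
gauge_location = match(triples, s="site:creek-gauge", p="locatedIn")
```

Because every fact has the same three-part shape, linking a new kind of thing only means adding more triples; there is no table schema to change. That flexibility is the “linked data” advantage, and keeping such a loosely structured store consistent is part of the maintenance cost.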

According to the feedback from a breakout session, many people are still confused about the difference between an ontology and a vocabulary. Data-related education is another key issue we discussed during the workshop. I felt so lucky that I had the chance to learn data-related knowledge systematically from our Tetherless World professors. However, it is still a big challenge to make the whole data community realize how important it is to make data management plans, to carry out data citations, and so on. It is everyone’s responsibility to create an even bigger and better “open data” community.

