Geoscience in the Web era – a few facets

July 30th, 2014

In mid-July 2014 I attended the DCO summer school at Big Sky Resort, MT, which included a 2-day field trip to Yellowstone National Park (YNP). It was a nice experience: the venue was wonderful, and so were the topics covered by the curriculum. But what impressed me the most was seeing how the Web is changing geoscience work as well as geoscientists.

We had three excellent field trip guides, Lisa Morgan, Pat Shanks and Bill Inskeep. They prepared and distributed an 82-page YNP field trip guide! Of course they first shared it online through Dropbox. What also impressed me is that when I showed my golden spike information portal to Lisa, she in turn showed me a few apps on her iPhone with state geologic map services, useful gadgets for field work. But our field trip experience in YNP showed that a paper map is still necessary, as it is bigger, provides an overview of a wider area, and needs no battery.

YNP itself has a virtual observatory website, the Yellowstone Volcano Observatory, hosted by the USGS and the University of Utah. The portal provides “timely monitoring and hazard assessment of volcanic, hydrothermal, and earthquake activity in the Yellowstone Plateau region.” Featured information includes publications, online mapping services, and also images, videos and webcams about YNP.

I was happy to see that Katie Pratt and I were accompanied by many other summer school participants when we were tweeting. Search for the hashtag #DCOSS14 and you will find how active the participants were on Twitter during the summer school. I was even a little surprised to see that Donato Giovannelli (@d_giovannelli) helped answer a question about Twitter’s impact on citations by pasting the link to a paper, a few seconds after I gave a short introduction to Altmetric.com and its use by Nature Publishing Group, Springer and Wiley.

My role at the summer school was two-fold: participant and lecturer. I gave a presentation titled ‘Why data science matters and what we can do with it’, in which I addressed four sub-topics: data management and publication, interoperability of data, provenance of research, and the era of Science 2.0. The slides are accessible on SlideShare [link].

Author: Categories: tetherless world Tags:

Notes on public talks

July 16th, 2014

Massimo and I worked together on two posters about automatic provenance capturing for research publications and we won the ESIP FUNding Friday award. What I will not forget, however, is the great lesson I learnt from giving the 2-minute pitch in front of the ESIP folks.

During the 2-minute talk, I just could not help staring at the two posters we had printed and assembled the day before and that morning. Now I know the reason: I had only practiced my speech with one of the posters displayed on my laptop. For the other poster, I had no chance to practice talking about it at all. I became dependent on the presence of the posters in front of me and could not address the talk to the people instead of to the posters.

Possible solutions to make my eyes move away from the posters when talking? The best I have thought of is to get REALLY familiar with the topic I’m gonna present: at least so familiar that I don’t need to look at any aid such as a poster to remind myself what to say, and better still, familiar enough to spare some attention for the audience, to receive their feedback and adjust accordingly in real time. The need to ignore the audience for a while to concentrate on “what should I say here?” indicates that I’m not familiar enough with the topic.

In addition to the content, presenters also need to get familiar with the way of presenting the content. This could include scrutinizing the practice talk sentence by sentence to make sure “I said what I meant and I meant what I said”. Not until such clarity and confidence are reached can one start thinking about all the fancy stuff like speaking pace, volume variations and eye contact with the audience. Well, those are fancy to me, not necessarily to good speakers.

So there is really a lot to work on for a public talk, especially if it’s the first time the presenter talks about the idea. There is so much work that it cannot be done on the night before the talk. We need to work on the familiarity, clarity and confidence of our ideas on a daily basis. It helps to write down what we mean and talk about it often.

 


Geodata 2014

June 30th, 2014

A few weeks ago I attended the 2014 Geodata Workshop. Like the previous Geodata workshop in 2011, this workshop was focused on discussing policies and techniques to improve inter-agency geographic data integration and data citation. While there have been advances in recommendations for data citation and geodata integration since the last Geodata workshop, I felt the mood of the attendees indicated that we are now in much the same place we were in 2011. There was strong consensus as to the importance of data citation and integration, but a feeling that no one is really doing it at scale, the tools aren’t where we need them to be, and the agency policies are not yet at a state to successfully drive widespread adoption. Despite these hurdles this is a community that is clearly excited and willing to take the first steps towards making widespread data integration and data citation a reality in the geodata community.

Meanwhile, in the trenches…

I had several conversations with attendees who represent publishers of oceanographic vocabularies. Many of these vocabularies have been publicly available for several years, but have traditionally been 3-star open data (publicly available in a non-proprietary machine-readable format, no links to external vocabularies). These publishers are excited about upgrading their vocabulary services to be 5-star open data (use open W3C standards such as RDF/SPARQL, identify things with resolvable URIs, link to other people’s data) because they see a major benefit in being able to refer to the authoritative source for a term or identified resource that is related to their vocabulary but for which they are not the authoritative source. This is a great example of a group that has already identified a specific real-world need and benefit from integration and who are actively laying the groundwork that will enable that integration to be successful. This group was enthusiastic about cross-linking their vocabularies and I have no doubt their efforts will be viewed as a data integration success at the next Geodata workshop.
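
To make the step from 3-star to 5-star concrete, here is a minimal Python sketch of what "linking to the authoritative source" could look like for one term. It is an illustration only, not any publisher's actual service: the example.org URIs and the helper names (uri, literal, link_term, to_ntriples) are hypothetical, and only the SKOS and RDF property URIs are real.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Serialize a URI reference in N-Triples syntax."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def link_term(local_uri, label, authority_uri):
    """Describe a local term and point at the external authoritative term."""
    return [
        (uri(local_uri), uri(RDF_TYPE), uri(SKOS + "Concept")),
        (uri(local_uri), uri(SKOS + "prefLabel"), literal(label)),
        # The cross-vocabulary link is what moves the data towards 5 stars.
        (uri(local_uri), uri(SKOS + "exactMatch"), uri(authority_uri)),
    ]


def to_ntriples(triples):
    """Join already-serialized terms into N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical URIs, for illustration only.
print(to_ntriples(link_term(
    "http://example.org/vocab/salinity",
    "salinity",
    "http://example.org/authority/salinity",
)))
```

The last triple is the interesting one: the local vocabulary keeps its own URI for the term but states, with skos:exactMatch, that another publisher's URI identifies the same concept.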

Where we can help…

As a result of these discussions our lab is starting a Linked Vocabulary API effort whose goal is to provide a Linked Data API configuration specialized for publishing SKOS vocabularies. Our goal is to develop a configuration that makes bootstrapping a RESTful linked data API to a SKOS vocabulary simple and accessible for the broad scientific community. This effort is based on work we previously did for the CMSPV project.

In conclusion

What I will remember most from Geodata 2014 is the excitement members of the community had towards adopting new technologies and techniques and making widespread data integration and citation a reality. Where conventions have yet to be established, the community is willing to take the first steps and establish best practices. Where policies have yet to be formalized, the community is ready to work with policy makers to ensure clear and helpful policies are established. Whenever the next Geodata workshop is held, I am confident that its narrative will be full of success stories that began at the 2014 workshop.


GeoData 2014 in sunny “People’s Republic of Boulder”

June 25th, 2014

I was so excited to see so many domain scientists as well as data specialists getting together to talk about one common topic, data science, which gave me a great chance to communicate with and learn from others. We had people from NOAA, NASA, USGS, RDA, UCAR, universities and industry, as well as many other organizations and agencies.

I have to say that Boulder is such a wonderful small town to stay in. A large number of scientific government agencies are located there. Local people are very friendly, and from any spot in town you can have a wonderful view of the beautiful mountains on the west side. Moreover, it is said that there are around 300 sunny days per year over there! No wonder Patrick and Stephan love this place so much and prefer not to stay in Troy all the time, lol~

The first important thing I learned from the workshop is about data policies. I have to admit that as a student I never cared about data policies in the real world, which in fact always exist and should be obeyed. Regulations have been made about data citation, making scientific data much easier to access and reuse. This has really broadened my horizons and given me a chance to prepare for real-world challenges. However, there is still much more we need to do about data policies. For example, there are both federal and university scientists working on similar scientific data, and sometimes it is not so easy to make the policies work for these different groups. Coordinating all entities is difficult. Some places are really running with it while others are left behind.

During one break, Prof. Fox reminded me that I should walk around and mingle with people instead of sitting there alone. This was indeed great advice for me. Research sometimes needs a great amount of communication. You never know how much others could help with your own work and how much others could also learn from your research achievements. This is also a main purpose of the whole workshop, which is to stimulate academic and agency collaboration in geoinformatics and in retrieving, integrating, reusing and citing geodata.

During one lunch break I happened to sit with a guy who worked on visualizing time-series hydrologic data. He used simple 2-D grids to visualize the data (x-axis as days in a year, y-axis as years, each small cell as one data point for a particular day in history, different colors as different values). I was very surprised to see that such a simple idea could reveal many more conclusions from the data than a whole bunch of plots or hard-to-read 3-D visualizations. This is just one simple example of how important communication and collaboration are.
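
For readers who want to try the idea, the grid layout described above can be sketched in a few lines of Python. This is my own minimal reconstruction under stated assumptions, not the code used by the person I talked to: the function name year_day_grid and the sample readings are hypothetical, and the sketch only arranges the data, leaving the coloring to whatever plotting tool is at hand.

```python
from datetime import date


def year_day_grid(observations):
    """Arrange (date, value) pairs into one row per year, one cell per day.

    Returns {year: [value or None] * 366}; None marks days with no data,
    and cell 365 is only ever filled in leap years.
    """
    grid = {}
    for day, value in observations:
        row = grid.setdefault(day.year, [None] * 366)
        # tm_yday counts from 1, list indices from 0.
        row[day.timetuple().tm_yday - 1] = value
    return dict(sorted(grid.items()))


# Hypothetical daily readings, for illustration only.
readings = [
    (date(2013, 1, 2), 2.7),
    (date(2012, 1, 1), 3.2),
    (date(2012, 12, 31), 4.1),  # day 366 of a leap year
]
grid = year_day_grid(readings)
```

Each row of the grid is one year, so stacking the rows and mapping values to colors (for example with a heat-map routine) gives the picture described above: seasonal patterns line up vertically and unusual years stand out as rows.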

Additionally, we still have a long way to go in moving from the age of the relational database to that of the semantic triple store. Nobody would dispute that a triple store does a much better job than a relational database on the “linked data” aspect. However, it also costs much more to maintain. So in the real world, especially in industry, people still prefer not to use it, since the main goal of business is to make a profit. However, I believe that this will change in the near future, starting with the domain experts who participated in this workshop.

According to the feedback from a breakout session, a large number of people are still confused about the difference between an ontology and a vocabulary. Data-related education is another key issue we discussed during the workshop. I felt so lucky that I had the chance to learn data-related knowledge systematically from our tetherless world professors. However, it is still a big challenge to make the whole data community realize how important it is to make data management plans, to carry out data citation, and so on. It is everyone’s responsibility to create an even bigger and better “open data” community.


Conceptual model of a workshop

June 24th, 2014

In June 2014 I helped organize two workshops, the DCO Data Science Day 2014 and the GeoData 2014. The experience was unique and I thought it necessary to write down some notes for future events. I hope they will also be useful to other people who are planning to organize a workshop or small conference.
The list of models below follows the idea of an ontology spectrum.

Model 1 (via Bruce Caron, easy and impressive): people, coffee, beer + shaking well.

Model 2 (following the context model of 5W1H): date, topic, location, people, agenda, logistics.

Model 3 (things to do – result of a brainstorm):
0 website;
1 date;
2 central topic, purpose, output;
3 topic of sessions, preferred topic of invited talks, topic of panels, topic of breakouts;
4 meeting rooms, hotel, visa application support;
5 organizing committee, meeting chair, session chair, invited speaker, breakout moderator, note taker, technical assistant, workshop report writer;
6 handouts pack (agenda, badge, logistics memo);
7 logistics: announcement, wifi, power strips, emergency contact, projector, whiteboard and marker, remote access facility, alcohol service permission, travel support, travel agency, dietary requirement, morning and afternoon break, lunch, dinner, reception, local transportation, reimbursement method.

Model 4 (following a timeline):
0 Science: topic, purpose;
1 Finance: meeting budget;
2 Planning: meeting proposal, organizing committee, logistics administrator, organizing meetings, date, location, announcement;
3 Agenda: topic of sessions, preferred topic of invited talks, topic of panels, topic of breakouts, meeting rooms, meeting chair, session chair, invited speaker, breakout moderator, note taker, technical assistant;
4 Logistics: handouts pack (agenda, badge, reimbursement form, logistics memo), emergency contact, wifi, power strips, projector, whiteboard and marker, remote access facility, travel support, travel agency, hotel, visa application support, dietary requirement, morning and afternoon break, lunch, dinner, reception, alcohol service permission, local transportation, reimbursement method;
5 Output: online virtual community of attendees, workshop summary and recommendations, workshop report writer.

Model 5 (an ontology? ;-))
Should be something like:
twc:Workshop a prov:Activity.
twc:SessionChair a prov:Role.
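
One way Model 5 might be taken a step further is sketched below in Python, which simply prints Turtle. Everything beyond the two statements above is an assumption made for illustration: the example.org namespaces, the ex:GeoData2014 and ex:SomeChair identifiers, and the choice to treat twc:Workshop as a subclass of prov:Activity so that individual workshops can be typed with it. The prov: terms used (Agent, Association, wasAssociatedWith, qualifiedAssociation, agent, hadRole) are from the W3C PROV-O vocabulary.

```python
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "twc": "http://example.org/twc#",   # hypothetical namespace
    "ex": "http://example.org/event/",  # hypothetical namespace
}


def workshop_triples(workshop, chair):
    """Describe one workshop and the agent who acted as session chair."""
    return [
        # Assumption: Workshop modelled as a class of PROV activities.
        ("twc:Workshop", "rdfs:subClassOf", "prov:Activity"),
        ("twc:SessionChair", "a", "prov:Role"),
        (workshop, "a", "twc:Workshop"),
        (chair, "a", "prov:Agent"),
        (workshop, "prov:wasAssociatedWith", chair),
        # The qualified association is where PROV-O attaches the role.
        (workshop, "prov:qualifiedAssociation", "_:assoc"),
        ("_:assoc", "a", "prov:Association"),
        ("_:assoc", "prov:agent", chair),
        ("_:assoc", "prov:hadRole", "twc:SessionChair"),
    ]


def to_turtle(triples):
    """Serialize prefixed-name triples as a small Turtle document."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


turtle = to_turtle(workshop_triples("ex:GeoData2014", "ex:SomeChair"))
print(turtle)
```

The other items in Models 3 and 4 (note taker, breakout moderator and so on) could be added the same way, as further prov:Role instances attached through their own associations.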

Comments and complements are welcome!
