Archive for the ‘tetherless world’ Category

Notes on public talks

July 16th, 2014

Massimo and I worked together on two posters about automatically capturing provenance for research publications, and we won the ESIP FUNding Friday award. What I will never forget, however, is the great lesson I learnt from giving the 2-minute pitch in front of the ESIP folks.

During the 2-minute talk, I just could not help staring at the two posters we had made and printed the day before and that morning. Now I know why: I had only practiced my speech with one of the posters displayed on my laptop, and I had had no chance at all to practice talking about the other one. I had become dependent on having the posters in front of me and could not give the talk to the people instead of to the posters.

How could I make my eyes move away from the posters while talking? The best solution I can think of is to get REALLY familiar with the topic I’m going to present: at least familiar enough that I don’t need an aid such as a poster to remind myself what to say, and ideally with some attention to spare for the audience, so that I can take in their feedback and adjust in real time. Needing to ignore the audience for a while to concentrate on “what should I say here?” indicates that I’m not familiar enough with the topic.

In addition to the content, presenters also need to get familiar with how they present it. This could include scrutinizing the practice talk sentence by sentence to make sure “I said what I meant and I meant what I said”. Not until such clarity and confidence are reached can one start thinking about all the fancy stuff like speaking pace, volume variation, and eye contact with the audience. Well, those are fancy to me, though not necessarily to good speakers.

So there is really a lot to work on for a public talk, especially if it’s the first time the presenter is talking about the idea. There is so much work that it cannot be done the night before the talk. We need to work on the familiarity, clarity, and confidence of our ideas on a daily basis. It helps to write down what we mean and to talk about it often.

Author: Categories: tetherless world Tags:

Geodata 2014

June 30th, 2014

A few weeks ago I attended the 2014 Geodata Workshop. Like the previous Geodata workshop in 2011, this workshop was focused on discussing policies and techniques to improve inter-agency geographic data integration and data citation. While there have been advances in recommendations for data citation and geodata integration since the last Geodata workshop, I felt the mood of the attendees indicated that we are now in much the same place we were in 2011. There was strong consensus as to the importance of data citation and integration, but a feeling that no one is really doing it at scale, the tools aren’t where we need them to be, and the agency policies are not yet in a state to successfully drive widespread adoption. Despite these hurdles, this is a community that is clearly excited and willing to take the first steps towards making widespread data integration and data citation a reality in the geodata community.

Meanwhile, in the trenches…

I had several conversations with attendees who represent publishers of oceanographic vocabularies. Many of these vocabularies have been publicly available for several years, but have traditionally been 3-star open data (publicly available in a non-proprietary, machine-readable format, with no links to external vocabularies). These publishers are excited about upgrading their vocabulary services to 5-star open data (using open W3C standards such as RDF and SPARQL, identifying things with resolvable URIs, and linking to other people’s data) because they see a major benefit in being able to refer to the authoritative source for a term or identified resource that is related to their vocabulary but for which they are not the authoritative source. This is a great example of a group that has already identified a specific real-world need for, and benefit from, integration and that is actively laying the groundwork that will enable that integration to succeed. This group was enthusiastic about cross-linking their vocabularies, and I have no doubt their efforts will be viewed as a data integration success at the next Geodata workshop.

Where we can help…

As a result of these discussions, our lab is starting a Linked Vocabulary API effort whose goal is to provide a Linked Data API configuration specialized for publishing SKOS vocabularies. We aim to develop a configuration that makes bootstrapping a RESTful linked data API for a SKOS vocabulary simple and accessible for the broad scientific community. This effort is based on work we previously did for the CMSPV project.
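To make the idea of a “RESTful linked data API to a SKOS vocabulary” more concrete, here is a minimal Python sketch of what such a service might return when a concept URI is dereferenced. It is not the Linked Data API configuration described above, only a plain-Python illustration under assumptions: the `Concept` class, the `to_turtle` and `resolve` functions, and all example.org URIs are hypothetical. The `skos:exactMatch` links are what lift a vocabulary from 3-star to 5-star data, because they point to the authoritative source for a related term.

```python
# Minimal sketch (NOT the Linked Data API itself): serve SKOS concepts as Turtle.
# Vocabulary contents, URIs, and function names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"


@dataclass
class Concept:
    uri: str                  # resolvable URI minted by the vocabulary publisher
    pref_label: str           # skos:prefLabel
    definition: str = ""      # optional skos:definition
    exact_matches: List[str] = field(default_factory=list)  # authoritative terms elsewhere


def literal(text: str) -> str:
    """Quote a string as an English-tagged Turtle literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def to_turtle(concept: Concept) -> str:
    """Serialize one concept; skos:exactMatch carries the links to external vocabularies."""
    statements = [f"skos:prefLabel {literal(concept.pref_label)}"]
    if concept.definition:
        statements.append(f"skos:definition {literal(concept.definition)}")
    statements += [f"skos:exactMatch <{uri}>" for uri in concept.exact_matches]
    body = " ;\n    ".join(["a skos:Concept"] + statements)
    return f"{SKOS_PREFIX}<{concept.uri}> {body} .\n"


def resolve(vocabulary: Dict[str, Concept], uri: str) -> Tuple[int, str]:
    """Mimic dereferencing a concept URI: return (HTTP status code, Turtle body)."""
    concept = vocabulary.get(uri)
    if concept is None:
        return 404, ""
    return 200, to_turtle(concept)


# A one-term example vocabulary (hypothetical URIs).
VOCAB = {
    c.uri: c
    for c in [
        Concept(
            uri="http://example.org/vocab/sea_surface_temperature",
            pref_label="sea surface temperature",
            exact_matches=["http://example.org/other-vocab/SST"],
        )
    ]
}

if __name__ == "__main__":
    status, turtle = resolve(VOCAB, "http://example.org/vocab/sea_surface_temperature")
    print(status)
    print(turtle)
```

A real deployment would sit behind a web server with content negotiation and would typically load the vocabulary from a triple store rather than a Python dict; the dict keeps the sketch self-contained.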

In conclusion

What I will remember most from Geodata 2014 is the excitement members of the community showed towards adopting new technologies and techniques and making widespread data integration and citation a reality. Where conventions have yet to be established, the community is willing to take the first steps and establish best practices. Where policies have yet to be formalized, the community is ready to work with policy makers to ensure that clear and helpful policies are established. Whenever the next Geodata workshop is held, I am confident that its narrative will be full of success stories that began at the 2014 workshop.

GeoData 2014 in sunny “People’s Republic of Boulder”

June 25th, 2014

I was so excited to see so many domain scientists and data specialists getting together to talk about one common topic, data science, which gave me a great chance to communicate with and learn from others. We had people from NOAA, NASA, USGS, RDA, UCAR, universities, and industry, as well as many other organizations and agencies.

I have to say that Boulder is a wonderful small town to stay in. A large number of government science agencies are located there. The local people are very friendly, and from almost any spot in town you have a wonderful view of the beautiful mountains to the west. Moreover, it is said that there are around 300 sunny days per year there! No wonder Patrick and Stephan love this place so much and prefer not to stay in Troy all the time, lol~

The first important thing I learned from the workshop is about data policies. I have to admit that as a student I never cared about data policies in the real world, which in fact always exist and should be obeyed. Regulations have been made about data citation, making scientific data much easier to access and reuse. This really broadened my horizons and gave me a chance to prepare for real-world challenges. However, there is still much more we need to do about data policies. For example, both federal and university scientists work on similar scientific data, and it is not always easy to make the policies work for these different groups. Coordinating all entities is difficult: some places are really running with it while others are left behind.

During one break, Prof. Fox reminded me that I should walk around and mingle with people instead of sitting there alone. This was indeed great advice for me. Research sometimes requires a great deal of communication. You never know how much others could help with your own work, or how much they could learn from your research achievements. This is also one of the main purposes of the workshop: to stimulate academic and agency collaboration in geoinformatics and in geodata retrieval, integration, reuse, and citation.

During one lunch break I happened to sit with a guy who creates visualizations of hydrologic time-series data. He used simple 2-D grids to visualize the data: the x-axis is the day of the year, the y-axis is the year, each small cell holds the value for one particular day in the record, and different colors represent different values. I was very surprised to see that such a simple idea could reveal far more from the data than a whole bunch of separate plots or hard-to-read 3-D visualizations. This is just one simple example of how important communication and collaboration are.
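The grid layout is simple enough to sketch in a few lines. The Python below is only an illustration under assumptions, not the presenter’s tool: it assumes daily values stored in a dict keyed by `datetime.date`, the functions `build_grid` and `render` are hypothetical names, and text shading characters stand in for the color scale a real plot would use.

```python
# Minimal sketch of a day-of-year x year grid for a daily time series.
# Function names are hypothetical; text characters stand in for a color scale.
from datetime import date
from typing import Dict, List, Optional, Tuple

SHADES = ".:-=+*#%@"  # low -> high values; a real plot would use a colormap


def build_grid(series: Dict[date, float]) -> Tuple[List[int], List[List[Optional[float]]]]:
    """Rows are years, columns are days of the year (1..366); gaps stay None."""
    years = sorted({day.year for day in series})
    row_of = {year: i for i, year in enumerate(years)}
    grid: List[List[Optional[float]]] = [[None] * 366 for _ in years]
    for day, value in series.items():
        column = day.timetuple().tm_yday - 1  # 0-based day-of-year index
        grid[row_of[day.year]][column] = value
    return years, grid


def render(years: List[int], grid: List[List[Optional[float]]]) -> List[str]:
    """Map each cell to a shade character scaled between the global min and max."""
    values = [v for row in grid for v in row if v is not None]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    lines = []
    for year, row in zip(years, grid):
        cells = []
        for v in row:
            if v is None:
                cells.append(" ")  # missing day (or day 366 in a non-leap year)
            else:
                index = int((v - lo) / span * (len(SHADES) - 1))
                cells.append(SHADES[index])
        lines.append(f"{year} " + "".join(cells))
    return lines


if __name__ == "__main__":
    # Tiny synthetic series, for illustration only.
    demo = {date(2012, 1, 1): 1.0, date(2012, 12, 31): 5.0, date(2013, 2, 1): 3.0}
    for line in render(*build_grid(demo)):
        print(line.rstrip())
```

Because every year occupies one row aligned on the same calendar days, seasonal patterns line up vertically and unusual years stand out as a row that breaks the pattern, which is what makes the layout easier to read than a stack of separate line plots.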

Additionally, we still have a long way to go in moving from the age of relational databases to semantic triple stores. Nobody could argue that triple stores do a much better job than relational databases on the “linked data” aspect. However, they also cost much more to maintain. So in the real world, especially in industry, people still prefer not to use them, since the main goal of a business is to make a profit. However, I believe this will change in the near future, starting with the domain experts who participated in this workshop.

According to the feedback from one breakout session, a large number of people are still confused about the difference between an ontology and a vocabulary. Data-related education is another key issue we discussed during the workshop. I feel very lucky to have had the chance to learn data-related knowledge systematically from our tetherless world professors. However, there is still a big challenge in making the whole data community realize how important it is to make data management plans, to carry out data citation, and so on. It is everyone’s responsibility to create an even bigger and better “open data” community.

Conceptual model of a workshop

June 24th, 2014

The following list of models follows the idea of an ontology spectrum, from informal to formal.

Model 1 (via Bruce Caron, easy and impressive): people, coffee, beer + shaking well.

Model 2 (following the context model of 5W1H): date, topic, location, people, agenda, logistics.

Model 3 (things to do – result of a brainstorm):
0 website;
1 date;
2 central topic, purpose, output;
3 topic of sessions, preferred topic of invited talks, topic of panels, topic of breakouts;
4 meeting rooms, hotel, visa application support;
5 organizing committee, meeting chair, session chair, invited speaker, breakout moderator, note taker, technical assistant, workshop report writer;
6 handouts pack (agenda, badge, logistics memo);
7 logistics: announcement, wifi, power strips, emergency contact, projector, whiteboard and marker, remote access facility, alcohol service permission, travel support, travel agency, dietary requirement, morning and afternoon break, lunch, dinner, reception, local transportation, reimbursement method.

Model 4 (following a timeline):
0 Science: topic, purpose;
1 Finance: meeting budget;
2 Planning: meeting proposal, organizing committee, logistics administrator, organizing meetings, date, location, announcement;
3 Agenda: topic of sessions, preferred topic of invited talks, topic of panels, topic of breakouts, meeting rooms, meeting chair, session chair, invited speaker, breakout moderator, note taker, technical assistant;
4 Logistics: handouts pack (agenda, badge, reimbursement form, logistics memo), emergency contact, wifi, power strips, projector, whiteboard and marker, remote access facility, travel support, travel agency, hotel, visa application support, dietary requirement, morning and afternoon break, lunch, dinner, reception, alcohol service permission, local transportation, reimbursement method.
5 Output: online virtual community of attendees, workshop summary and recommendations, workshop report writer.

Model 5 (an ontology? ;-))
Should be something like:
twc:Workshop a prov:Activity.
twc:SessionChair a prov:Role.

Comments and complements are welcome!

Data Science Day Symposium

June 14th, 2014

It was the first Data Science workshop I had attended since I began working for the DCO project. I must say that it was a great experience to mingle with scientists and scholars in different domains from all around the world. I felt very proud to see researchers from far away gathering together on our beautiful RPI campus to have conversations about Data Science.

We have done lots of wonderful work to build the Data Science and Management infrastructures for the DCO. Now it is time to convince our DCO colleagues of how important it is to share “linked data” and “open data” across the whole DCO community.

During the breakout session in the afternoon, I joined the EPC group. Dr. Mark Ghiorso led our discussion. Currently, many DCO members haven’t realized how important it is to make a management plan to preserve the data generated in their research. One of the questions raised was: what types and formats of data do you produce or use in your work, and how do you archive them? This question will lead to one of our DCO-DS boundary activities. Some research publications are so old that no electronic versions exist at all. If we need to reuse the data from this published literature, we have to figure out a way to regenerate the data from the paper versions. Optical character recognition (OCR) could be used to convert scanned images of the paper versions into machine-readable text. Then one key problem we need to solve is how to extract the metadata and data we need from the text automatically while maintaining quality control. We still have a long way to go on this.
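One way to picture the extraction-with-quality-control step is a first pass that pulls “name: value” lines out of OCR text and flags anything suspicious for a person to check against the paper original. The Python below is a minimal sketch under loudly stated assumptions: the line format, the field names, the expected ranges, and the table of letter/digit confusions are all hypothetical examples, and a simple regular expression stands in for whatever extraction method the team eventually chooses.

```python
# Minimal sketch of a first extraction pass over OCR output with simple quality checks.
# Line format, field names, expected ranges, and the confusion table are hypothetical.
import re
from typing import Dict, List, Tuple

# Letters OCR often confuses with digits; applied only to fields expected to be numeric.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# Matches "name: value" or "name = value" lines; anything else is skipped.
FIELD = re.compile(r"^\s*(?P<name>[A-Za-z][\w ]*?)\s*[:=]\s*(?P<value>\S.*?)\s*$")


def extract(text: str, numeric_ranges: Dict[str, Tuple[float, float]]):
    """Split OCR text into metadata, numeric data, and lines flagged for human review."""
    metadata: Dict[str, str] = {}
    data: Dict[str, float] = {}
    flagged: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = FIELD.match(line)
        if match is None:
            continue  # not a "name: value" line (headers, running text, noise)
        name, raw = match["name"].strip(), match["value"]
        if name not in numeric_ranges:
            metadata[name] = raw  # free-text metadata is kept as recognized
            continue
        try:
            value = float(raw.translate(OCR_DIGIT_FIXES))
        except ValueError:
            flagged.append((lineno, line, "unreadable number"))
            continue
        low, high = numeric_ranges[name]
        if low <= value <= high:
            data[name] = value
        else:
            flagged.append((lineno, line, "outside expected range"))
    return metadata, data, flagged


if __name__ == "__main__":
    sample = "Sample: MX-12\nTemperature: 1O5O\nPressure: 9999\nsome running text"
    print(extract(sample, {"Temperature": (0.0, 2000.0), "Pressure": (0.0, 100.0)}))
```

The design choice worth noting is that values failing a check are returned rather than silently “corrected”, so quality control stays with a person who can compare them with the original page.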

In the meantime, the fact that we have already lost a lot of valuable data in the past clearly shows why it is so crucial to make a good data management plan before conducting any research activities in the future. The Data Science and Management infrastructures our data science team built for DCO play a very significant role in this, and we have also managed to make the portals user-friendly.

During the second day of our workshop, we were very glad to meet many DCO researchers who are willing to share “linked data” via our Drupal-VIVO-CKAN Data Science and Management infrastructures. We believe that after this workshop more and more researchers will make use of our data science platforms, which could benefit the whole DCO community!

Congrui Li, DCO-DS team member
