Archive for the ‘Blog’ Category

Get Off Your Twitter

August 25th, 2017

Web Science, more so than many other disciplines of Computer Science, has a special focus on its humanist qualities – no surprise, given that the Web is ultimately an instrument for human expression and cooperation. Naturally, lots of current research in Web Science centers on people and their patterns of behavior, making social media a potent source of data for this line of work.


Accordingly, much time has been devoted to analyzing social networks – perhaps to a fault. Much of the ACM’s Web Science ‘17 conference centered on social media; more specifically, Twitter. While it may sound harsh, the reality is that many of the papers presented at WebSci’17 could be reduced to the following pattern:

  1. There’s Lots of Political Polarization
  2. We Want to Explore the Political Landscape
  3. We Scraped Twitter
  4. We Ran (Sentiment Analysis/Mention Extraction/etc.)
  5. and We Found Out Something Interesting About the Political Landscape

Of the 57 submissions included in the WebSci’17 proceedings, 17 mention ‘Twitter’ or ‘tweet’ in the abstract or title; that’s about 3 out of every 10 submissions, including posters. By comparison, only seven mention Facebook, with some submissions mentioning both.


This isn’t to demean the quality or importance of such work; there’s a lot to be gained from using Twitter to understand the current political climate, as well as loosely quantifying cultural dynamics and understanding social networks. However, this isn’t the only topic in Web Science worth exploring, and Twitter certainly shouldn’t be the ultimate arbiter of that discussion. While Twitter provides a potent means for understanding popular sentiment via a well-controlled dataset, it is still only a single service that attracts a certain type of user and is better for pithy sloganeering than it is for deep critical analysis, or any other form of expression that can’t be captured in 140 characters.


One of my fellow conference-goers also noticed this trend. During the talk on his submission to WebSci’17, Helge Holzmann, a researcher from Germany who works with Web archives, offered a truism that succinctly captures what I’m saying here: Twitter ought not to be the only data source researchers use when doing Web Science.


In fact, I would argue that Mr. Holzmann’s focus, Web archives, could provide a much richer basis for testing our cultural hypotheses. While more old school, Web archives capture a much larger and more representative span of the Web, from its inception to the dawn of social media, than Twitter could ever hope to match.


The winner for Best Paper speaks directly to the new possibilities offered by working with more diverse datasets. Applying a deep learning approach to Web archives, the authors examined the evolution of front-end Web design over the past two decades. Admittedly, I wasn’t blown away by their results; they claimed that their model had generated new Web pages in the style of different eras, but they didn’t show an example. But that’s beside the point; the point is that this is a unique task that couldn’t be accomplished by leaning exclusively on Twitter or any other social media platform.


While I remain critical of the hyper-focus of the Web Science community on social media sites – and especially Twitter – as a seed for its work, I do admire the willingness to wade into cultural and other human-centric issues. This is a rare trait in technological disciplines in general, but especially in fields of Computer Science; you’re far more likely to read about gains in deep reinforcement learning than you are to read about accommodating cultural differences in Web use (though these don’t necessarily exclude each other). To that point, the need to provide greater accessibility to the Web for disadvantaged groups and to preserve rapidly-disappearing Web content were widely noted, leaving me optimistic for the future of the field as a way of empowering everyone on the Web.


Now time to just wean ourselves off Twitter a bit…


DCO-DS participation at Research Data Alliance Plenary 5 meeting

April 30th, 2015

In early March I attended the Research Data Alliance Fifth Plenary and “Adoption Day” event to present our plans for adopting DataTypes and Persistent Identifier Types in the DCO Data Portal. This was the first plenary following the publication of the data type and persistent identifier type outputs, and the RDA community was interested in seeing how early adopters were faring.

At the Adoption Day event I gave a short presentation on our plan for representing DataTypes in the DCO Data Portal knowledge base. Most of the other adopter presentations were limited to organizational requirements or high-level architecture around data types or persistent identifiers – our presentation stood out because we presented details on ‘how’ we intended to implement RDA outputs rather than just ‘why’. I think our attention to technical details was appreciated; from listening to the presentations, it did not sound like many other groups were very far into their adoption process.

My main takeaways from the conference were the following:
– We are ahead of the curve on adopting the RDA data type and persistent identifier outputs.
– We are viewed as leaders on how to implement data types; people are paying attention to what we are doing.
– The chair of the DataType WG was very happy that we were thinking about how data types make sense within the context of our existing infrastructure, rather than treating the WG’s reference implementation as the sole way to implement the output.
– The DataType WG reference repository is more proof-of-concept than production system.
– The data type community is interested in the topic of federating repositories but is not ready to do much on that yet.

Overall, I think we are well positioned to be a leader on data types. Our work to date was very well received, and many members involved in the DataType WG will be very interested in what more we have to show next September at the Sixth Plenary.

Good work, team, and let’s keep it up!


Open Source Software & Science Reproducibility

January 14th, 2014

This year my contribution to the 2013 AGU Fall Meeting was all about the development of open source software to enable the reproducibility of scientific products, with both a poster and an oral presentation. AGU was the perfect opportunity to share my ideas on a topic that is one of my main interests.

This was my second time at AGU, but my first time with an oral presentation, which turned into a real challenge!

The main issue was a combination of two factors: I had decided to generate the slideshow in real time as HTML from an online IPython Notebook, because I thought it would be cool to show this functionality as well as the work itself. Unfortunately, that made me dependent on an internet connection at the time of the presentation, and alas, at AGU the presenter computer doesn’t have an internet connection! Definitely not the best conditions for a web-based slideshow generated “on the fly” by the execution of an IPython Notebook.

I found out about the lack of connectivity only two days before my presentation. I must have misunderstood the AGU oral presentation guidelines; when I didn’t find an explicit mention of the lack of an internet connection, I took it for granted that it wouldn’t be an issue. Big mistake!

I decided it would be safer to prepare a PowerPoint presentation, and some time later, I had one. Deep breath; I would be safe. But… what a disappointment!

I was so excited about the idea of showing my work running in real time instead of presenting a static (somewhat boring) slide deck!

I kept thinking about alternative solutions, though, and an idea quickly came to me: if the lack of internet stood in the way of an interactive, real-time demo, there should be no problem in running a static HTML slideshow instead. At least, that is what I thought…

I used the IPython “nbconvert” utility and its slides export option, and I successfully converted my workflow from an interactive IPython Notebook running in slideshow mode to a static HTML5 slideshow. The audience wouldn’t get to see how this was done, but at least they would get to see the result.
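For anyone curious to try the same conversion, the command looks roughly like the sketch below. (This is the modern `jupyter nbconvert` form; at the time of AGU 2013 the tool shipped as `ipython nbconvert`. The notebook filename here is a placeholder, not my actual file.)

```shell
# Convert a notebook (with per-cell slide metadata set) into a reveal.js slideshow.
# "talk.ipynb" is a hypothetical filename.
jupyter nbconvert --to slides talk.ipynb

# To run fully offline, point the exporter at a local copy of reveal.js
# instead of the default CDN-hosted one:
jupyter nbconvert --to slides talk.ipynb --reveal-prefix ./reveal.js
```

The `--reveal-prefix` option is exactly the kind of thing that would have saved me from the missing-JavaScript troubles described below.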

Happy with the final HTML presentation, I went to the AGU “Speaker Ready Room” to upload and test it. Unfortunately, my HTML presentation would not run offline. The lack of internet was giving me trouble: missing JavaScript files, missing fonts, image URLs that had to be replaced with paths to static files, broken hyperlinks, etc. It was not as easy as I thought.

It took more than three hours to fix all the bugs, on account of a really slow internet connection tethered from my phone, but finally I got my presentation running perfectly offline on the AGU computers!

In the end, my talk ran very smoothly: a complete workflow for “catchments characterization” built exclusively on open source software, running online and fully reproducible thanks to open source tools and an open dataset! I felt really good, as I think I successfully got my message across, both in words and in actions.

To top it all off, my presentation came just at the right time. Before me, two other presentations in my session had mentioned the use of the IPython Notebook as an open source tool to enable reproducibility of scientific work. They had highlighted that it shows great potential and that it deserves further investigation. I think my presentation gave them even more proof of that! Even the chairman acknowledged this when he stated: “Before we heard about it, but now we saw it in action!” I felt very proud of what I had done. The effort I put into running the HTML slideshow definitely paid off!



ESIP Winter Meeting 2013

January 24th, 2013

I presented a poster about how we can use Semantic Web technology to help build an information model and information system for the National Climate Assessment (NCA). The NCA is a report that integrates, evaluates, and interprets the findings of climate change research and its impacts on affected sectors such as agriculture, the natural environment, and energy production and use. One problem in building an information system for the NCA is that it draws on a wide range of information sources and covers many climate-related topics, which makes it difficult for users to find and identify the information they need. Using Semantic Web technology, we created a well-structured ontology in which the relationships between NCA-related entities and concepts are well defined. We also use other Semantic Web technologies, such as PROV-O, SPARQL endpoints, and ontology-based facet browsers, to help solve the problem.
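To illustrate the facet-browsing idea, here is a minimal sketch in plain Python. The records and facet names below are made up for illustration; they are not the actual NCA ontology, vocabulary, or data.

```python
# Minimal sketch of facet filtering: each record carries facet values,
# and a query narrows the collection one facet at a time.
records = [
    {"title": "Crop yield impacts", "sector": "agriculture", "region": "Midwest"},
    {"title": "Grid stress under heat waves", "sector": "energy", "region": "Southwest"},
    {"title": "Irrigation demand trends", "sector": "agriculture", "region": "Southwest"},
]

def facet_filter(items, **facets):
    """Keep only the items that match every requested facet value."""
    return [r for r in items if all(r.get(k) == v for k, v in facets.items())]

# Narrowing by two facets at once:
hits = facet_filter(records, sector="agriculture", region="Southwest")
print([r["title"] for r in hits])  # → ['Irrigation demand trends']
```

In the real system this narrowing is driven by ontology classes and SPARQL queries rather than dictionary keys, but the user-facing behavior is the same: each selected facet value prunes the result set.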

Overall, the presentation went well. A few people took particular interest in how we leverage GCMD keywords and the Clean vocabulary when building the information system for the NCA, and in the benefits they can bring to the NCA report information system. Others also found the facet search system interesting, especially as applied to geo-related data.

I also attended Semantic Web-related sessions. One of them discussed how we can use Semantic Web technology to help solve the “Tool Match” problem: use an OWL ontology to encode the rules and concepts about tools and datasets, and then use description logic reasoners to perform the matching. Another session gave a tutorial on Semantic Web technology to the ESIP community. It is nice to see Semantic Web technology really helping different communities solve various problems, and to see people becoming more and more interested in the technology.


Fall 2010 TWC Undergraduate Research Summary

December 20th, 2010

The Fall 2010 semester marked the beginning of the Tetherless World Constellation’s undergraduate research program at Rensselaer Polytechnic Institute (RPI). Although TWC has enjoyed significant contributions from RPI undergrads since its inception, this term we stepped up our game by more “formally” incorporating a group of undergrads into TWC’s research programs, establishing regular meetings for the group, and, with input from the students, beginning to outfit their own space in RPI’s Winslow Building.

Patrick West, my fellow TWC undergrad research coordinator, and I asked the students to blog about their work throughout the semester; with the end of term, we asked them to post summary descriptions of their work and their thoughts about the fledgling TWC undergrad research program itself. We’ve provided short summaries and links to those blogs below…

  • Cameron Helm began the term coming up to speed on SPARQL and RDF, experimented with several of the public TWC endpoints, and then worked with Phillip on basic visualizations. He then slashed his way through the tutorials on TWC’s LOGD Portal, eventually creating impressive visualizations such as this earthquake map. Cameron is very interested in the subject of data visualization and looks to do more work in this area in the future.
  • After a short TWC learning period, Dan Souza began helping doctoral candidate Evan Patton create an Android version of the Mobile Wine Agent application, with all the amazing visualization and data integration required, including Twitter and Facebook integration. Mid-semester Dan also responded to the call to help with the “crash” development of the Android/iPhone TalkTracker app, in time for ISWC 2010 in early November. Dan continues to work with Evan and others for early 2011 releases of Android, iPhone/iPod touch, and iPad versions of the Mobile Wine Agent.
  • David Molik reports that he learned web coding skills, ontology creation, server installation and administration. David contributed to the development and operation of a test site for the new, Semantic Web-savvy website for the Biological and Chemical Oceanography Data Management Office (BCO-DMO) of the Woods Hole Oceanographic Institution.
  • Jay Chamberlin spent much of his time working on the OPeNDAP Project, an open source server to distribute scientific data that is stored in various formats. His involvement included everything from learning his way around the OPeNDAP server, to working with infrastructure such as TWC’s LDAP services, to helping migrate documentation from the previous Wiki to the new Drupal site, to actually implementing required changes to the OPeNDAP code base.
  • Phillip Ng worked on a wide variety of projects this fall, starting with basic visualizations, helping with ISWC applications, and including iPad development for the Mobile Wine Agent. Phillip’s blog is fascinating to read as he works his way through the challenges of creating applications, including his multi-part series on implementing the social media features.
  • Alexei Bulazel began working with Dominic DiFranzo on a health-related mashup using datasets and is now working on a research paper with David on “human flesh search engine” techniques, a topic that top thinkers, including Tetherless World Senior Constellation Professor Jim Hendler, have explored in recent talks. Note: for more background on this phenomenon, see e.g. “China’s Cyberposse,” NY Times (03 Mar 2010).

Many of these students will be continuing on with these or other projects at TWC in 2011; we also expect several new students to be joining the group. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions this fall, and looks forward to being amazed by their continued great work in the coming year!

John S. Erickson, Ph.D.
