Get Off Your Twitter

August 25th, 2017

Web Science, more so than many other disciplines of Computer Science, has a special focus on its humanist qualities – no surprise in that the Web is ultimately an instrument for human expression and cooperation. Naturally, lots of current research in Web Science centers on people and their patterns of behavior, making social media a potent source of data for this line of work.


Accordingly, much time has been devoted to analyzing social networks – perhaps to a fault. Much of the ACM’s Web Science ‘17 conference centered on social media; more specifically, Twitter. While it may sound harsh, the reality is that many of the papers presented at WebSci’17 could be reduced to the following pattern:

  1. There’s Lots of Political Polarization
  2. We Want to Explore the Political Landscape
  3. We Scraped Twitter
  4. We Ran (Sentiment Analysis/Mention Extraction/etc.)
  5. and We Found Out Something Interesting About the Political Landscape

Of the 57 submissions included in the WebSci’17 proceedings, 17 mention ‘Twitter’ or ‘tweet’ in the abstract or title; that’s about 3 out of every 10 submissions, including posters. By comparison, only seven mention Facebook, with some submissions mentioning both.
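
The tally above is a simple keyword count over titles and abstracts. Here is a minimal sketch of how such a count could be reproduced, assuming the submissions are available as plain strings. The sample texts and the function name `count_mentions` are hypothetical and are not the actual proceedings data; only the 17-of-57 figure comes from the count described above.

```python
import re

def count_mentions(texts, keywords):
    """Count how many texts contain at least one keyword (case-insensitive substring match)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return sum(1 for text in texts if pattern.search(text))

# Hypothetical titles/abstracts, for illustration only (not real proceedings data).
sample = [
    "Polarization in political tweets during an election",
    "Measuring news sharing on Facebook and Twitter",
    "Exploring Web archives through anchor texts",
]

twitter_hits = count_mentions(sample, ["twitter", "tweet"])
facebook_hits = count_mentions(sample, ["facebook"])

# The share quoted in the post: 17 of 57 submissions.
share = 17 / 57
print(f"{share:.0%} of submissions mention Twitter")
```

Note that substring matching also counts words such as "retweet"; that seems acceptable for a rough tally like this one.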


This isn’t to demean the quality or importance of such work; there’s a lot to be gained from using Twitter to understand the current political climate, as well as loosely quantifying cultural dynamics and understanding social networks. However, this isn’t the only topic in Web Science worth exploring, and Twitter certainly shouldn’t be the ultimate arbiter of that discussion. While Twitter provides a potent means for understanding popular sentiment via a well-controlled dataset, it is still only a single service that attracts a certain type of user and is better for pithy sloganeering than it is for deep critical analysis, or any other form of expression that can’t be captured in 140 characters.


One of my fellow conference-goers also noticed this trend. During a talk on his submission to WebSci’17, Holge Holtzmann, a researcher from Germany working with Web archives, offered a truism that succinctly captures what I’m saying here: that Twitter ought not to be the only data source researchers are using when doing Web Science.


In fact, I would argue that Mr. Holtzmann’s focus, Web archives, could provide a much richer basis for testing our cultural hypotheses. While more old school, Web archives capture a much larger and more representative span of the Web, from its inception to the dawn of social media, than Twitter could ever hope to.


The winner for Best Paper speaks directly to the new possibilities offered by working with more diverse datasets. Applying a deep learning approach to Web archives, the authors examined the evolution of front-end Web design over the past two decades. Admittedly, I wasn’t blown away by their results; they claimed that their model had generated new Web pages in the style of different eras, but didn’t show an example, which was underwhelming. But that’s beside the point; the point is that this is a unique task which couldn’t be accomplished by leaning exclusively on Twitter or any other social media platform.


While I remain critical of the hyper-focus of the Web Science community on social media sites – and especially Twitter – as a seed for its work, I do admire the willingness to wade into cultural and other human-centric issues. This is a rare trait in technological disciplines in general, but especially in fields of Computer Science; you’re far more likely to read about gains in deep reinforcement learning than you are to read about accommodating cultural differences in Web use (though these don’t necessarily exclude each other). To that point, the need to provide greater accessibility to the Web for disadvantaged groups and to preserve rapidly disappearing Web content was widely noted, leaving me optimistic for the future of the field as a way of empowering everyone on the Web.


Now time to just wean ourselves off Twitter a bit…


WebSci ’17

August 14th, 2017

The Web Science Conference was hosted by Rensselaer Polytechnic Institute this year. The Tetherless World Constellation was heavily involved in organizing the event and ensuring the conference ran smoothly. The venue for the conference was the Franklin Plaza in downtown Troy. It was a great venue, with a beautiful rooftop.

On 25th June, there was a set of workshops organized for the attendees. I was a student volunteer at the “Algorithm Mediated Online Information Access (AMOIA)” workshop. We started the day off with a set of talks. The common theme of these talks was reducing bias in the services we use online. We then spent the next few hours in a discussion on the “Role of recommendation algorithms in online hoaxes and fake news.”

Prof. Peter Fox and Prof. Deborah McGuinness, who were the Main Conference Chairs, kicked off the conference on 26th June. Steffen Staab gave his keynote talk on “The Web We Want”. After the keynote, we jumped right into a series of talks. A few topics caught my attention during each session. Venkata Rama Kiran Garimella’s talk on “The Effect of Collective Attention on Controversial Debates on Social Media” was very interesting, as was the talk on “Recommendations for groups in location-based social networks” by Fred Ayala. We ended the talks with a panel discussion on “The ethics of doing Web Science”. After the panel discussion, we headed to the roof for some dinner and the Web Science Poster Session. There were plenty of posters at the session. Congrui Li and Spencer Norris from TWC presented their work at the poster session.


27th of June was the day of the conference I was most looking forward to, since it had a session on “Networks: Structure, Identifiers, Search”. I found all the talks presented there fascinating and useful, particularly “Hierarchical Change Point Detection” and “Adaptive Edge Probing” by Yu Wang and Sucheta Soundarajan respectively. I plan to use the work they presented in one of my current research projects. At the end of the day on 27th June, the awards for papers and posters were presented. Helena Webb won the best paper award. She presented her work on “The ethical challenges of publishing Twitter data for research dissemination”. Venkata Garimella won the best student paper award. Tetherless’ own Spencer Norris won the best poster award.

On 28th June, we started the day off with a set of talks on the topic chosen for the Hackathon, “Network Analysis for Non-Social Data”. Here I presented my work on how Network Analysis techniques can be leveraged and applied in the field of Earth Science. After these talks, the participants gave their hackathon presentations. At lunch, Ahmed Eliesh from TWC won first place in the Hackathon. After lunch, we had the last two sessions of WebSci ’17. In these sessions, Shawn Jones’ talk presenting Yasmin Alnomany’s work on “Generating Stories from Archived Collections” and Helena Webb’s best-paper-winning talk on “The ethical challenges of publishing Twitter data for research dissemination” piqued my interest.

Overall, attending the Web Science conference was a very valuable experience for me. There was plenty to learn, lots of networking opportunities and a generally jovial atmosphere around the conference. Here’s looking forward to next year’s conference in Amsterdam.




Do we have a magic flute for K-12 Web Science?

October 27th, 2015

In early July of 2015, Tetherless World Constellation (TWC) opened its doors to four young men from the 2015 summer program of the Rensselaer Research Experience for High School Students. The program covered a period of four weeks and each student was asked to choose a small and focused topic for research experience. They were each also asked to prepare a poster and present it in public at the end of the program.

Web Science was the discipline chosen by the four high school students at TWC. Before their arrival several professors, research scientists and graduate students formed a mentoring group, and I was officially assigned the task of mentoring two of the four students. Such a fresh experience! And then a question came up: do we have a curriculum of Web Science for high school students? And for a period of four weeks? We do have excellent textbooks for Semantic Web, Data Science, and more, but most of them are not for high school students. Also, the ‘research centric’ feature of the summer program indicated that we should not focus only on teaching but perhaps needed to spend more time on advising a small research project.

My simple plan was as follows: in week 1 we focused on basic concepts, in weeks 2 and 3 the students were assigned a specific topic taken from an existing project, and in week 4 we focused on result analysis, wrap-up and poster preparation. A Google Doc was used to record the basic concepts, technical resources and assignments we introduced and discussed in week 1. I thought those materials could be a little too much for the students, but to my surprise they took them up really fast, which gave me the confidence to assign them research topics from ongoing projects. One of the students was asked to do statistical analysis of records on the Deep Carbon Observatory Data Portal and present the results in interactive visualizations. The other student worked on the visualization of geologic time and connections to Web resources such as Wikipedia. The technologies used were an RDF database, SPARQL queries, JavaScript, D3.js and the JSON data format.

I hope the short program has sparked the students’ interest in exploring Web Science more deeply. Some of them will soon graduate from high school and go to universities. I wish them good luck!


Historic launch of the Global Partnership for Sustainable Development Data

September 30th, 2015

An informational email in early September from Simon Hodson, the CODATA Executive Director, attracted my deep interest. His email was about the high-level political launch for the Global Partnership for Sustainable Development Data. I was interested because I have worked on Open Data in the past few years and the experience shows that Open Data is much more than a purely technical issue. I was excited to see that there would be such an event initiated by political partners and focusing on social impacts. Thanks to the support from the CODATA Early Career Data Professionals Working Group, I was able to head to New York City to attend the forum in person on September 28th.

The forum was held in the Jade Room of the Waldorf Astoria hotel, and lasted for three hours from 2 to 5 PM, with a tight but well-organized schedule of about 10 lightning talks, four panels and about 30 commitment introductions from the partners. The panels and lightning talks focused on why open data is needed, how to make data open and, especially, the value of open data for the 17 Global Goals for Sustainable Development and the social impact that the data can generate. I was happy to see that the success stories of open geospatial data were mentioned several times in the lightning talks and the panels. For example, delegates from the World Resources Institute presented Global Forest Watch-Fires (GFW-Fires), which provides near-real-time information from various sources that enables people to respond promptly before a fire gets out of control. During the partner introductions, I heard more exciting news about the actions that stakeholders in governments, academia, industry and non-profit organizations are going to take to support the joint efforts of the Global Partnership for Sustainable Development Data. For example, the Children’s Investment Fund Foundation will invest $20m to improve data on coverage of nutrition interventions and other key indicators by 2020 in several countries; DigitalGlobe commits to provide three countries with evaluation licenses to their BaseMap service as well as training sessions for human resources; Planet Labs commits $60 million in geospatial imagery to support the global community; and the William and Flora Hewlett Foundation is proposing to commit about $3m to the start-up support of the secretariat for a Global Partnership for Sustainable Development Data. A list of the current partners is accessible on the partnership’s website.

The Global Partnership for Sustainable Development Data has a long-term vision for the year 2030: a world in which everyone is able to engage in solving the world’s greatest problems by (1) Effectively Using Data and (2) Fostering Trust and Accountability in the Sharing of Data. The pioneering partners in this effort have already committed to deliver more than 100 data-driven projects worldwide to pave the way for the 2030 vision. For the first year, the partnership will work together to achieve these goals: (1) Improve the Effective Use of Data, (2) Fill Key Data Gaps, (3) Expand Data Literacy and Capacity, (4) Increase Openness and Leverage of Existing Data, and (5) Mobilize Political Will and Resources.

The forum was chaired by Prof. Sanjeev Khagram, with over 200 attendees from various backgrounds. During the reception after the forum, I had a brief chat with Prof. Khagram about CODATA and the Early Career Data Professionals Working Group, as well as potential collaborations. He informed me that the partnership is open and invites broad participation to address the sustainable development goals. Prof. Khagram also mentioned that a bigger event, the World Data Forum, will take place in 2016. I also had the opportunity to catch up with Dr. Bob Chen from CIESIN, Columbia University about recent activities. It seems that ‘climate change’ is the topic of focus for several conferences in the year 2015, such as the International Scientific Conference, the Research Data Alliance Sixth Plenary Meeting and the United Nations Climate Change Conference, and Paris is the city for all three of these events.

The report A World That Counts: Mobilising The Data Revolution for Sustainable Development, prepared by the United Nations Secretary-General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development, provides more background information about the Global Partnership for Sustainable Development Data.


Data and Semantics — Topics of Interest at ESIP 2015 Summer Meeting

July 27th, 2015

The ESIP 2015 Summer Meeting was held at Pacific Grove, CA in the week of July 14-17. Pacific Grove is such a beautiful place, with its coastline, sandy beach and sunsets. What excited me more were the science and technical topics covered in the meeting sessions, as well as the opportunity to catch up with friends in the ESIP community. Excellent topics + a scenic place + friends = a wonderful meeting. Thanks a lot to the meeting organizers!

The theme of this summer meeting was “The Federation of Earth Science Information Partners & Community Resilience: Coming Together.” Though my focus was on sessions relevant to the Semantic Web and data stewardship, I was able to see the topic of ‘resilience’ in various presented works. It was nice to see that the ESIP community has an ontology portal. It implements the BioPortal infrastructure and focuses on collecting ontologies and vocabularies in the field of Earth sciences. With more submissions from the community in the future, the portal has great potential for geo-semantics research, similar to what BioPortal does for bioinformatics. An important topic was reviewing progress and discussing directions for the future. Prof. Peter Fox from RPI offered a short overview. The ESIP Semantic Web cluster is nine years old, and it is nice to see that the cluster has helped improve the visibility of Semantic Web methods and technologies in the broader field of geoinformatics. A key feature supporting the success of the Semantic Web is that it is an open world that evolves and updates.

There were several topics or projects of interest that I recorded during the meeting:

(1) Schema.org: It recently released version 2.0 and introduced a new mechanism for extension. There are now two types of extensions: reviewed/hosted extensions and external extensions. The former (e1) gets its own chunk of namespace: all items in that extension are created and maintained by their own creators. The latter lets a third party create extensions specific to an application. Extensions to location and time might be a topic for the Earth science community in the near future.

(2) GCIS Ontology: GCIS is such a nice project that it has incorporated several state-of-the-art Semantic Web methods and technologies. The provenance representation in GCIS means it is not just a static knowledge representation. It is more about what the facts are, what people believe and why. In the ontology engineering for GCIS we also see the collaboration between geoscientists and computer scientists. That is, the conceptual model came first, as a product that geoscientists can understand, before it was bound to logic and ontology encoding grammar. The process can be seen as within the scope of semiology. We can do a good job with syntax and semantics, but very often we struggle with pragmatics.

(3) PROV-ES: Provenance of scientific findings is receiving increasing attention. The Earth science community has taken a lead in capturing provenance. The World Wide Web Consortium (W3C) PROV standard provides a platform for the Earth science community to adopt and extend. The Provenance – Earth Science (PROV-ES) Working Group was initiated in 2013; it primarily focused on extending the PROV standard and tested the outputs with sample projects. In the PROV-ES hackathon at the summer meeting, Hook Hua and Gerald Manipon showed more technical details of PROV-ES, especially about its encodings, discovery, and visualization.

(4) Entity linking: Jin Guang Zheng and I had a poster about our ESIP 2014 Testbed project. The topic is linking entity mentions in documents and datasets to entities in the Web of Data. Entity recognition and linking is valuable when working with datasets collected from multiple sources. Detecting and linking entity mentions in datasets can be facilitated by using knowledge bases on the Web, such as ontologies and vocabularies. In this work we built a web-based entity linking and wikification service for datasets. Our current demo system uses DBpedia as the knowledge base, and we have been collecting geoscience ontologies and vocabularies. A potential future collaboration is to use the ESIP ontology portal as the knowledge base. Discussions with colleagues during the poster session showed that this work may also be beneficial to work on dark data, such as pattern recognition and knowledge discovery from legacy literature.
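
To give a rough feel for what linking entity mentions to a knowledge base involves, here is a minimal sketch that assumes a tiny in-memory dictionary of labels. It is not the service described above: the function name `link_entities`, the three sample labels and the exact-label matching strategy are all illustrative assumptions, and the URIs simply follow the DBpedia resource naming pattern.

```python
import re

# Hypothetical miniature knowledge base: lowercase surface label -> entity URI.
# A real system would query a knowledge base such as DBpedia instead.
KNOWLEDGE_BASE = {
    "basalt": "http://dbpedia.org/resource/Basalt",
    "jurassic": "http://dbpedia.org/resource/Jurassic",
    "san andreas fault": "http://dbpedia.org/resource/San_Andreas_Fault",
}

def link_entities(text, kb=KNOWLEDGE_BASE):
    """Return (mention, start_offset, uri) for every knowledge-base label found in text."""
    links = []
    for label, uri in kb.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            links.append((match.group(0), match.start(), uri))
    # Report mentions in the order they appear in the text.
    return sorted(links, key=lambda item: item[1])

links = link_entities("Jurassic basalt near the San Andreas Fault")
for mention, start, uri in links:
    print(f"{mention!r} at offset {start} -> {uri}")
```

Exact label matching like this misses synonyms and ambiguous names, which is where richer knowledge bases and disambiguation steps come in.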

(5) Big Earth Data Initiative: This is an inter-agency coordination effort for geo-data interoperability in the US. I will quote part of the original session description to show the relationships among a few entities and organizations that were mentioned: ‘The US Group on Earth Observations (USGEO) Data Management Working Group (DMWG) is an inter-agency body established under the auspices of the White House National Science and Technology Council (NSTC). DMWG members have been drafting an “Earth Observations Common Framework” (EOCF) with recommended approaches for supporting and improving discoverability, accessibility, and usability for federally held earth observation data. The recommendations will guide work done under the Big Earth Data Initiative (BEDI), which provided funding to some agencies for improving those data attributes.’ It will be nice to see more outputs from this effort and compare the work with similar efforts in Europe such as INSPIRE, as well as the global initiative GEOSS.
