Archive for the ‘tetherless world’ Category

TWC at AGU FM 2018

January 22nd, 2019

In 2018, AGU celebrated its centennial year. TWC had a good showing at this AGU, with 8 members attending and presenting on a number of projects.

We arrived in DC on Saturday night to attend the DCO Virtual Reality workshop organized by Louis Kellogg and the DCO Engagement Team, where researchers from the greater DCO community came together to present, discuss and understand how VR can facilitate and improve both research and teaching. Oliver Kreylos and Louis Kellogg spent several sessions presenting the results of the DCO VR project, which involved recreating some of the visualizations commonly used at TWC, i.e., the mineral networks. For a preview of the VR environment, check out these three tweets. Visualizing mineral networks in a VR environment has yielded some promising results: we observed interesting patterns in the networks, which need to be explored and validated in the near future.

With a successful pre-AGU workshop behind us, we geared up for the main event. First thing Monday morning was the “Predictive Analytics” poster session, which Shaunna Morrison, Fang Huang, and Marshall Ma helped me convene. The session, while low on submitted abstracts, was full of very interesting applications of analytics methods in various earth and space science domains.

Fang Huang also co-convened a VGP session on Tuesday, titled “Data Science and Geochemistry”. It was a very popular session, with 38 abstracts. It is very encouraging to see divisions other than ESSI hold Data Science sessions. This session also highlighted the work of many of TWC’s collaborators from the DTDI project. Kathy Fontaine convened an e-lightning session on data policy. This new format was very successful in drawing a large crowd and enabled a great discussion on the topic. The day ended with Fang’s talk, presenting our findings from the network analysis of samples from the Cerro Negro volcano.

Over the next two days, many of TWC’s collaborators presented, but no one from TWC presented until Friday. Friday, though, was the busiest day for all of us from TWC. Starting with Peter Fox’s talk in the morning, Mark Parsons, Ahmed Eleish, Kathy Fontaine and Brenda Thomson all presented their work during the day. Oh yeah…and I presented too! My poster on the creation of the “Global Earth Mineral Inventory” got good feedback. Last but definitely not least, Peter represented the ESSI division during the AGU centennial plenary, where he talked about the future of Big Data and Artificial Intelligence in the Earth Sciences. The video of the entire plenary can be found here.

Overall, AGU18 was great. Beyond the talks mentioned above, multiple productive meetings and potential collaborations emerged from meeting various scientists and talking to them about their work. It was an incredible learning experience for me and the other students (for whom this was the first AGU).

As for the other posters and talks I found interesting, I tweeted a lot about them during AGU. Fortunately, I also made a list of some interesting posters.


WebSci ’17 Tutorial Notes: Analyzing Geolocated Data with Twitter

September 22nd, 2017

Speaker:

Prof. Bruno Gonçalves, New York University

(http://www.bgoncalves.com/)

Schedule

09:00-10:20 theory session

10:30-12:00 practical session

Theory Session:

GPS-enabled smartphone: provides precise geographic locations

Jan. 2017 global digital snapshot

Social MEowDia Explained- different behaviors on different social media

Twitter:

Anatomy of a tweet: short text (Twitter started as a messaging system), hashtags, number of times shared, timestamp, location (from the device’s GPS), and background information (metadata).

Metadata:

Text-content, User, Geo, URL, etc.

Geolocated Tweets:

Follows a user’s geo info over time

GPS Coordinates vs World Population

Smartphone ownership: highest among adults with higher education/income levels (survey results)

Market Penetration: larger user group in higher GDP countries

Age Distribution

Demographics: ICWSM’11, 375 (2011)

Language and Geography: different languages show different distributions among geographic location, for example, Spanish and English distributions in NYC

Multilayer Network:

Retweet- information layers

|

Mention

|

Follower- social layers

Link Function: ICWSM’11, 89 (2011)

Cluster—retweets ~= agreement; mention ~= discussion

Retweets and mention have very different meanings

The Strength of Ties: chains of ties

Interviews were used to find out how individuals learned about opportunities

Mostly from acquaintances or friends of friends

The study argued that the degree of overlap of two individuals’ social networks varies directly with the strength of their tie to one another.

Neighborhood Overlap
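
Neighborhood overlap is commonly computed as the share of neighbours that two connected users have in common. The following is a minimal sketch of that idea, assuming an undirected network stored as a dict of neighbour sets; the function name and the toy user names are made up for illustration and do not come from the tutorial.

```python
def neighborhood_overlap(graph, u, v):
    """Fraction of u's and v's other neighbours that the two share."""
    # Neighbours of each endpoint, leaving out the endpoints themselves.
    nu = graph[u] - {v}
    nv = graph[v] - {u}
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)


# Hypothetical toy network: each user maps to the set of users they are tied to.
graph = {
    "ana": {"bob", "cy", "dee"},
    "bob": {"ana", "cy", "eve"},
    "cy": {"ana", "bob"},
    "dee": {"ana"},
    "eve": {"bob"},
}

print(neighborhood_overlap(graph, "ana", "bob"))
```

A high value suggests a strong tie embedded in a shared community; a value near zero suggests a bridge between communities.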

Network Structures: arrows = retweets; clusters = different friendship communities; dots = users; some users serve as bridges between communities.

Links: internal, between groups, intermediary, etc.

Groups

Geography

Retweet- information layers

|

Mention

|

 Follower- social layers

|

Geographic location

Twitter follower distance

Locality: the percentage of a user’s friends who live in the same country as the user.
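
As a rough illustration of the locality measure, the sketch below computes the share of a user's friends located in the user's own country. The function name and the country codes are assumptions made for the example.

```python
def locality(user_country, friend_countries):
    """Share (0..1) of friends located in the user's own country."""
    if not friend_countries:
        return 0.0
    same = sum(1 for country in friend_countries if country == user_country)
    return same / len(friend_countries)


# Hypothetical user in the US with four friends, two of them also in the US.
print(locality("US", ["US", "US", "BR", "PT"]))
```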

Co-occurrences and social ties

Geotagged Flickr Photos

Divide the world into a grid and count the number of cells in which two individuals were both present within a given time interval

Measure: sharing photos from the same grid cell within a period of time relates to the likelihood of becoming friends
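
The grid-based co-occurrence count can be sketched as follows. This is only an illustration under stated assumptions: photos are (timestamp in seconds, lat, lon) tuples, the grid uses square cells of `cell_deg` degrees, and all names and coordinates are hypothetical.

```python
from collections import defaultdict


def cell_of(lat, lon, cell_deg=1.0):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def co_occurrence_cells(photos_a, photos_b, window_s, cell_deg=1.0):
    """Count distinct cells where both users took a photo within window_s seconds."""
    times_by_cell = defaultdict(list)
    for t, lat, lon in photos_b:
        times_by_cell[cell_of(lat, lon, cell_deg)].append(t)

    shared = set()
    for t, lat, lon in photos_a:
        cell = cell_of(lat, lon, cell_deg)
        if any(abs(t - tb) <= window_s for tb in times_by_cell.get(cell, [])):
            shared.add(cell)
    return len(shared)


# Hypothetical photo logs for two users: (timestamp, lat, lon).
user_a = [(0, 40.7, -74.0), (100_000, 48.85, 2.35)]
user_b = [(3_600, 40.2, -73.5), (500_000, 48.86, 2.30)]

print(co_occurrence_cells(user_a, user_b, window_s=86_400))
```

Widening the time window or the cell size increases the number of co-occurrences, so both are parameters worth reporting alongside any result.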

Mobility: school/work—home—vacation—move to different city/country

Airline Flights: in Europe within 24h

Commuting: train, subway, bus, etc.

Realistic Epidemic Spreading

Human Mobility: Statistical Model

Privacy (Sci. Rep. 3, 1376 (2013))

How many indicators are needed to identify a unique person?

Mobility and Social Network (PLoS One 9, e92196 (2014))

Geo-Social Properties- Matrix of social behavior over distance: Probability of a link, reciprocity, Clustering, Triangle disparity

Geo-Social Model:

Starting position of user u

Visit a random neighbor, or jump to a new location

New position of u

Model fitting: probability of visiting old friend vs meeting new friend
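
The model steps above can be sketched as a simple simulation. This is a toy version, not the published model: it assumes a single parameter `p_visit` for the probability of visiting a friend, and it draws new locations uniformly at random, which is a simplification chosen only to keep the example short. All names are hypothetical.

```python
import random


def simulate_user(start, friend_locations, steps, p_visit, rng):
    """Return the list of positions visited by one user, starting at `start`."""
    position = start
    path = [position]
    for _ in range(steps):
        if friend_locations and rng.random() < p_visit:
            # Visit a random neighbor (an existing friend's location).
            position = rng.choice(friend_locations)
        else:
            # Jump to a new location (uniform draw: an assumption of this sketch).
            position = (rng.uniform(-90, 90), rng.uniform(-180, 180))
        path.append(position)
    return path


rng = random.Random(42)  # fixed seed so repeated runs give the same path
friends = [(40.7, -74.0), (42.7, -73.7)]
path = simulate_user((42.73, -73.68), friends, steps=5, p_visit=0.7, rng=rng)
print(path)
```

Fitting the model then amounts to choosing `p_visit` (old friend versus new place) so that simulated paths match the observed ones.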

Human Diffusion: how people move around on the map (J. R. Soc. Interface 12, 20150473 (2015))

Residents and Tourists

City Communities

Practical Session:

https://github.com/bmtgoncalves/WebSci17

Environment requirements: Anaconda and Python

Registering an Application

API basics

The twitter module provides the OAuth interface; we just need to provide the right credentials.

Best to keep the credentials in a dict and parametrize our calls with the dict key, which makes it easy to switch accounts.

.Twitter(auth) takes an OAuth instance as an argument and returns a Twitter object.

Authenticating with the API

In the remainder of this course, the accounts dict will live inside the twitter_accounts.py file.
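
A credentials dict of this kind might look like the sketch below. The key names and the `oauth_args` helper are assumptions made for illustration, and every value is a placeholder. The commented-out calls show how the tuple would be passed to the `twitter` package, whose `OAuth` class takes the token, token secret, consumer key and consumer secret in that order.

```python
# Hypothetical contents of twitter_accounts.py; all values are placeholders.
accounts = {
    "social": {
        "api_key": "API_KEY",
        "api_secret": "API_SECRET",
        "token": "TOKEN",
        "token_secret": "TOKEN_SECRET",
    },
}


def oauth_args(accounts, name):
    """Return one account's credentials in the order twitter.OAuth expects."""
    app = accounts[name]
    return (app["token"], app["token_secret"], app["api_key"], app["api_secret"])


# With the `twitter` package installed, authentication would look like:
#   import twitter
#   auth = twitter.OAuth(*oauth_args(accounts, "social"))
#   twitter_api = twitter.Twitter(auth=auth)
print(oauth_args(accounts, "social"))
```

Switching accounts is then a matter of changing the key passed to `oauth_args`.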

4 basic types of objects: tweets, users, entities, places.

Searching for Tweets

.search.tweets(query, count)  https://dev.twitter.com/docs/api/1.1/get/search/tweets

  • query is the content to search for
  • count is the maximum number of results to return (from most recent tweets)

Returns a dict with a list of ‘statuses’

Social Connections

.friends.ids() and .followers.ids() return a list of up to 5000 of a user’s friends or followers for a given screen_name or user_id.

The result is a dict containing multiple fields.

User Timeline

.statuses.user_timeline() returns a set of tweets posted by a single user.

Important options:

  • include_rts = ‘true’ to include retweets
  • count = 200 is the maximum number of tweets to return in each call
  • trim_user = ‘true’ to leave out the user information
  • max_id = 1234 to include only tweets with an id less than or equal to 1234

Returns at most 200 tweets per call; multiple calls can retrieve up to a user’s 3200 most recent tweets.
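
Paging through a timeline with max_id can be sketched as below. To keep the example runnable without credentials, `fetch_page` is a hypothetical stand-in for a call such as `.statuses.user_timeline(count=..., max_id=...)`; it is assumed to return tweets as dicts with an "id" field, newest first.

```python
def fetch_all_tweets(fetch_page, limit=3200, page_size=200):
    """Collect a timeline page by page, walking backwards in time with max_id."""
    tweets = []
    max_id = None  # None means "start from the newest tweet"
    while len(tweets) < limit:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break  # no older tweets left
        tweets.extend(page)
        # Next call: only tweets strictly older than the oldest one seen so far.
        max_id = page[-1]["id"] - 1
    return tweets[:limit]


# Stand-in data source for the example: 1000 fake tweets with ids 1000..1.
_fake_ids = list(range(1000, 0, -1))


def fake_timeline(count, max_id):
    pool = [i for i in _fake_ids if max_id is None or i <= max_id]
    return [{"id": i} for i in pool[:count]]


print(len(fetch_all_tweets(fake_timeline, limit=450)))
```

Subtracting one from the oldest id avoids fetching the same tweet twice, since max_id is treated as inclusive here.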

Social Interaction

Data processing extended from user timeline

NetworkX–networkx_demo.py

High-productivity software for complex networks

Comes with Anaconda

Simple Python interface

Four different types of graphs

  • Graph—undirected graph
  • DiGraph—directed graph
  • MultiGraph—multi-edged graph
  • MultiDiGraph—multi-edged directed graph

Similar interface for all graphs

Nodes can be any hashable Python object

Growing graph—add nodes, edges, etc.

Graph Properties

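  • .nodes() returns the nodes
  • .edges() returns the edges
  • .degree() returns the degree of each node (a dict in older NetworkX releases)
  • .in_degree() / .out_degree() for DiGraph
  • is_connected(G)
  • is_weakly_connected(G) / is_strongly_connected(G)
  • connected_components(G)

The last three are module-level functions that take the graph as an argument.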
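
A short sketch of these calls on a toy graph, assuming the networkx package is installed. The results are wrapped in list() or dict() because their exact return types differ between NetworkX releases; the node names are made up.

```python
import networkx as nx

# Toy undirected graph: a path a-b-c plus an isolated node d.
G = nx.Graph()
G.add_edge("a", "b")
G.add_edge("b", "c")
G.add_node("d")

nodes = list(G.nodes())
edges = list(G.edges())
degrees = dict(G.degree())  # node -> degree
connected = nx.is_connected(G)  # module-level function, takes the graph
components = [set(c) for c in nx.connected_components(G)]

print(nodes, edges, degrees, connected, components)
```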

Snowball Sampling–snowball.py

Commonly used in Social Science and Computer Science

  • Start with a single node
  • Get friends list
  • For each friend get the friend list
  • Repeat for a fixed number of layers or until enough nodes have been collected

Generates a connected graph (a single connected component)
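
The sampling steps above can be sketched as a breadth-first crawl. To keep the example self-contained, `get_friends` is a hypothetical callable standing in for an API call such as `.friends.ids()`, and the toy friendship data is made up.

```python
def snowball(seed, get_friends, layers):
    """Breadth-first snowball sample.

    Returns the set of users reached and the set of (user, friend) edges seen.
    """
    seen = {seed}
    frontier = [seed]
    edges = set()
    for _ in range(layers):
        next_frontier = []
        for user in frontier:
            for friend in get_friends(user):
                edges.add((user, friend))
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier  # the next layer: friends found for the first time
    return seen, edges


# Hypothetical friendship lists standing in for API responses.
friends = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"], "e": []}

users, edges = snowball("a", lambda u: friends.get(u, []), layers=2)
print(sorted(users), sorted(edges))
```

Because every sampled user is reached through a chain of friend links from the seed, the resulting graph is connected when edge direction is ignored.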

Streaming Geocoded data–twitter_location.py

The streaming API provides real-time data, subject to filters

Use a TwitterStream object instead of a Twitter object

  • .statuses.filter(track = q) will return tweets that match the query q in real time
  • returns a generator that you can iterate over
  • .statuses.filter(locations = bb) will return tweets that occur within the bounding box bb in real time

bb is a comma-separated list of lon/lat coordinates: the south-west corner of the box followed by the north-east corner.
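
Building and checking a bounding box string can be sketched as follows. The helper names are hypothetical, the corner order (south-west first, then north-east) is the one the streaming API's locations parameter uses, and the New York City coordinates are approximate values chosen for illustration.

```python
def bbox_string(sw_lon, sw_lat, ne_lon, ne_lat):
    """Format a bounding box as 'lon,lat,lon,lat', south-west corner first."""
    return ",".join(str(v) for v in (sw_lon, sw_lat, ne_lon, ne_lat))


def in_bbox(lon, lat, bb):
    """Check whether a point falls inside a bounding box string."""
    sw_lon, sw_lat, ne_lon, ne_lat = (float(v) for v in bb.split(","))
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


# Approximate box around New York City (illustrative values).
nyc = bbox_string(-74.26, 40.48, -73.70, 40.92)

print(nyc)
print(in_bbox(-73.98, 40.75, nyc))
```

The same string could be passed as the locations argument, and `in_bbox` is handy for re-checking coordinates on tweets that come back.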

Shapefiles

Open specification developed by ESRI, still the current leader in commercial GIS software

Shapefiles aren’t single files but a set of files sharing the same name with different extensions.

The actual set of files changes depending on the contents, but 3 files are usually present:

  • .shp: also commonly referred to as the shapefile, contains the geometric info
  • .dbf: a simple database containing the feature attribute table
  • .shx: a shape index (positional index of the feature geometry)

QGIS

Pyshp–shapefile_load.py

Pyshp defines utility functions to load and manipulate shapefiles programmatically.

The shapefile module handles the most common operations:

  • .Reader(filename) returns a Reader object
  • reader.records()/iterRecords()
  • reader.shapes()/iterShapes()
  • reader.shapeRecords()/iterShapeRecords()

Shape objects contain several fields:

  • bbox: lower-left and upper-right x,y coordinates (lon/lat)

Simple shapefile plot–plot_shapefile.py

Shapely–shapefile_shape_properties.py

Shapely defines geometric objects under shapely.geometry:

                   Point, Polygon, MultiPolygon, shape()

and common operations:

                   .crosses, .contains, etc.

Shape objects provide useful fields to query a shape’s properties:

                    .centroid, .area, .bounds, etc.

Filter Points with a shapefile–shapefile_filter.py
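
The tutorial script relies on Shapely for this step. To show what a containment test does underneath, here is a dependency-free sketch that swaps in the classic ray-casting test; it is not the tutorial's code, and the polygon and points are made up.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon whose vertices are `ring`?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x coordinate at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical square region and candidate points, as (lon, lat) pairs.
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
points = [(2.0, 2.0), (5.0, 2.0), (1.0, 3.5), (-1.0, -1.0)]

kept = [p for p in points if point_in_polygon(p[0], p[1], region)]
print(kept)
```

With real data, the ring would come from a shape's points in the shapefile, and a cheap bounding-box check before the polygon test usually saves a lot of time.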

Twitter Places–shapefile_filter_places.py

Twitter defines a “coordinates” field in tweets

There is also a place field that we glossed over

The place object also contains geographic info, but at a coarser resolution than the coordinates field

Each place has a unique place_id, a bounding_box and some geographical information such as country and full_name.

Places can be of several different types: admin, city, neighborhood, poi

Place attributes: key, street_address, phone, post_code, region, iso3, twitter, URL, app:id, etc.

Filter points and places–plot_shapefile_points.py   

Aggregation–shapefile_filter_aggregate.py


Get Off Your Twitter

August 25th, 2017

Web Science, more so than many other disciplines of Computer Science, has a special focus on its humanist qualities – no surprise, given that the Web is ultimately an instrument for human expression and cooperation. Naturally, lots of current research in Web Science centers on people and their patterns of behavior, making social media a potent source of data for this line of work.

 

Accordingly, much time has been devoted to analyzing social networks – perhaps to a fault. Much of the ACM’s Web Science ‘17 conference centered on social media; more specifically, Twitter. While it may sound harsh, the reality is that many of the papers presented at WebSci’17 could be reduced to the following pattern:

  1. There’s Lots of Political Polarization
  2. We Want to Explore the Political Landscape
  3. We Scraped Twitter
  4. We Ran (Sentiment Analysis/Mention Extraction/etc.)
  5. and We Found Out Something Interesting About the Political Landscape

Of the 57 submissions included in the WebSci’17 proceedings, 17 mention ‘Twitter’ or ‘tweet’ in the abstract or title; that’s about 3 out of every 10 submissions, including posters. By comparison, only seven mention Facebook, with some submissions mentioning both.

 

This isn’t to demean the quality or importance of such work; there’s a lot to be gained from using Twitter to understand the current political climate, as well as loosely quantifying cultural dynamics and understanding social networks. However, this isn’t the only topic in Web Science worth exploring, and Twitter certainly shouldn’t be the ultimate arbiter of that discussion. While Twitter provides a potent means for understanding popular sentiment via a well-controlled dataset, it is still only a single service that attracts a certain type of user and is better for pithy sloganeering than it is for deep critical analysis, or any other form of expression that can’t be captured in 140 characters.

 

One of my fellow conference-goers also noticed this trend. During a talk on his submission to WebSci’17, Holge Holtzmann, a researcher from Germany working with Web archives, offered a truism that succinctly captures what I’m saying here: that Twitter ought not to be the only data source researchers are using when doing Web Science.

 

In fact, I would argue that Mr. Holtzmann’s focus, Web archives, could provide a much richer basis for testing our cultural hypotheses. While more old school, Web archives capture a much, much larger and more representative span of the Web, from its inception to the dawn of social media, than Twitter could ever hope to.

 

The winner for Best Paper speaks directly to the new possibilities offered by working with more diverse datasets. Applying a deep learning approach to Web archives, the authors examined the evolution of front-end Web design over the past two decades. Admittedly, I wasn’t blown away by their results; they claimed that their model had generated new Web pages in the style of different eras, but didn’t show an example, which was underwhelming. But that’s beside the point; the point is that this is a unique task which couldn’t be accomplished by leaning exclusively on Twitter or any other social media platform.

 

While I remain critical of the hyper-focus of the Web Science community on social media sites – and especially Twitter – as a seed for its work, I do admire the willingness to wade into cultural and other human-centric issues. This is a rare trait in technological disciplines in general, but especially fields of Computer Science; you’re far more likely to read about gains in deep reinforcement learning than you are to read about accommodating cultural differences in Web use (though these don’t necessarily exclude each other). To that point, the need to provide greater accessibility to the Web for disadvantaged groups and to preserve rapidly-disappearing Web content were widely noted, leaving me optimistic for the future of the field as a way of empowering everyone on the Web.

 

Now time to just wean ourselves off Twitter a bit…


WebSci ’17

August 14th, 2017

The Web Science Conference was hosted by Rensselaer Polytechnic Institute this year. The Tetherless World Constellation was heavily involved in organizing the event and ensuring the conference ran smoothly. The venue for the conference was the Franklin Plaza in downtown Troy. It was a great venue, with a beautiful rooftop.

On 25th June, a set of workshops was organized for the attendees. I was a student volunteer at the “Algorithm Mediated Online Information Access (AMOIA)” workshop. We started the day off with a set of talks. The common theme of these talks was reducing bias in the services we use online. We then spent the next few hours in a discussion on the “Role of recommendation algorithms in online hoaxes and fake news.”

Prof. Peter Fox and Prof. Deborah McGuinness, who were the Main Conference Chairs, kicked off the conference on 26th June. Steffen Staab gave his keynote talk on “The Web We Want”. After the keynote, we jumped right into a series of talks. A few topics caught my attention during each session. Venkata Rama Kiran Garimella’s talk on “The Effect of Collective Attention on Controversial Debates on Social Media” was very interesting, as was the talk on “Recommendations for groups in location-based social networks” by Fred Ayala. We ended the talks with a panel discussion on “The ethics of doing Web Science”. After the panel discussion, we headed to the roof for some dinner and the Web Science Poster Session. There were plenty of posters at the session. Congrui Li and Spencer Norris from TWC presented their work at the poster session.

 

27th of June was the day of the conference I was most looking forward to, since it had a session on “Networks: Structure, Identifiers, Search”. I found all the talks presented there fascinating and useful, particularly “Hierarchical Change Point Detection” and “Adaptive Edge Probing” by Yu Wang and Sucheta Soundarajan respectively. I plan to use the work they presented in one of my current research projects. At the end of the day on 27th June, the awards for papers and posters were presented. Helena Webb won the best paper award. She presented her work on “The ethical challenges of publishing Twitter data for research dissemination”. Venkata Garimella won the best student paper award. Tetherless’ own Spencer Norris won the best poster award.

On 28th June, we started the day off with a set of talks on the topic chosen for the Hackathon, “Network Analysis for Non-Social Data”. Here I presented my work on how Network Analysis techniques can be leveraged and applied in the field of Earth Science. After these talks, the participants gave their hackathon presentations. At lunch, Ahmed Eleish from TWC won first place in the Hackathon. After lunch, we had the last 2 sessions of WebSci ’17. Among these talks, Shawn Jones’ talk presenting Yasmin Alnomany’s work on “Generating Stories from Archived Collections” and Helena Webb’s best-paper-winning talk on “The ethical challenges of publishing Twitter data for research dissemination” piqued my interest.

Overall, attending the Web Science conference was a very valuable experience for me. There was plenty to learn, lots of networking opportunities and a generally jovial atmosphere around the conference. Here’s looking forward to next year’s conference in Amsterdam.

 

 


Do we have a magic flute for K-12 Web Science?

October 27th, 2015

In early July of 2015, the Tetherless World Constellation (TWC) opened its doors to four young men from the 2015 summer program of the Rensselaer Research Experience for High School Students. The program ran for four weeks, and each student was asked to choose a small, focused topic for his research experience. Each was also asked to prepare a poster and present it in public at the end of the program.

Web Science was the discipline chosen by the four high school students at TWC. Before their arrival, several professors, research scientists and graduate students formed a mentoring group, and I was officially assigned to mentor two of the four students. Such a fresh experience! And then a question came up: do we have a Web Science curriculum for high school students? And one that fits into four weeks? We do have excellent textbooks for Semantic Web, Data Science, and more, but most of them are not written for high school students. Also, the ‘research centric’ nature of the summer program indicated that we should not focus only on teaching but should spend more time on advising a small research project.

My simple plan was this: in week 1 we focused on basic concepts; in weeks 2 and 3 the students were assigned a specific topic taken from an existing project; and in week 4 we focused on result analysis, wrap-up and poster preparation. A Google Doc was used to record the basic concepts, technical resources and assignments we introduced and discussed in week 1. I thought those materials might be a bit too much for the students, but to my surprise they picked them up really fast, which gave me the confidence to assign them research topics from ongoing projects. One of the students was asked to do a statistical analysis of records on the Deep Carbon Observatory Data Portal and present the results in interactive visualizations. The other student worked on the visualization of geologic time and connections to Web resources such as Wikipedia. The technologies used were an RDF database, SPARQL queries, JavaScript, D3.js and the JSON data format.

I hope the short program has sparked the students’ interest in exploring Web Science further and more deeply. Some of them will soon graduate from high school and go on to university. I wish them good luck!
