TWC Undergrads Visualize Linked Open Corporate Data

December 1st, 2011

Two undergraduate members of the Tetherless World team, Alexei Bulazel and Bharath Santosh, recently wrote great summaries of their work creating visualizations based on linked open corporate data aggregated through the ORGPedia project. In this post I’ll include snippets from their posts; I encourage you to check out their full posts and the demos they link to!

First, a bit of context (from the ORGPedia site):

ORGPedia: The Open Organizational Data Project, led by NYLS Professor and former United States Deputy CTO Beth Noveck (Project Lead) and TWC Senior Constellation Professor Jim Hendler (Tech Lead), explores how to create the legal, policy and technology framework for a data exchange to facilitate efficient comparison of organizational data across regulatory schemes as well as public reuse and annotation of that data. By designing a universal exchange rather than a new numbering scheme, ORGPedia aims to achieve goals like improving corporate transparency and efficiency, organizational performance, risk management, and data-driven regulatory policy, without having to wait until legislation is enacted for a single, legal entity identifier.

To date, TWC’s contribution to ORGPedia has been to aggregate data from a variety of sources, develop an experimental site to serve as a platform for integrating the data and prototyping ORGPedia concepts, and develop data visualizations and mashups that demonstrate the potential of an open system of canonical identifiers for corporate entities. Led by TWC Ph.D. student Xian Li, undergrads Alexei Bulazel and Bharath Santosh teamed up to create interesting visualizations based on the aggregated data.

Bharath first describes a visualization he created that allows users to analyze various financial properties of the financial sectors in the US using our aggregated data:

The visualization itself uses Google Motion Charts, which is part of Google’s Visualization API. It is an interactive multidimensional graph of a dataset of sectors and the mean of various financial properties across each sector’s companies. The data shown above is represented in millions of USD. The Motion Chart allows for really neat temporal analysis of data in various forms. Clicking the play button shows the change in properties from 2008 to 2011. There are also three different styles in which you can view the data: bubbles (shown above), bar charts, and line graphs. These can be switched in the top corner.

The dataset behind the visualization was created in R. I made a SPARQL query that would access ORGPedia’s datasets and pull out the sectors of the US and the companies and their stock tickers within those sectors. Then I took these companies and pulled in their income statements from Google Finance, went through each sector, and averaged various properties from the financial statements of the sector’s companies. The data manipulation in R took some getting used to, but now it’s very easy for me to transform data frames, matrices, and other objects in R. After the dataset was created and cleaned of non-existent values, it’s just a matter of defining the properties of the Motion Chart and running it. It generates an HTML file with the graph and the data represented in JavaScript. All the data processing and manipulation takes around 15 minutes, mostly due to the large amount of data to be downloaded.
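The averaging step Bharath describes can be illustrated with a short sketch. This is not his code: he worked in R, while the sketch below uses Python, and the row layout, tickers, property names, and the function name sector_means are all hypothetical. It assumes the company figures have already been downloaded, and only shows how per-sector means might be computed while skipping missing values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (sector, ticker, year, property, value in millions USD).
# None stands in for a value missing from a company's income statement.
rows = [
    ("Technology", "AAA", 2010, "revenue", 120.0),
    ("Technology", "BBB", 2010, "revenue", 80.0),
    ("Technology", "AAA", 2010, "net_income", None),
    ("Technology", "BBB", 2010, "net_income", 10.0),
    ("Energy", "CCC", 2010, "revenue", 300.0),
]


def sector_means(rows):
    """Average each property across a sector's companies, per year.

    Missing values (None) are dropped before averaging, which mirrors
    the "cleaned of non-existent values" step only in spirit.
    """
    grouped = defaultdict(list)
    for sector, _ticker, year, prop, value in rows:
        if value is not None:
            grouped[(sector, year, prop)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


means = sector_means(rows)
for (sector, year, prop), value in sorted(means.items()):
    print(sector, year, prop, value)
```

Each (sector, year, property) key then corresponds to one data point that a motion chart could animate over time.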

Bharath then goes on to describe the compelling visualization he and Alexei created of the “social network” of corporate board members:

…The visualization uses data from LittleSis.org about board members of various companies in the US and displays the members in a force graph that shows which board members are on multiple boards (Board Members Network):

The graph visualization is done using the D3 visualization toolkit’s force-directed graph layout. Each node represents a board member. The clustered colored nodes are a group of members on the same board. The multicolored nodes represent board members who are on multiple boards. Mousing over a node shows you their name and the companies they work for. Clicking a node takes you to their LittleSis.org page. The graph shows many interesting relationships between various companies and board members, notably Steven S Reinemund, who sits on 5 different boards.

On his blog, Alexei provides additional detail about the work they did to prepare the data for the visualization:

The project involved creating an interactive graph visualization of connections between members of corporate boards (the final product can be found here). Given a list of a few hundred stock tickers and access to the LittleSis API, the goal was to ultimately produce a JSON file of board members that could be used by the D3.js force-directed graph framework. I started by looking up each ticker symbol, yielding a JSON file with a unique ID number for each company. My script then queried the API for the actual company page associated with that ID and stored the names, company associations, and URIs of each board member. Finally, a JSON file for the D3.js graph was output describing the ~2800 board members and the links between each of them.
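The final step Alexei describes, turning board membership data into a JSON file for a force-directed graph, might look roughly like the sketch below. It is an illustration under stated assumptions, not his script: the input dictionary, the example.org URIs, and the function name build_graph are hypothetical, no LittleSis API calls are shown, members are identified by name for simplicity, and linking every pair of members who share a board is an assumption about how the links were defined. The nodes/links layout with index-based source and target fields follows the convention commonly used in D3 force layout examples.

```python
import json
from itertools import combinations

# Hypothetical input: company name -> list of (member name, profile URI).
boards = {
    "Company A": [
        ("Alice", "https://example.org/alice"),
        ("Bob", "https://example.org/bob"),
    ],
    "Company B": [
        ("Bob", "https://example.org/bob"),
        ("Carol", "https://example.org/carol"),
    ],
}


def build_graph(boards):
    """Build a nodes/links structure with one node per board member."""
    nodes = []
    index = {}  # member name -> position in the nodes list
    links = set()
    for company, members in boards.items():
        for name, uri in members:
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name, "uri": uri, "companies": []})
            nodes[index[name]]["companies"].append(company)
        # Assumption: two members are linked if they share a board.
        for (a, _), (b, _) in combinations(members, 2):
            i, j = sorted((index[a], index[b]))
            if i != j:
                links.add((i, j))
    return {
        "nodes": nodes,
        "links": [{"source": s, "target": t} for s, t in sorted(links)],
    }


graph = build_graph(boards)
graph_json = json.dumps(graph, indent=2)
print(graph_json)
```

A member whose "companies" list has more than one entry, such as Bob here, is the kind of node the visualization draws as multicolored.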

While I had used Python a bit for command line scripting, I hadn’t really dug into it before this project. The work gave me a better taste for the language and its capabilities. I made extensive use of the “urllib” library for accessing web content, and worked with opening up the data in JSON files. Bharath helped me with the syntax of the program and some of the graph construction. While I was aware of Python’s reputation for ease of use and high-level abstraction, working with it let me experience this abstraction firsthand, and I was very impressed. The ease with which complex multistep operations could be completed let me focus more on the flow of the data through the process rather than the specifics of handling it. The project also gave me a bit more hands-on experience with JSON.

The reader is encouraged to read both Alexei’s and Bharath’s blogs for more details on these great contributions by a couple of our TWC undergrads!


Tetherless World Undergrad Research Program welcomes 10 students for Spring term!

February 11th, 2011

The Tetherless World Constellation (TWC) at RPI welcomes ten students to its Undergraduate Research Program for the Spring 2011 term, its largest group yet!

Beginning with the Fall 2010 term, TWC undergrad researchers have contributed to a variety of projects for credit, pay or experience that touch on TWC’s interest areas, including the Future Web, Xinformatics and Semantic Foundations. Although TWC has enjoyed significant contributions from RPI undergrads since its inception, during the Fall 2010 term TWC began to more formally incorporate undergrads into its research activities, established regular meetings for the group, and outfitted a dedicated undergraduate lab space in RPI’s Winslow Building based on student input.

A critical component of the TWC Undergrad Research Program is engaging students through the many collaboration tools critical to modern Web research. Coordinators Patrick West and John Erickson ask the students to regularly blog about their work throughout the semester; at the end of each term, students post summary descriptions of their work and their thoughts about the fledgling TWC Undergraduate Research Program itself. A summary of the Fall 2010 term may be found on the TWC Weblog.

TWC is excited that several of the Fall 2010 students will be continuing their projects or starting new work during the Spring 2011 term. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions, and looks forward to being amazed by their continued great work during 2011!

The TWC Undergrad Research Program will have summer opportunities available, and is always accepting undergraduates seeking experience. Interested students, and anyone wanting more information on the TWC Undergraduate Lab, should visit our TWC Undergraduate Research web page or contact Patrick West or John Erickson.

About TWC@RPI: The Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI) explores the research and engineering principles that underlie the Web, to enhance the Web’s reach beyond the desktop and laptop computer, and develops new technologies and languages that expand the capabilities of the Web under three themes: Future Web, Xinformatics and Semantic Foundations. See: http://tw.rpi.edu/web/about

TWC goals include making the next generation web natural to use while being responsive to the growing variety of policy, educational, societal, and scientific needs. Research areas include: web science, privacy, intellectual property, general compliance, Web-based medical and health systems, semantic e-science, data science, semantic data frameworks, next generation virtual observatories, semantic data and knowledge integration, ontologies, semantic rules and query, semantic applications, data and information visualization, and knowledge provenance, trust and explanation for science.


Fall 2010 TWC Undergraduate Research Summary

December 20th, 2010

The Fall 2010 semester marked the beginning of the Tetherless World Constellation’s undergraduate research program at Rensselaer Polytechnic Institute (RPI). Although TWC has enjoyed significant contributions from RPI undergrads since its inception, this term we stepped up our game by more “formally” incorporating a group of undergrads into TWC’s research programs, establishing regular meetings for the group, and, with input from the students, beginning to outfit their own space in RPI’s Winslow Building.

Patrick West, my fellow TWC undergrad research coordinator, and I asked the students to blog about their work throughout the semester; at the end of the term, we asked them to post summary descriptions of their work and their thoughts about the fledgling TWC undergrad research program itself. We’ve provided short summaries and links to those blogs below…

  • Cameron Helm began the term coming up to speed on SPARQL and RDF, experimented with several of the public TWC endpoints, and then worked with Phillip on basic visualizations. He then slashed his way through the tutorials on TWC’s LOGD Portal, eventually creating impressive visualizations such as this earthquake map. Cameron is very interested in the subject of data visualization and looks to do more work in this area in the future.
  • After a short TWC learning period, Dan Souza began helping doctoral candidate Evan Patton create an Android version of the Mobile Wine Agent application, with all the amazing visualization and data integration required, including Twitter and Facebook integration. Mid-semester, Dan also responded to the call to help with the “crash” development of the Android/iPhone TalkTracker app, in time for ISWC 2010 in early November. Dan continues to work with Evan and others for early 2011 releases of Android, iPhone/iPod Touch and iPad versions of the Mobile Wine Agent.
  • David Molik reports that he learned web coding skills, ontology creation, and server installation and administration. David contributed to the development and operation of a test site for the new, semantic-web-savvy website for the Biological and Chemical Oceanography Data Management Office (BCO-DMO) of the Woods Hole Oceanographic Institution.
  • Jay Chamberlin spent much of his time working on the OPeNDAP Project, an open source server to distribute scientific data that is stored in various formats. His involvement included everything from learning his way around the OPeNDAP server, to working with infrastructure such as TWC’s LDAP services, to helping migrate documentation from the previous Wiki to the new Drupal site, to actually implementing required changes to the OPeNDAP code base.
  • Phillip Ng worked on a wide variety of projects this fall, starting with basic visualizations, helping with ISWC applications, and including iPad development for the Mobile Wine Agent. Phillip’s blog is fascinating to read as he works his way through the challenges of creating applications, including his multi-part series on implementing the social media features.
  • Alexei Bulazel began working with Dominic DiFranzo on a health-related mashup using Data.gov datasets and is now working on a research paper with David on “human flesh search engine” techniques, a topic that top thinkers including Tetherless World Senior Constellation Professor Jim Hendler have explored in recent talks. Note: For more background on this phenomenon, see e.g. China’s Cyberposse, NY Times (03 Mar 2010).

Many of these students will be continuing on with these or other projects at TWC in 2011; we also expect several new students to be joining the group. The entire team at the Tetherless World Constellation thanks them for their efforts and many important contributions this fall, and looks forward to being amazed by their continued great work in the coming year!

John S. Erickson, Ph.D.
