Archive for November, 2010

Supercomputing 2010

November 30th, 2010

In this blog post, I wish to briefly summarize the aspects of the Supercomputing 2010 conference that may be of use to the semantic web community.

1. Technical Papers on Graph Algorithms

The first paper was Fast PGAS Implementation of Distributed Graph Algorithms by Cong, Almasi, and Saraswat. PGAS stands for Partitioned Global Address Space. They claim to “present the first fast PGAS implementation of graph algorithms for the connected components and minimum spanning tree problems” [1]. They performed an evaluation on two random graphs each with 100 million vertices, one with 400 million edges and the other with one billion edges.

The second paper was Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory by Pearce, Gokhale, and Amato. This was the only paper of the three that included as part of its motivation something about the web. They mention “WWW graphs” as an example of real-world, large graphs, and by “WWW graph” they mean webpages as nodes and links between webpages as edges. Also, social networks were mentioned as another example of real-world, large graphs. Unlike the first paper, this paper focuses on a shared-memory paradigm. They “present a novel asynchronous approach to compute Breadth-First-Search (BFS), Single-Source-Shortest-Paths, and Connected Components for large graphs in shared memory” [2]. The “semi-external memory” refers to the use of solid-state memory devices. They evaluated the approach on synthetic, “scale-free graphs” generated using the RMAT [3] graph generator. The graphs had roughly between 33 million and one billion unique edges. They also evaluated Connected Components on WWW graphs ranging from roughly one billion to eight billion edges.
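For readers unfamiliar with R-MAT [3], the idea is to place each edge by recursively choosing one of the four quadrants of the adjacency matrix with probabilities a, b, c, and d. Below is a minimal Python sketch of that idea. The default probabilities and the function names are assumptions made for illustration, not the settings or code used in the paper, and the sketch does not remove duplicate edges.

```python
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, rng=random):
    """Return one (src, dst) edge for a graph with 2**scale vertices.

    The probabilities a, b, c (and d = 1 - a - b - c) are illustrative
    assumptions, not values taken from the papers discussed here.
    """
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        # Each level of recursion fixes one more bit of src and dst.
        src <<= 1
        dst <<= 1
        if r < a:
            pass          # top-left quadrant: both new bits are 0
        elif r < a + b:
            dst |= 1      # top-right quadrant
        elif r < a + b + c:
            src |= 1      # bottom-left quadrant
        else:
            src |= 1      # bottom-right quadrant
            dst |= 1
    return src, dst

def rmat_edges(scale, num_edges, seed=0):
    """Generate num_edges edges (duplicates possible) with a seeded RNG."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng=rng) for _ in range(num_edges)]
```

Because the first quadrant is favored at every level, a few vertices collect many edges, which is what gives the generated graphs their skewed degree distribution.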

The third paper was Scalable Graph Exploration on Multicore Processors by Agarwal et al. They “investigate the challenges involved in exploring very large graphs by designing a breadth-first search (BFS) algorithm for advanced multi-core processors …” [4]. They evaluate on RMAT [3] graphs of up to a billion edges. They make the very interesting claim that, in some cases, their BFS algorithm running on a 4-socket Nehalem EX performs on par with or even exceeds previously reported performance on the Cray XMT, Cray MTA-2, and Blue Gene/L.

The problem is clear: graph algorithms require random data access (in the sense of spatial locality of the bytes), which inhibits performance. The common solution is to have some abstraction of shared memory (whether it is actually shared memory or not). The focus is on conventional graph algorithms like BFS and connected components, and the state of the art seems to handle on the order of billions of edges.
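To make the access pattern concrete, here is a minimal queue-based BFS sketch in Python over an adjacency-list dictionary. The function name and the toy graph in the usage note are illustrative assumptions and are not taken from any of the three papers.

```python
from collections import deque

def bfs_levels(adj, source):
    """Return a dict mapping each reachable vertex to its BFS depth."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            # This lookup is the random access: the neighbors of u can
            # sit anywhere in memory, so there is little spatial locality.
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

For example, `bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)` visits vertex 3 at depth 2. On a graph with billions of edges, almost every `dist` lookup touches a different region of memory, which is why the papers above lean on some form of shared-memory abstraction.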

2. The Graph 500 BOF

The Graph 500 BOF was also of interest, although it was really the Graph 9. The effort was motivated by dissatisfaction that the Top 500 list uses LINPACK as its benchmark, and FLOPS are not the best indication of the usefulness of a system for attacking graph-related problems. For this first version of Graph 500, BFS was the benchmark, and systems were ranked first by size of the input graph and then by edge traversals per second. The list was dominated by Cray machines, with a Cray XT4 coming in second place with 2^32 vertices and 5.22 billion edge traversals per second. However, the one BlueGene machine, a BlueGene/P, took first place, achieving both the largest graph of 2^36 vertices and the highest edge traversal rate of 6.6 billion edge traversals per second.
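The ranking rule described above (graph size first, then traversal rate) can be sketched in a few lines of Python. The two entries use the figures quoted in this post, with graph size stored as the base-2 logarithm of the vertex count. The `Entry` type, its field names, and the `rank` helper are hypothetical and only illustrate the ordering.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    system: str
    log2_vertices: int        # graph size as log2 of the vertex count
    edges_per_second: float   # BFS edge traversals per second

def rank(entries):
    """Sort by graph size first, then by traversal rate, largest first."""
    return sorted(
        entries,
        key=lambda e: (e.log2_vertices, e.edges_per_second),
        reverse=True,
    )

entries = [
    Entry("Cray XT4", 32, 5.22e9),
    Entry("BlueGene/P", 36, 6.6e9),
]
ranked = rank(entries)
```

Note that under this rule a system that solves a larger problem outranks a faster system on a smaller one; the traversal rate only breaks ties in graph size.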

3. The Semantic Graph/Database Processing BOF

Finally, there was the semantic graph/database processing BOF hosted by David Haglin of PNNL. Cliff Joslyn of PNNL presented their submission to the 2010 Billion Triple Challenge, and David Mizell of Cray, Inc. presented his work in SPARQL querying on the Cray XMT. There was also a presentation by Franz Inc. about AllegroGraph. I presented my relevant existing research and mentioned related work. I also gave a few suggestions to interested HPC researchers, which I will discuss further in another blog post.

One thing that was clear is that there was diverse interest in the semantic web at the BOF. Some were interested in merely utilizing RDF for representing graphs, while others were interested in supporting analysis and/or query of such RDF graphs. I ran into only one person who seemed interested in reasoning. Another comment was about how SPARQL does not directly support graph-like operations or algorithms but serves primarily as a data access language. As an example, how would one find clustered components using SPARQL? It could be done, but perhaps would be cumbersome. The supercomputing community seems to handle RDF data primarily as a graph data structure rather than as sets of triples.
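As a sketch of the "graph data structure" view, the Python snippet below treats each triple as an undirected edge between its subject and object and groups nodes into connected components with union-find. Ignoring predicates and not distinguishing literals from resources are simplifying assumptions, and the function name is hypothetical.

```python
def connected_components(triples):
    """Group subjects and objects into connected components.

    Each (subject, predicate, object) triple is treated as an
    undirected edge; the predicate is ignored (an assumption).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, _p, o in triples:
        root_s, root_o = find(s), find(o)
        if root_s != root_o:
            parent[root_s] = root_o        # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

Given three made-up triples such as `("ex:a", "ex:knows", "ex:b")`, `("ex:b", "ex:knows", "ex:c")`, and `("ex:d", "ex:knows", "ex:e")`, the function returns two components. This is a few lines once the triples are in memory as a graph, whereas plain SPARQL of that time had no direct way to express the same traversal.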

Jesse Weaver
Ph.D. Student, Patroon Fellow
Tetherless World Constellation
Rensselaer Polytechnic Institute

[1] Cong, Almasi, Saraswat. Fast PGAS Implementation of Distributed Graph Algorithms. SC 2010.
[2] Pearce, Gokhale, Amato. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory. SC 2010.
[3] Chakrabarti, Zhan, Faloutsos. R-MAT: A recursive model for graph mining. ICDM 2004.
[4] Agarwal, Petrini, Pasetto, Bader. Scalable Graph Exploration on Multicore Processors. SC 2010.

Categories: tetherless world

White House Visitors App, now on your iPhone, iPad, or iPod touch

November 9th, 2010

The Linked Open Government Data group has worked hard to make available US Government data as RDF, and over the past year they’ve introduced some really great tools for exploring government data. One of those tools, the White House Visitors app, is now a native app for the iOS product family. The native app provides live querying of the LOGD triple stores, queries DBpedia for more information about important people, uses the New York Times linked open data to find the latest articles, and pulls images using Freebase. For travel, the application caches data so that existing queries can be viewed when Internet connectivity is intermittent.

See the White House Visitors App on iTunes Preview or download it from the App Store.

Screenshots:


White House Visitors app finds the top 25 visitees (configurable) and lists them in decreasing order.

Visualize the visitors and visitees in graphical form.

Using linked data, the White House Visitors app can find out more about people listed in the original dataset.


Building a mobile app for ISWC2010

November 9th, 2010

One of the things I find annoying while attending conferences is the huge amount of paper we receive: talks, maps, and “metadata” about the conference in general. So, for ISWC2010, my solution was to create a mobile application where people attending the conference could retrieve the information they wanted.

The first question you may ask when you create a mobile app is which niche you want to cover: the mobile ecosystem contains a wide range of devices, each with different capabilities and features. This means that a developer must choose which platforms to support and which features he or she can use.

With ISWC in mind, my impression was that most of the attendees would use a smartphone, in particular iPhones or Android devices. Since I wanted to cover both platforms, I decided not to create native applications (for now).

I based my work on Sencha Touch, a nice library that uses CSS3, HTML5 and JavaScript. The app works fine on iPhone, iPod, iPad, and Android devices, as well as in the Chrome and Safari browsers.

In this app it is possible to obtain and navigate through information about authors, papers, workshops (sadly, I could not obtain the data about workshop papers on time), scheduling, rooms and sessions. The data is obtained from a SPARQL endpoint containing the ISWC 2010 metadata. Each action triggers a SPARQL query, and the results are retrieved as a JSON object. I also obtained pictures of authors from Arnetminer.
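To give an idea of what handling such a JSON result involves, here is a minimal sketch that flattens a result document in the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`) into plain rows. It is written in Python for illustration rather than in the app's JavaScript, and the `rows` helper is hypothetical, not code from the app.

```python
import json

def rows(sparql_json):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(sparql_json)
    names = doc["head"]["vars"]
    return [
        # A variable may be unbound in a given solution, so check first.
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]
```

Each binding carries a `type` (such as `uri` or `literal`) next to its `value`; the sketch drops the type, which is enough for displaying lists of authors or papers but would need refining if the app had to tell links from text.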


Finally, I added a Twitter feed with all the relevant hashtags (#iswc, #iswc2010, #cold2010, #seres2010, #c3lsw2010, etc.), so you can read what people are tweeting about the latest events (you shouldn’t have problems with firewalls, etc.).
I encourage you to go to http://iswc.mobi and try ISWC Mobile. Of course, comments, suggestions (and bug reports!) are always welcome.

Alvaro Graves

Categories: iswc, tetherless world, twitter

Even more things to do with real-time conference data

November 9th, 2010

The widget I introduced in my last post is a specific application of SPARQL-based views over ISWC SemWeb Dog Food data and streaming Twitter data.  However, many other applications are possible, such as dynamic visualisations.  The following are a few examples of raw SPARQL queries executed against the conference triple store.  The first three are the queries currently provided with the widget demo, while the other two focus on data about people.  Tweak the queries to create your own live views of the conference data.

The URL of the AllegroGraph SPARQL endpoint is: http://flux.franz.com/catalogs/demos/repositories/iswc2010.  It supports XML and JSON results through content negotiation.
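As a minimal sketch of the content negotiation, the Python snippet below builds (but does not send) an HTTP request that asks for JSON or XML results through the `Accept` header. Passing the query in a `query` URL parameter follows the standard SPARQL protocol; that this endpoint accepts it in exactly this form is an assumption, and `build_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://flux.franz.com/catalogs/demos/repositories/iswc2010"

# Standard media types for SPARQL query results.
ACCEPT = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
}

def build_request(query, fmt="json"):
    """Build a GET request whose Accept header selects the result format."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": ACCEPT[fmt]})

request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
```

Passing `request` to `urllib.request.urlopen` would send it; switching `fmt` to `"xml"` changes only the header, not the URL.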

Categories: iswc, Semantic Web

Real-time Twitter filters for ISWC

November 8th, 2010

Real-time social networking services such as Twitter provide compelling use cases for Semantic Web technologies. Last summer at SemTech, I gave a talk with examples of real-time semantic Twitter feeds powered by SPARQL and Twitter Annotations. While Annotations have not yet been released, they’re not the only way to add SemWeb-friendly structure to social data. The International Semantic Web Conference is a case in point. A reasonable combination of the ISWC 2010 Conference Corpus data assembled by Jie Bao and others, tweet metadata such as author and timestamp, and embedded nanosyntax (that is, hashtags) provides enough structure for useful semantic filtering. To illustrate, I present a general-purpose Web widget which grabs these filtered streams from a generic SPARQL endpoint. Of course, the endpoint needs to expose microblog data modeled in SIOC and FOAF. When applied to the ISWC data, the widget provides real-time views of Twitter conversations which make good use of the background knowledge we have about the conference.

So now, check out the demo!

SPARQL-based Twitter widget


Categories: iswc, tetherless world, twitter