A sameAs network is a network of URIs interconnected by the owl:sameAs relation. It is interesting because it is not a conventional social network but a socially contributed directed graph connecting URIs that denote the same (“equivalent”) identity.
Our recent study (Ding et al., 2010) crawls sameAs networks following linked data principles: starting from a given seed URI, we dereference it and recursively fetch the URIs linked to it by owl:sameAs. We used a fairly small seed set of 300 New York Times URIs (100 people, 100 locations, and 100 organizations) and obtained 300 sameAs networks. Please come to the WebSci poster session today (April 26, 2010) for more discussion.
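The crawl can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions, not the crawler used in the study: the `dereference` function and the `FAKE_WEB` table are hypothetical stand-ins for HTTP dereferencing and RDF parsing, and the `ex:` URIs are made up. Every owl:sameAs triple found in a fetched description contributes an arc, and both of its endpoints become candidates for further dereferencing.

```python
from collections import deque

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical stand-in for the Web: maps a URI to the (s, p, o) triples
# returned when it is dereferenced. URIs missing here are not dereferenceable.
FAKE_WEB = {
    "ex:nyt1": [("ex:nyt1", OWL_SAMEAS, "ex:dbA"), ("ex:nyt1", OWL_SAMEAS, "ex:fbA")],
    "ex:dbA": [("ex:dbA", OWL_SAMEAS, "ex:nyt1")],
    "ex:fbA": [("ex:fbA", OWL_SAMEAS, "ex:dbB"), ("ex:fbA", OWL_SAMEAS, "ex:lost")],
    "ex:dbB": [],  # dereferenceable, but described by no triples
}


def dereference(uri):
    """Return the triples describing `uri`, or None if it cannot be dereferenced."""
    return FAKE_WEB.get(uri)


def crawl_sameas_network(seed):
    """Breadth-first crawl of the sameAs network reachable from `seed`.

    Returns (nodes, arcs): every URI discovered, and the (subject, object)
    pair of every owl:sameAs triple encountered.
    """
    nodes, arcs = {seed}, set()
    queue = deque([seed])
    while queue:
        uri = queue.popleft()
        triples = dereference(uri)
        if triples is None:  # not dereferenceable: keep the node, expand no further
            continue
        for s, p, o in triples:
            if p != OWL_SAMEAS:
                continue
            arcs.add((s, o))
            for neighbor in (s, o):
                if neighbor not in nodes:  # each URI is queued at most once
                    nodes.add(neighbor)
                    queue.append(neighbor)
    return nodes, arcs
```

For example, `crawl_sameas_network("ex:nyt1")` discovers five URIs, including `ex:lost`, which is recorded as part of the network even though it cannot be dereferenced.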
The average size of a sameAs network is 22 URIs, and one of the largest networks has 58 URIs connected by 1249 sameAs arcs. Not all URIs are dereferenceable, and the dereferenceable ones are described by anywhere from one to over a thousand triples.
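Statistics like these can be computed from crawl results represented as (nodes, arcs) pairs. The sketch below is illustrative; `network_stats` and `arc_density` are hypothetical helpers, and the density formula assumes arcs are directed and excludes self-loops.

```python
def network_stats(networks):
    """networks: list of (nodes, arcs) pairs, one per crawled sameAs network.

    Returns (average size in URIs, URI count of the largest network,
    arc count of that largest network).
    """
    sizes = [len(nodes) for nodes, _ in networks]
    avg_size = sum(sizes) / len(sizes)
    largest_nodes, largest_arcs = max(networks, key=lambda n: len(n[0]))
    return avg_size, len(largest_nodes), len(largest_arcs)


def arc_density(num_nodes, num_arcs):
    """Fraction of possible directed arcs (no self-loops) that are present."""
    return num_arcs / (num_nodes * (num_nodes - 1))
```

Applied to the largest network above, `arc_density(58, 1249)` gives 1249 / (58 × 57) = 1249 / 3306 ≈ 0.378, so under these assumptions roughly 38% of all possible directed sameAs arcs among its URIs are present.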
The following are some interesting, breaking observations, confirmed in several plotted sample sameAs networks (they are “breaking” because they have not even made it into our poster yet).
- New York Times (NYT) and DBpedia have different preferences regarding mutual sameAs relations. Interestingly, NYT connects its numeric URIs to non-numeric URIs in Freebase.
- Many DBpedia URIs were connected to each other not within DBpedia but via Freebase. Within DBpedia, the “dbpprop:redirect” property was used to connect equivalent URIs.
- Wrong links were introduced via Freebase: for example, dbpedia:Paul_Allen was linked to dbpedia:Paul_Allen’s_House.
- Many URIs carry no information or merely redirect (see my paper), so skipping these URIs would reduce the cost of linked data exploration and, further, the cost of loading sameAs URIs.
- The quality of sameAs links is a major concern; whether Freebase's use of sameAs relations is legitimate is debatable.
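One way to act on the observation about uninformative URIs is to remember which descriptions turned out to be link-only, so later explorations can skip loading them. The sketch below assumes descriptions have already been fetched (e.g., cached from an earlier crawl); the function names are hypothetical, and `dbpprop:redirect` is kept in the abbreviated form used above rather than as a full IRI.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
DBP_REDIRECT = "dbpprop:redirect"  # abbreviated as in the post (assumption)

# Predicates that only link URIs together without describing the entity.
LINK_ONLY = {OWL_SAMEAS, DBP_REDIRECT}


def is_informative(triples):
    """True if the description has at least one triple that is not just a
    sameAs or redirect link."""
    return any(p not in LINK_ONLY for _, p, _ in triples)


def uris_worth_loading(descriptions):
    """descriptions: dict mapping URI -> list of triples (None if the URI
    was not dereferenceable). Returns, sorted, the URIs whose descriptions
    add information beyond links; the rest can be skipped on later loads."""
    return sorted(u for u, t in descriptions.items() if t and is_informative(t))
```

Empty, non-dereferenceable, and link-only descriptions are all dropped; only URIs with at least one substantive triple remain.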
Comment from Tim Berners-Lee: let’s leverage semantics. We can look into the semantic annotations (e.g., rdf:type) of the URIs being described to automatically detect potentially bad data integration. Paul Allen’s House would then be knocked out because its type is “house”.
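This suggestion can be sketched as a type-compatibility check over sameAs arcs. The `DISJOINT` table, the type labels “Person” and “House”, and the function names are illustrative assumptions; in practice, disjointness might come from owl:disjointWith axioms in the vocabularies involved. A link is flagged when any pair of types on its two endpoints is declared disjoint.

```python
# Hypothetical disjointness axioms: each entry is an unordered pair of
# rdf:type values that should never describe the same individual.
DISJOINT = {
    frozenset({"Person", "House"}),
    frozenset({"Person", "Place"}),
}


def types_conflict(types_a, types_b):
    """True if some type of one URI is declared disjoint with some type of the other."""
    return any(frozenset({ta, tb}) in DISJOINT for ta in types_a for tb in types_b)


def suspicious_links(arcs, types):
    """arcs: iterable of (subject, object) sameAs pairs.
    types: dict mapping URI -> set of rdf:type values (missing means untyped).
    Returns the arcs whose endpoints have conflicting types."""
    return [
        (s, o)
        for s, o in arcs
        if types_conflict(types.get(s, set()), types.get(o, set()))
    ]
```

Untyped URIs are never flagged, since there is nothing to compare; this keeps the check conservative but means it only catches errors in well-typed data.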
Ding, Li; Shinavier, Joshua; Finin, Tim; and McGuinness, Deborah L. (2010). An Empirical Study of owl:sameAs Use in Linked Data. In: Proceedings of WebSci10: Extending the Frontiers of Society On-Line, April 26-27, 2010, Raleigh, NC, US. http://journal.webscience.org/403/
Li Ding @ RPI April 26, 2010