Sameas Network

April 26th, 2010

A sameas network is a network of URIs that are interconnected by the owl:sameAs relation. It is an interesting kind of network because it is not a conventional social network, but rather a socially contributed directed graph (which may contain cycles) connecting “equivalent” identities.

Our recent study [1] crawls sameas networks following linked data principles: starting from a given seed URI, we dereference the URI and recursively fetch the URIs linked to it by owl:sameAs. We used a fairly small seed set of New York Times URIs (100 people, 100 locations and 100 organizations) and obtained 300 sameas networks. Please come to the WebSci Poster Session today (April 26, 2010) for more discussion.
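The crawl described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not the code used in the study: the function fetch_sameas is a hypothetical stand-in for dereferencing a URI and extracting its owl:sameAs targets, and the toy link table replaces real HTTP lookups.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Arc = Tuple[str, str]


def crawl_sameas(
    seed: str, fetch_sameas: Callable[[str], Iterable[str]]
) -> Tuple[Set[str], Set[Arc]]:
    """Collect the sameas network reachable from one seed URI.

    fetch_sameas is an assumed helper: given a URI, it returns the URIs
    that the dereferenced description links to via owl:sameAs.
    """
    visited: Set[str] = {seed}
    arcs: Set[Arc] = set()
    queue = deque([seed])
    while queue:
        uri = queue.popleft()
        for target in fetch_sameas(uri):
            arcs.add((uri, target))  # keep the direction of the assertion
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return visited, arcs


# Toy data standing in for dereferenced descriptions (made-up URIs).
TOY_LINKS: Dict[str, List[str]] = {
    "nyt:1": ["dbpedia:A", "freebase:a"],
    "dbpedia:A": ["nyt:1"],
    "freebase:a": ["dbpedia:A", "dbpedia:A_alias"],
}


def toy_fetch(uri: str) -> List[str]:
    # URIs that cannot be dereferenced simply yield no outgoing arcs.
    return TOY_LINKS.get(uri, [])


uris, arcs = crawl_sameas("nyt:1", toy_fetch)
print(len(uris), "URIs,", len(arcs), "sameas arcs")
```

Keeping the visited set is what makes the traversal terminate even when the sameas arcs form cycles.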

Results

The average size of a sameas network is 22 URIs, and one of the largest networks has 58 URIs with 1249 sameas arcs. Not all URIs are dereferenceable, and the dereferenceable ones may be described by anywhere from 1 to over a thousand triples.
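To get a rough feel for how densely connected that large network is, one can compare the number of arcs with the maximum possible number. The sketch below assumes the 1249 arcs are distinct directed arcs with no self-loops, which the figures above do not state explicitly; the function name is illustrative.

```python
def directed_density(num_uris: int, num_arcs: int) -> float:
    """Fraction of possible directed arcs (no self-loops) that are present."""
    if num_uris < 2:
        return 0.0
    max_arcs = num_uris * (num_uris - 1)
    return num_arcs / max_arcs


# Figures reported above for one of the largest networks.
density = directed_density(58, 1249)
print(f"density = {density:.3f}")
```

Under these assumptions, somewhat more than a third of all possible directed arcs between the 58 URIs are present.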

Below are some interesting breaking observations, confirmed in several plotted sample sameas networks (they are “breaking” because they have not even been printed on our poster yet).

  • New York Times (NYT) and DBpedia have different preferences regarding mutual sameas relations. It is interesting to see that NYT connects its numeric URI to a non-numeric URI in Freebase.
  • Many DBpedia URIs were connected not within DBpedia, but by Freebase. Within DBpedia, the “dbpprop:redirect” property was used to connect equivalent URIs.
  • Wrong links were introduced by Freebase: dbpedia:Paul_Allen was linked to dbpedia:Paul_Allen’s_House.

Paul Allen and his House (People NYT)

Arctic (Location NYT)

Discussion

  • A lot of URIs do not carry any information or merely redirect (see my paper), so it would be useful to skip these URIs to reduce the cost of linked data exploration. We can also further reduce the cost of loading sameAs URIs.
  • The quality of sameas links is a big concern; whether Freebase sameas relations are used legitimately is debatable.
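Skipping uninformative URIs, as suggested in the first point, could look roughly like the following. This is a sketch under stated assumptions: the UriSummary record, its fields, and the toy values are all hypothetical, and a real crawler would have to fill them in from whatever it observed when dereferencing each URI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UriSummary:
    """Hypothetical per-URI record kept by a crawler."""
    triple_count: int                   # triples obtained when dereferencing (0 if none)
    redirects_to: Optional[str] = None  # set when the URI only redirects elsewhere


def worth_loading(summary: UriSummary) -> bool:
    # Skip URIs that carry no description or that merely redirect.
    return summary.triple_count > 0 and summary.redirects_to is None


def prune(summaries: Dict[str, UriSummary]) -> List[str]:
    """Return the URIs that are still worth exploring, in input order."""
    return [uri for uri, s in summaries.items() if worth_loading(s)]


# Toy values, not measurements from the study.
summaries = {
    "dbpedia:A": UriSummary(triple_count=120),
    "dbpedia:A_alias": UriSummary(triple_count=1, redirects_to="dbpedia:A"),
    "example:dead": UriSummary(triple_count=0),
}
print(prune(summaries))
```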

Comments from Tim Berners-Lee: let’s leverage semantics. We can look into the semantic annotations (e.g. rdf:type) of the URIs being described to automatically infer potentially bad data integration. Paul Allen’s House would then be knocked out because its type is “house”.
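One crude way to act on this suggestion is to flag any sameas arc whose two ends both have known types but share none of them. The sketch below is only an assumption about how such a check might be written: the type labels and the nyt:paul_allen URI are made up, and real rdf:type values from different vocabularies would first need to be mapped onto comparable classes before a disjointness test means anything.

```python
from typing import Dict, Iterable, List, Set, Tuple

Arc = Tuple[str, str]


def suspicious_arcs(arcs: Iterable[Arc], types: Dict[str, Set[str]]) -> List[Arc]:
    """Flag sameas arcs whose endpoints have known but disjoint type sets."""
    flagged = []
    for source, target in arcs:
        source_types = types.get(source, set())
        target_types = types.get(target, set())
        # Only judge arcs where both ends have type information.
        if source_types and target_types and source_types.isdisjoint(target_types):
            flagged.append((source, target))
    return flagged


PAUL = "dbpedia:Paul_Allen"
HOUSE = "dbpedia:Paul_Allen’s_House"
NYT_PAUL = "nyt:paul_allen"  # made-up URI for illustration

toy_arcs = [(PAUL, HOUSE), (PAUL, NYT_PAUL)]
toy_types = {
    PAUL: {"Person"},      # illustrative type labels, not real class URIs
    HOUSE: {"House"},
    NYT_PAUL: {"Person"},
}
flagged = suspicious_arcs(toy_arcs, toy_types)
print(flagged)
```

Arcs with missing type information are left alone here, so the check can only reduce bad links, not certify the remaining ones.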

[1] Ding, Li; Shinavier, Joshua; Finin, Tim; McGuinness, Deborah L. (2010) An Empirical Study of owl:sameAs Use in Linked Data. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC, US. http://journal.webscience.org/403/

Li Ding  @ RPI April 26, 2010

  1. Gregory Williams
    April 26th, 2010 at 12:51 | #1

    In what way is this a DAG and not simply a graph? The example graphs you provide aren’t DAGs.

  2. April 26th, 2010 at 13:07 | #2

    Thanks. That is a typo; it should be “directed graph”. There are cycles.

  3. Gregory Williams
    April 26th, 2010 at 13:37 | #3

    OK, good. Also, I agree that the user’s *contribution* is directed, but the semantics of the assertion are undirected.

  4. April 27th, 2010 at 03:25 | #5

    Freebase has certainly found a need for data sanity checking tools like this – http://detypewriter.freebaseapps.com/ is certainly a good example of what a bad-inference tool might look like.
