Archive for the ‘Semantic Web’ Category

Sameas Network

April 26th, 2010

The sameas network is a network of URIs interconnected by the owl:sameAs relation. It is interesting because it is not a conventional social network, but rather a socially contributed directed graph connecting “equivalent” identities.

Our recent study [1] crawls sameas networks following linked data principles: starting from a given seed URI, we dereference the URI and recursively fetch the URIs linked to it by owl:sameAs. We used a fairly small seed set of New York Times URIs (100 people, 100 locations and 100 organizations) and obtained 300 sameas networks. Please come to the WebSci Poster Session today (April 26, 2010) for more discussion.
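The crawl described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not the crawler used in the study: `fetch_sameas` is a hypothetical callable standing in for "dereference the URI and return its owl:sameAs targets", the `max_uris` cap is an added assumption, and the toy data below is made up.

```python
from collections import deque


def crawl_sameas(seed, fetch_sameas, max_uris=1000):
    """Breadth-first crawl of the owl:sameAs network around `seed`.

    `fetch_sameas(uri)` is a hypothetical callable: it should dereference
    `uri` and return the URIs it links to via owl:sameAs (an empty list
    if the URI is not dereferenceable).
    """
    visited = {seed}
    arcs = set()
    queue = deque([seed])
    while queue:
        uri = queue.popleft()
        for target in fetch_sameas(uri):
            arcs.add((uri, target))  # directed sameAs arc
            if target not in visited and len(visited) < max_uris:
                visited.add(target)
                queue.append(target)
    return visited, arcs


# Toy stand-in for the Web (made-up data): URI -> owl:sameAs targets.
toy_web = {
    "nyt:allen": ["dbpedia:Paul_Allen", "freebase:paul_allen"],
    "dbpedia:Paul_Allen": ["nyt:allen"],
    "freebase:paul_allen": ["dbpedia:Paul_Allen", "dbpedia:Paul_Allen's_House"],
}

uris, arcs = crawl_sameas("nyt:allen", lambda u: toy_web.get(u, []))
print(len(uris), "URIs,", len(arcs), "arcs")
```

Passing the fetcher in as a function keeps the traversal logic separate from HTTP and RDF parsing, so it can be tried on in-memory data first.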

Results

The average size of a sameas network is 22 URIs, and one of the largest networks has 58 URIs with 1249 sameas arcs. Not all URIs are dereferenceable, and the dereferenceable ones may be described by anywhere from 1 to over a thousand triples.
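To get a feel for how tightly connected that large network is, one can compute its density. The sketch below assumes the 1249 arcs are distinct directed links with no self-loops, which the post does not state explicitly.

```python
def directed_density(num_nodes, num_arcs):
    """Fraction of possible directed arcs (no self-loops) that are present."""
    possible = num_nodes * (num_nodes - 1)
    return num_arcs / possible


# 58 URIs and 1249 arcs, as reported for one of the largest networks.
density = directed_density(58, 1249)
print(f"{density:.3f}")
```

Under that assumption, 58 URIs allow at most 58 × 57 = 3306 directed arcs, so roughly 38% of the possible links are present.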

Following are some interesting breaking observations, confirmed in several plotted sample sameas networks (they are breaking because they have not even been printed in our poster yet).

  • The New York Times (NYT) and DBpedia have different preferences regarding mutual sameas relations. It is interesting to see that NYT connects its numerical URI to a non-numeric URI in Freebase.
  • Many DBpedia URIs were connected not within DBpedia, but by Freebase. In DBpedia, the “dbpprop:redirect” property was used to connect equivalent URIs.
  • Wrong links were introduced by Freebase: dbpedia:Paul_Allen was linked to dbpedia:Paul_Allen’s_House.

Paul Allen and his House (People NYT)

Arctic (Location NYT)

Discussion

  • Many URIs carry no information or merely redirect (see my paper), so skipping them would reduce the cost of linked data exploration. We can further reduce the cost of loading sameas URIs.
  • The quality of sameas links is a big concern, and the legitimate use of Freebase sameas relations is debatable.

Comments from Tim Berners-Lee: let’s leverage semantics. We can look into the semantic annotations (e.g. rdf:type) of the URIs being described to automatically infer potentially bad data integration. Paul Allen’s House will then be knocked out because its type is “house”.
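A minimal sketch of that idea, under stated assumptions: the type labels (“Person”, “House”), the `compatible` set of type pairs allowed to co-refer, and the function name are all hypothetical, and real rdf:type data would need a class hierarchy rather than plain string comparison.

```python
def flag_suspect_links(sameas_links, types, compatible):
    """Return sameAs links whose endpoints have incompatible rdf:type values.

    `types` maps a URI to a single type label (a simplification), and
    `compatible` is a set of frozensets naming type pairs that may co-refer.
    """
    suspects = []
    for a, b in sameas_links:
        type_a, type_b = types.get(a), types.get(b)
        if type_a is None or type_b is None:
            continue  # no type annotation on one side: cannot judge
        if type_a != type_b and frozenset((type_a, type_b)) not in compatible:
            suspects.append((a, b))
    return suspects


# Hypothetical type annotations for the Paul Allen example.
types = {
    "nyt:allen": "Person",
    "dbpedia:Paul_Allen": "Person",
    "dbpedia:Paul_Allen's_House": "House",
}
links = [
    ("nyt:allen", "dbpedia:Paul_Allen"),
    ("dbpedia:Paul_Allen", "dbpedia:Paul_Allen's_House"),
]

suspects = flag_suspect_links(links, types, compatible=set())
print(suspects)
```

Links with a missing type on either side are left alone rather than flagged, since many URIs in the crawl carry little or no description.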

[1] Li Ding, Joshua Shinavier, Tim Finin and Deborah L. McGuinness (2010) An Empirical Study of owl:sameAs Use in Linked Data. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC: US. http://journal.webscience.org/403/

Li Ding @ RPI, April 26, 2010

Categories: linked data

Big Data for the Cloud and the Crowd

April 1st, 2010

Researchers have long been starved for big data to improve their research. Nowadays big data is no longer a dream but something real on the Web: an increasing amount of data is becoming publicly available from research communities, individuals, government agencies and others. So what does such big data mean to web users, and how can we best use it? Following are some potential benefits of big data.

“Make sense of what has been known”. Scientific research grows progressively, and scientific discoveries are founded on what we have learned in the past. To avoid reinventing the wheel, we should preserve what we already know as part of big data and make it available to ongoing research. Currently, keyword search, such as Google Scholar, successfully helps researchers retrieve previous work. Beyond that, well-organized knowledge about past research is needed to give users a systematic and accurate way to access it. With better knowledge of what has been done, users can better identify promising research directions and approach new discoveries.

“Support hypothesis generation and testing”. With big data in hand (or publicly accessible), not only scientists but also the general public can start thinking more about hypotheses, including theoretical models and pop-science questions. A humble use of big data would be an interactive application that lets users conveniently aggregate distributed big data and then invent or evaluate their hypotheses against it. One step further would be applying powerful AI technology (especially statistical methods) to big data to help users identify similar or unique data and hypotheses, prioritize potentially interesting candidate hypotheses, and even come up with new hypotheses.

“Support persistence and accountability”. If big data are going to be the foundation for massive scientific research and public use, reliable data availability is needed by all applications that depend on the data. Meanwhile, without effective accountability mechanisms over the distributed and shared big data, conclusions derived from the big data may not be trusted.

In order to realize the benefits, the emerging Web Science seems very promising as it is bringing many interesting opportunities to deal with the big data:

“Linked Data” [1]. Big data is not merely a massive collection of information islands bounded by their physical locations, and the value of big data can be greatly increased if the islands are effectively linked (or networked). Similar to hyperlinks on the Web, it is very important to turn implicit inter-data connections into declarative ones and make the links available as part of big data: a person’s medical records can be linked across different clinics and hospitals, state demographic statistics (e.g. livestock and gross income tax) can be linked across different government agencies [2], and information about a disease can be linked to entries at GenBank.
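As a rough illustration of turning an implicit connection into a declarative one, the sketch below joins two datasets on a shared field and emits explicit link triples. Everything in it is hypothetical: the URIs, the `ex:sameState` predicate, and the two toy agency datasets are made up for the example.

```python
def link_records(dataset_a, dataset_b, key, predicate):
    """Emit (subject, predicate, object) links for records that share `key`.

    Each dataset maps a record URI to a dict of field values.
    """
    # Index the second dataset by the shared field value.
    index = {}
    for uri, record in dataset_b.items():
        index.setdefault(record[key], []).append(uri)

    links = []
    for uri, record in dataset_a.items():
        for match in index.get(record[key], []):
            links.append((uri, predicate, match))
    return links


# Made-up records from two hypothetical agencies, both keyed by state name.
agriculture = {"agri:NY": {"state": "New York"}, "agri:VT": {"state": "Vermont"}}
revenue = {"tax:NY": {"state": "New York"}}

links = link_records(agriculture, revenue, key="state", predicate="ex:sameState")
print(links)
```

Once such links are published alongside the data, a consumer no longer has to rediscover the join; the connection is part of the data itself.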

“Social Machine” [3]. Big data should also interact with human society. Crowd sourcing, such as Wikipedia and Web rating systems, has been seen to add huge value to the knowledge on the Web. However, that is not yet the ultimate vision. We can combine the power of machines and humans to build the social machine: cloud computing, such as Google search and Microsoft’s recently announced Web n-gram service, offers great computing power for processing massive data, while crowd sourcing, such as Wikipedia, can distribute the cost of solving hard problems across massive human intelligence on the Web and supply high-quality results. The social machine also supports interactive problem solving: there is a feedback loop between the cloud and the crowd, and consumers can feed comments and enhancements back to the publisher.

“Knowledge Provenance” [4,5]. Big data are often integrated when being used. Declarative knowledge provenance (e.g. audit trace) is the foundation of transparency of distributed data processing. Computations on provenance data are the keys to accountability, e.g. a policy framework to assure proper use of digital information and some trust mechanisms to assure credibility of reused data.

References

[1] Tim Berners-Lee, Linked Data, 2007 http://www.w3.org/DesignIssues/LinkedData.html

[2] Li Ding, Dominic Difranzo, Alvaro Graves, James Michaelis, Xian Li, Deborah L. McGuinness, Jim Hendler, Data-gov Wiki: Towards Linking Government Data, in Proceedings of the AAAI Spring Symposium on Linked Data Meets Artificial Intelligence, 2010, http://data-gov.tw.rpi.edu/2010/linkedai-2010-datagov.pdf

[3] J. Hendler, T. Berners-Lee, From the semantic web to social machines: A research challenge for AI on the World Wide Web, Artificial Intelligence (2009), http://dx.doi.org/10.1016/j.artint.2009.11.010

[4] Deborah L. McGuinness, Li Ding, Paulo Pinheiro da Silva, Cynthia Chang, PML 2: A Modular Explanation Interlingua, in Proceedings of the AAAI’07 Workshop on Explanation-Aware Computing, 2007, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-07-07.html

[5] Li Ding, Provenance and Search Issues in RDF Data Warehouse, in Proceedings of SemGrail Workshop, 2007, http://research.microsoft.com/en-us/events/semgrail2007/lid_position.pdf

Li Ding, April 1, 2010

Categories: linked data, Web Science

Semantic Web for the Working Ontologist: Japanese Preface

January 19th, 2010

Dean and I were very pleased to learn that “Semantic Web for the Working Ontologist” is being published in Japanese.  We were asked to write a preface for the Japanese version — since it will only appear in print in Japanese, I thought I’d share it here in English (pretranslation):

We are very pleased to be able to write this new Preface introducing the Japanese translation of our book. Japanese researchers have been involved in Semantic Web technologies since the very early days, and we are honored that our book has been chosen for translation and republication to make it more accessible to the Japanese audience.

In the less than two years since this book was published, we have seen a large growth of interest in the Semantic Web and the new Web applications it makes available. This includes the commercial interest in new enterprise solutions, in new ways to bring data to the Web, and in the large-scale “Web 3.0” applications that can be enabled by combining Semantic Web data with other Web applications. New terms such as “semantic search,” “intelligent match,” and “virtual personal assistant” are starting to make it out of the laboratories and into the world of Web startups. Turning the mass of data available through the Web into useful knowledge increasingly demands new techniques and new technologies to succeed, and the Semantic Web is becoming more recognized as an important player in the growing Web world.

One of the reasons for the increasing interest in these technologies is the lack of success that “folksonomies” and Web 2.0 approaches have had in stemming the growing tide of Web information. In fact, just the opposite: new media such as blogs, social networks and Twitter™ have led to people spending more and more time on the Web, but with less and less ability to find the specific things they need. Without semantics, the Web is turning into a wonderful wonderland for entertainment, but less and less a productive space for solving the real problems being faced by people, companies and governments in today’s increasingly complex world.

As this interest has grown, it has also become clear that the ability to model at some level is critical to the successful application of these technologies. Getting a first demo up and running is not hard, but just as a real application of a database must include a data model, so must a real application using semantic technologies include a model of the information of interest: an ontology. In this book, we provide you with the background necessary to begin to understand, and build, Semantic Web ontologies. As our title implies, our goal is to help the “working ontologist”, with our focus on the practice, rather than the theory, of Semantic Web development. We focus on the “how,” rather than the “why,” so as to enable you to better understand how to use these important new technologies.

We appreciate your welcoming us into the Japanese marketplace. We particularly thank the translators who are helping us bring the book into your language and the developers of the use cases added to this Japanese edition of the book so as to better show how these technologies are already having an impact in Japan. We thus hope this translation of our book will further your ability to develop innovative applications both within Japan and in the increasingly global economy.

Dean Allemang and Jim Hendler, 1/1/2010

Categories: Semantic Web, Web Science

RPI Hackathon: Linking government data

December 9th, 2009

This is an invitation to participate in the RPI Hackathon 2009 for linking government data. For more detailed information check our wiki.

Part of the work done here in the Tetherless World Constellation consists of translating the government datasets available from data.gov into RDF. This effort has produced billions of triples from (at the moment of writing this post) more than 130 datasets. This data can be used in multiple ways: it can be queried from a SPARQL endpoint, used in visualizations such as maps, or combined with other datasets (whether from data.gov or other sources) to find correlations, clusters or other patterns.

However, we think that the data is more interesting and useful when it is linked: for example, a system can answer a specific query and also suggest other sources of information that may be relevant to the user. Thus, while we keep translating datasets, we would also like to link these datasets to the Linked Data cloud, and to do that we are asking for your help.

On December 12th and 13th we will host a Hackathon (i.e., an event where people gather to work on a specific computational problem). This event is part of the Great American Hackathon promoted by Sunlight Labs. We will host this event at the Winslow Building, RPI, in Troy, NY. It will run from 10 AM to 5 PM, but if you have only a few spare hours, you are also welcome! As I mentioned above, our main goal is to link the available data to the Linked Data cloud, but if you have other ideas to develop using one or more of the datasets, please join us too! The only requirement is to bring your computer and register by email to gravea3[@]rpi.edu or difrad[@]rpi.edu. Because we know big brains need energy, food and beverages will be provided. Even if you can’t attend physically, you can help us by working online.

Everyone is invited to participate. If you have any comments, questions, etc. please don’t hesitate to contact me at gravea3[@]rpi.edu or check the announcement in data-gov.

Alvaro Graves and the Data-gov team.

A Guided Tour into the Data-gov Wiki

December 9th, 2009

We recently revised the data-gov wiki demos and published a guided tour to help web users better understand and use the projects published at the Data-gov Wiki. We also expect the article to serve as a tutorial that meets the increasing requests from web developers who want to integrate semantic technology with existing web technology. Here are some highlights:

  • it lists pointers to datasets converted from data.gov and other data sources
  • it lists a couple of simple demos for using critical technologies, such as Google Visualization API, MIT Exhibit, SPARQL (and extended features), SparqlProxy, and triple store usage. All source code and services are included and replicable. You may see the source code at this link.
  • it further lists advanced demos showing how government data are linked, e.g. by sharing properties, reusing URIs and literal identifiers, and common time and location.

Comments are welcome and can be reported at http://code.google.com/p/data-gov-wiki/issues/list. We are incrementally improving the article, so please come back and subscribe to our announcement RSS feeds.

Li Ding and the Data-gov Wiki team

Categories: linked data