Archive for the ‘Web Science’ Category

Notes for _Freebase: An Open, Writable Database of the World’s Information_ (ISWC 2008 Keynote)

October 29th, 2008

The ISWC 2008 keynote was presented by John Giannandrea (Metaweb Technologies Inc.).

The Semantic Web is based on a graph data model, which relational databases and column stores do not support natively. (More accurately, the Semantic Web community has brought back the graph database, which looked quite promising in the database community ten years ago.)

Ontology creation is a social process, and both Freebase and semantic wikis are tools that let users create ontological vocabulary without worrying too much about building a comprehensive ontology. With such an open-ended ontology, an effective query language is very important. Interestingly, the query languages of Freebase and Semantic Wiki share a similar flavor: they envision the Semantic Web as an instance store, where the where-clause simply describes a filter over instances and the select-clause focuses on retrieving properties of the resulting instances.
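The "instance store" reading of a query can be sketched in a few lines of Python. This is only an illustration of the filter-then-project idea. It is neither Freebase's query language nor any semantic wiki syntax, and the `query` function, the `ex:` identifiers, and the sample data are all made up for the example.

```python
from typing import Any

# Hypothetical toy instance store: each instance is a bag of property -> value.
INSTANCES: list[dict[str, Any]] = [
    {"id": "ex:talk1", "type": "ex:Keynote", "venue": "ISWC 2008", "topic": "Freebase"},
    {"id": "ex:talk2", "type": "ex:Keynote", "venue": "ex:OtherConf", "topic": "ex:OtherTopic"},
    {"id": "ex:paper1", "type": "ex:Paper", "venue": "ISWC 2008", "topic": "ex:SomeTopic"},
]


def query(instances: list[dict[str, Any]],
          where: dict[str, Any],
          select: list[str]) -> list[dict[str, Any]]:
    """Keep instances matching every where-constraint, then project properties."""
    # where-clause: a plain filter over instances
    matches = [inst for inst in instances
               if all(inst.get(prop) == value for prop, value in where.items())]
    # select-clause: retrieve only the requested properties of each match
    return [{prop: inst.get(prop) for prop in select} for inst in matches]


result = query(INSTANCES,
               where={"type": "ex:Keynote", "venue": "ISWC 2008"},
               select=["id", "topic"])
```

Note that nothing in the sketch consults a schema: a property missing from an instance simply comes back as `None`, which is one way an open-ended vocabulary can be tolerated at query time.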

Here are some facts about Freebase:

* Scale of Freebase: 156,000,000 assertions made; 1,370 published types; 75 domains. (Well, it is easy to see that most published types are well populated.)

* Freebase’s view of the Semantic Web:

Yes: graph model, identity, web based.

No: no description logic; schema, not ontology; a writable database!

* Freebase is not a formal system such as Cyc, OWL, SUMO, True Knowledge, or Halo; nor is it Google Base.

* An industry view of the (inverse) relation between audience and complexity:

Google > Wikipedia > Del.icio.us > NY Times > DBpedia > Cyc, OWL 2

(Well, people in industry care about and learn only what they need to achieve their goals. They care more about functionality, adoption, and profit, and they are less picky about soundness and completeness.)

Freebase is dealing with an “identifier” web. While one thing may have quite a few names, those names collectively contribute to its semantics too. (Yes, identity is a key problem for web applications.)

Greetings from ISWC 2008 by Li Ding

Categories: Semantic Web, Web Science

What leads to interoperability? Lessons learned from Dublin Core and DOI

July 15th, 2008

Interoperability is a desired feature when people access Web content, but there is still a long way to go before this dream is realized. In general, interoperability on the Web can be abstracted as many users communicating with one another to share information. Two extremes are obvious: (i) a single language for all users, at the cost that only minimal information can be exchanged, and (ii) a separate language for each pair of users, so that each pair can exchange as much information as possible. These two extremes may converge when the users are homogeneous, i.e. from the same community and hosting similar information.

While the simplicity and flexibility of Dublin Core (DC) have attracted many followers, they also lead to limited interoperability among DC applications. A comment on [2] made an interesting analogy: “Dublin Core applications are like snowflakes – no two are exactly the same”. For example, dc:date neither restricts the range of its value (which leaves no room for quality validation) nor offers clear enough semantics for the property (it works more like a legal document that needs lawyers’ interpretation). Other authors [1,3] go further, arguing that such limited interoperability restricts automated metadata processing and thus makes DC useless.

The Digital Object Identifier (DOI), on the other hand, has a fast-growing body of instance data in the publishing industry. Unlike DC, DOI requires more agreement, including (i) more mandatory properties, (ii) more restrictions on property values, and (iii) a federated metadata registration mechanism. These features ensure better-structured and more interoperable DOI instance data.

From the above study, we may raise the following hypotheses:
1. simplicity and flexibility can lower adoption cost, but they should be applied carefully to avoid damaging interoperability
2. restrictions (e.g. on the range of a property value) can ensure data quality and thus promote interoperability
3. making more information interoperable among some systems is preferable to making all systems interoperate
4. interoperable metadata should support non-trivial automated data integration, such as and reference resolution.
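Hypothesis 2 can be made concrete with a small Python sketch of what a range restriction on a date property buys. The profile chosen here (only `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`) and the function name `is_restricted_date` are assumptions for illustration; they are not taken from the Dublin Core or DOI specifications.

```python
import re
from datetime import date

# Assumed profile for illustration: year, year-month, or full calendar date.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_restricted_date(value: str) -> bool:
    """Return True if `value` fits the assumed profile and names a real date."""
    match = _DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month/day default to 1 so partial dates can still be checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


# An unrestricted property accepts all of these; a restricted one can reject some.
candidates = ["2008-07-15", "2008-07", "July 15th, 2008", "2008-02-30"]
accepted = [value for value in candidates if is_restricted_date(value)]
```

With an unrestricted value space every consumer has to guess how to parse `"July 15th, 2008"`. With the restriction, a registry could reject it at submission time, which is the kind of quality validation the free-form property leaves no room for.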

Further readings
[1] J. Beall (2004), “Dublin Core: an obituary”, Library Hi Tech News, Vol. 21, No. 8, pp. 40-41.
[2] Jill Hurst-Wahl (2007), “Dublin Core?” (the comment is more interesting than the blog post), accessed on July 15, 2008.
[3] Allan Cho (2008), “Dublin Core is Dead, Long Live MODS”, accessed on July 15, 2008.

Li Ding

Categories: Web Science

Fellowship of the (Semantic) Web: The Two Towers

May 25th, 2008

By popular request (okay, a couple of people asked for it), I have put my talk from Semantic Technologies 2008 online. Warning: it’s a PDF of about 22 MB (lots of gratuitous images to keep things fun).

Enjoy.

Jim H.

Categories: AI, Semantic Web, tetherless world, Web Science

Research challenges from TWINE

May 21st, 2008

An interesting interview (source) by John Breslin revealed some notable technology features behind Twine: privacy, data integration, and data storage. I have mixed feelings about the fact that no existing triple/quad store is used and that Twine has developed its own. How do current semantic web technologies fit enterprise-level, small-group-level, and person-level applications, and which triple store solution is ready to support such applications? The eight-element tuple is designed for efficiency, but will it become a common model for other social semantic web sites? As for privacy, do semantic web technologies bring any new benefits or new challenges, or are we still using the (user, group) access control mechanisms widely used in Web 2.0? Finally, data integration would be a very interesting challenge: do we have reasonably good automatic entity disambiguation tools, how can “collective intelligence” complement the automated tools, and how should integration results be presented to end users without causing too much surprise? In general, the deployment of Twine is promising, and it will produce more interesting and practical challenges for the research community.

Initially Radar had their own triple store, an LGPL one from the CALO project. They found that it didn’t scale towards web-scale applications, and it didn’t have the levels of transaction control you’d need from an enterprise application. They decided to go for a SQL database (PostgreSQL) with WebDAV. However, relational databases weren’t optimised for the “shape” of data that they were putting into it, so it needed to be tweaked. They’ve had no performance issues so far, but they may move to a federated model next year.

… Twine uses an eight-element tuple store (subject-predicate-object, provenance, time stamp, confidence value, and other statistics about the triple or item itself). They can do predicate inferencing across statements, access control, etc. …

… The key “secret sauce” is that everything in Twine is generated from an ontology. The entire site – user interface elements, sidebar, navbar, buttons, etc. – come from an application ontology…

Q: The first one was about privacy. What if you add something and then later you decide that you want to delete it – is it really deleted or does Twine keep it around?

A: Nova answered that currently, it is not really deleted, it goes into a non-visible triple. But they will be doing that (really deleting it) soon.

Q: As one imports information from various places, what exactly is there in Twine that will prevent a person having to merge any duplicate objects?

A: Nova said there is limited duplication detection at the moment, but this will be improved in a few months. Most people submit similar bookmarks and it is reasonably straightforward to identify these, e.g. when the same item is arrived at through different paths on a website and has different URLs.

Q: Why does Twine use tuple storage: why is it not using a quad?

A: Nova said it’s faster in their system, so for performance reasons they decided to avoid reification.
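The trade-off in that last answer can be sketched in Python. This is a guess at the general idea, not Twine's actual storage model: only the six elements named in the interview are modeled, the remaining "other statistics" are left out because the interview does not name them, and `WideStatement`, `reify`, and the `ex:` properties are hypothetical. The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are the standard RDF reification vocabulary.

```python
from typing import NamedTuple


class WideStatement(NamedTuple):
    """One row of a hypothetical wide tuple store: a triple plus its metadata."""
    subject: str
    predicate: str
    obj: str
    provenance: str
    timestamp: str
    confidence: float


def reify(stmt: WideStatement, node: str) -> list[tuple[str, str, str]]:
    """Express the same information as plain triples using RDF reification."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", stmt.subject),
        (node, "rdf:predicate", stmt.predicate),
        (node, "rdf:object", stmt.obj),
        # Metadata needs one extra triple per field (ex: properties are made up).
        (node, "ex:provenance", stmt.provenance),
        (node, "ex:timestamp", stmt.timestamp),
        (node, "ex:confidence", str(stmt.confidence)),
    ]


row = WideStatement("ex:item1", "ex:taggedWith", "ex:semanticweb",
                    "ex:user42", "2008-05-21T00:00:00Z", 0.9)
reified = reify(row, "_:s1")
```

In the wide layout the statement and its metadata are one row. With reification the same information becomes seven triples hanging off a statement node, so reading the metadata back means gathering several triples instead of fetching a single row.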

Li

Categories: Semantic Web, Web Science

Towards RDFS 3.0 (or OWL 2 R Full)

April 16th, 2008

Summary — there is a new “profile” of OWL Full that might be of great interest to the RDF/Data Web community — read on:

For those who follow W3C happenings, you know that I’ve had some problems with, and resigned from, the new OWL Working Group. The problems have mainly been related to the philosophy of what this is all about, more than the details of specific language features, and maybe I’ll blog about that some other time. However, in this entry I want to say something positive about one small piece of what the working group has done, and direct the RDF community to take a look at it. I believe it may be close to something we’ve needed for a long time.

In the “OWL 2 Web Ontology Language: Profiles” document (http://www.w3.org/TR/2008/WD-owl2-profiles-20080411/) the group has created a new set of OWL profiles (formerly called fragments), so instead of OWL Lite, DL, and Full, we now have (probably to be renamed at a later date) OWL 2 Full and a number of profiles: OWL 2 DL, OWL 2 EL++, OWL 2 DL-Lite, OWL 2 R DL, and OWL 2 R Full. (There are also the unnamed RDF equivalents of EL++ and OWL DL-Lite, but the group refuses to acknowledge that, a primary reason for my leaving, but that’s another story again.)

Anyway, it is to the last of these “OWL 2 R Full” that I would like to direct the attention of the RDF community — it is a bit hard to tell from the relatively cryptic document, but this fragment is an extension to RDFS that adds a small amount of useful OWL vocabulary, without requiring commitment to some of the strong restrictions needed for the various DL dialects. The specification includes an axiomatic specification of the language (i.e. rules) and starting to circulate, but not in the OWL group’s document, is an N3 version of the language making it very easy to see the relation to RDF. A couple of the larger members of the Working Group have stated that they will support this language (I’m not sure whether in public or not, so I’ll let them speak for themselves) which bodes well.

For those people looking at the “Data Web” or at “Web 3.0” applications, I think this profile of OWL may be worth looking at, as it may well be a good target of opportunity for further RDF development, and it would definitely be improved by some comments from serious Web 3.0 application developers. In the famous Semantic Web layercake, this profile (which I would like to see renamed RDFS 3.0) would be able to sit under the Rules and Ontology fragments, where RDFS is now, without derailing RDF(S) into the peculiarities of description logics, yet allowing some useful constructs to be added. For example, FOAF, DOAP, and others of the most used RDF-based ontologies would be within (or close to) this new profile.

So if you’re not interested in, or are studiously ignoring, the OWL drafts, let me suggest you take a look at Table 2 of section 4 of the Profiles document (and section 4.2.3 if you want to see the rules). I also suggest that one does not have to understand anything else in that section (much of which seems to me to be written for those with PhDs in AI or a similar background) to be able to see there’s something useful in here.
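For readers who want a feel for what “RDFS plus a small amount of OWL vocabulary, specified as rules” can look like in practice, here is a minimal forward-chaining sketch in Python. The two rules shown (class membership through rdfs:subClassOf, and owl:inverseOf) were picked for illustration and are not a transcription of Table 2 or section 4.2.3. The `infer` function and the `ex:` vocabulary are made up for the example.

```python
Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two illustrative rules until no new triples appear."""
    closure = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in closure:
            if p == "rdfs:subClassOf":
                # RDFS-style rule: instances of s are also instances of o.
                new |= {(x, "rdf:type", o)
                        for x, q, c in closure if q == "rdf:type" and c == s}
            elif p == "owl:inverseOf":
                # OWL-style rule: x s y implies y o x, and x o y implies y s x.
                new |= {(y, o, x) for x, q, y in closure if q == s}
                new |= {(y, s, x) for x, q, y in closure if q == o}
        if new <= closure:
            return closure
        closure |= new


data: set[Triple] = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:advises", "owl:inverseOf", "ex:advisedBy"),
    ("ex:bob", "ex:advises", "ex:alice"),
}
inferred = infer(data) - data
```

Both rules are simple if-then patterns over triples, so they can be applied to any RDF graph without first checking it against description logic restrictions.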

So take a look at OWL 2 R Full – the name is awful, but the language might be a really powerful new tool on the RDF Web.

-Jim Hendler

p.s. Let me also suggest taking a look at the public email by Michael Schneider at http://lists.w3.org/Archives/Public/public-owl-wg/2008Apr/0171.html – one of the few RDF proponents in the working group, he gives a great example of using OWL R Full in an RDFS context…

Categories: owl, Semantic Web, Web Science