
Notes for _Freebase: An Open, Writable Database of the World’s Information_ (ISWC 2008 Keynote)

October 29th, 2008

The ISWC 2008 keynote was presented by John Giannandrea (Metaweb Technologies Inc.).

The Semantic Web is based on a graph data model, which is not natively supported by relational databases or column stores. (More accurately, graph databases have been brought back by the Semantic Web community, whereas they were a promising topic in the database community ten years ago.)

Ontology creation is a social process, and both Freebase and semantic wikis are tools that let users create ontological vocabulary without worrying too much about building a comprehensive ontology. With such an open-ended ontology, an effective query language is very important. Interestingly enough, the query languages of Freebase and Semantic Wiki share a similar flavor: they envision the Semantic Web as an instance store, where the where-clause simply describes a filter over instances and the select-clause focuses on retrieving the properties of the resulting instances.
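The "instance store" view can be illustrated with a tiny query-by-example function: the where-clause filters instances, and the select-clause picks which properties to return. This is a hypothetical sketch, not Freebase's MQL or Semantic MediaWiki's query syntax; the data, the `query` function, and its `where`/`select` parameters are invented for the example.

```python
from typing import Any

# Hypothetical instance store: each instance is a flat dict of property -> value.
INSTANCES: list[dict[str, Any]] = [
    {"id": "film:1", "type": "film", "name": "Blade Runner", "director": "Ridley Scott", "year": 1982},
    {"id": "film:2", "type": "film", "name": "Alien", "director": "Ridley Scott", "year": 1979},
    {"id": "film:3", "type": "film", "name": "Brazil", "director": "Terry Gilliam", "year": 1985},
]


def query(where: dict[str, Any], select: list[str]) -> list[dict[str, Any]]:
    """Keep the instances that satisfy every where-constraint, then return
    only the selected properties of each match."""
    results = []
    for inst in INSTANCES:
        # where-clause: a filter over instances (all constraints must hold)
        if all(inst.get(prop) == value for prop, value in where.items()):
            # select-clause: retrieve the requested properties of the match
            results.append({prop: inst.get(prop) for prop in select})
    return results


print(query({"type": "film", "director": "Ridley Scott"}, ["name", "year"]))
# → [{'name': 'Blade Runner', 'year': 1982}, {'name': 'Alien', 'year': 1979}]
```

Because the query only filters instances and projects properties, it needs no comprehensive schema up front, which fits the open-ended ontology described above.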

Here are some facts about Freebase:

* Scale of Freebase: 156,000,000 assertions; 1,370 published types; 75 domains. (Well, it is easy to see that the published types are, on average, well populated.)

* Freebase's view of the Semantic Web

Yes: graph model, identity, Web-based.

No: no description logic; a schema, not an ontology; a writable database!

* Freebase is not a formal system such as Cyc, OWL, SUMO, True Knowledge, or Halo; nor is it Google Base.

* An industrial view of the (inverse) relation between audience size and complexity

Google > Wikipedia > Del.icio.us > NY Times > DBpedia > Cyc, OWL2

(Well, industry people only care about and learn what is needed to achieve their goals. They care more about functionality, adoption, and profit, and are less picky about soundness and completeness.)
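A back-of-the-envelope check of the scale figures in the first bullet. These are only averages, so they are consistent with, but do not by themselves show, that most individual types are well populated.

```latex
\begin{aligned}
\frac{156{,}000{,}000\ \text{assertions}}{1370\ \text{types}} &\approx 113{,}869\ \text{assertions per type},\\
\frac{156{,}000{,}000\ \text{assertions}}{75\ \text{domains}} &= 2{,}080{,}000\ \text{assertions per domain},\\
\frac{1370\ \text{types}}{75\ \text{domains}} &\approx 18.3\ \text{types per domain}.
\end{aligned}
```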

Freebase deals with a web of “identifiers”. While one thing may have many names, those names also collaboratively contribute to its semantics. (Yes, identity is a key problem for Web applications.)
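The observation that one thing has many names, and that a name may point to more than one thing, can be sketched as a name index over identifiers. A minimal Python sketch; the identifiers, the names, and the `build_name_index`/`resolve` helpers are hypothetical and do not reflect Freebase's actual identifier scheme.

```python
from collections import defaultdict

# Hypothetical data: one identifier, many names; "New York" is shared by two entities.
ALIASES: dict[str, set[str]] = {
    "id:nyc": {"New York City", "NYC", "New York"},
    "id:nys": {"New York State", "New York"},
}


def build_name_index(aliases: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert identifier -> names into name -> identifiers (case-insensitive)."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity_id, names in aliases.items():
        for name in names:
            index[name.casefold()].add(entity_id)
    return dict(index)


def resolve(name: str, index: dict[str, set[str]]) -> list[str]:
    """Return every identifier a name may denote; more than one means ambiguity."""
    return sorted(index.get(name.casefold(), set()))


index = build_name_index(ALIASES)
print(resolve("NYC", index))       # → ['id:nyc']
print(resolve("new york", index))  # → ['id:nyc', 'id:nys']
```

The set of names attached to an identifier acts as part of its description, while a name that maps to several identifiers is exactly where identity resolution becomes hard.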

Greetings from ISWC 2008 by Li Ding

Categories: Semantic Web, Web Science

News from _An OWL2 Far_ (ISWC 2008 Panel Discussion)

October 28th, 2008

Stefan Decker raised the case of “missing children”, suggesting that Description Logic, or more precisely the open world assumption, may be overkill because it is not needed in most real-world applications, even though AI researchers like it.
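The difference at stake can be shown with a toy lookup: under the closed world assumption (as in a typical database) a fact that is not recorded is false, while under the open world assumption it is merely unknown unless something explicitly rules it out. A minimal Python sketch with invented facts; it is not a Description Logic reasoner and does not reproduce Decker's actual example.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical toy knowledge base of (subject, predicate, object) triples.
FACTS = {("alice", "hasChild", "bob")}
# Hypothetical explicit negative knowledge: statements known NOT to hold.
NEGATED = {("carol", "hasChild", "dave")}


def ask_closed(triple: tuple[str, str, str]) -> Truth:
    """Closed world assumption: whatever is not asserted is false."""
    return Truth.TRUE if triple in FACTS else Truth.FALSE


def ask_open(triple: tuple[str, str, str]) -> Truth:
    """Open world assumption: whatever is neither asserted nor refuted is unknown."""
    if triple in FACTS:
        return Truth.TRUE
    if triple in NEGATED:
        return Truth.FALSE
    return Truth.UNKNOWN


query = ("bob", "hasChild", "eve")
print(ask_closed(query).value)  # → false
print(ask_open(query).value)    # → unknown
```

An application that only ever needs the closed-world answer gains little from the extra "unknown" state, which is the gist of the overkill argument.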

Michel Dumontier, as a researcher trying to adopt OWL, once again reviewed the well-known Semantic Web benefits from the point of view of the SW developer community, which hopes these nice features will help real-world Web developers and users.

Tim Finin: “Maybe we’re a victim of our own success.” Moving towards a KR monoculture could be quite dangerous. He raised some examples that OWL does not fit well; e.g., when extracted knowledge is encoded in OWL, a lot of information is lost, such as time, uncertainty, and provenance.

Ian Horrocks claimed that OWL has good connections with other KR approaches: OWL is not going to solve all problems, but it is useful in general.

One of the biggest questions, raised by many, was “Is OWL useful?”, not even “Is OWL2 useful?”. Of course there is both supporting and opposing evidence, and neither side could convince the other. Someone also argued that the learning curve of OWL will simply scare off potential users. (Industrial adoption is a better benchmark, because researchers are more flexible.)

Another issue was scalability. Jim Hendler went even further than Stefan: Twine, which is claimed to be a Semantic Web application, uses only a few pieces of OWL in order to scale up. In general, scalability is a non-negotiable requirement of Web data computation. That is why the database community avoids, for instance, recursion in relational algebra.
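The recursion point can be made concrete with transitive closure, which is what a chain of `rdfs:subClassOf` statements or an `owl:TransitiveProperty` requires. Plain relational algebra cannot express it; it needs an iterate-until-nothing-changes loop whose number of join rounds grows with the longest chain in the data. A minimal Python sketch of the standard semi-naive evaluation; the class names are invented.

```python
def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Compute the transitive closure of a binary relation by semi-naive
    iteration: each round joins only the pairs derived in the previous round
    with the base edges, until no new pairs appear (a fixpoint)."""
    successors: dict[str, set[str]] = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    closure = set(edges)
    delta = set(edges)  # pairs that were new in the previous round
    while delta:
        # one join step: (a, b) in delta and (b, c) in edges give (a, c)
        derived = {(a, c) for a, b in delta for c in successors.get(b, ())}
        delta = derived - closure  # keep only genuinely new pairs
        closure |= delta
    return closure


subclass_of = {("Cat", "Mammal"), ("Mammal", "Animal"), ("Animal", "Thing")}
result = transitive_closure(subclass_of)
print(len(result))                 # → 6
print(("Cat", "Thing") in result)  # → True
```

On Web-scale data, neither the number of rounds nor the size of the closure is known in advance.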

A third question, raised by David Karger: “What are we doing with OWL? Which pieces of OWL are actually being used, and why?” (This is actually a motivation for OWL2, and the reason three OWL2 fragments, the EL, QL, and RL profiles, are proposed. We look forward to seeing whether the OWL Working Group can give industrial users a good answer.)
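One way to start answering Karger's question empirically is to count which OWL vocabulary terms actually occur in published data. A minimal Python sketch, assuming the data is already parsed into (subject, predicate, object) string tuples; the sample triples and the `owl_usage` helper are invented. Term counts are only a rough proxy: they do not show which constructs a reasoner actually relies on.

```python
from collections import Counter

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace for the sample data


def owl_usage(triples: list[tuple[str, str, str]]) -> Counter:
    """Count how often each OWL vocabulary term appears as predicate or object."""
    counts: Counter = Counter()
    for _s, p, o in triples:
        for term in (p, o):
            if term.startswith(OWL_NS):
                counts[term[len(OWL_NS):]] += 1
    return counts


sample = [
    (EX + "Person", RDF_TYPE, OWL_NS + "Class"),
    (EX + "Student", RDF_TYPE, OWL_NS + "Class"),
    (EX + "Student", RDFS_SUBCLASSOF, EX + "Person"),
    (EX + "knows", RDF_TYPE, OWL_NS + "ObjectProperty"),
    (EX + "alice", OWL_NS + "sameAs", EX + "alice_again"),
]
print(dict(owl_usage(sample)))  # → {'Class': 2, 'ObjectProperty': 1, 'sameAs': 1}
```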

Well, a fourth question is “Is OWL2 a KR language, i.e., a family of Description Logic profiles that link to other KR languages?” and/or “Is OWL2 trying to promote better Web or Semantic Web applications?”

Closing remarks (I did my best to keep them verbatim):
* Ian, “choose hope, not fear”
* Tim, “I can see Russia from my house”
* Michel, “OWL is pretty good language”
* Stefan, “if you did not fix the little thing, you may miss the boat”

By Li Ding
Greetings from ISWC 2008

Categories: owl, Semantic Web

Humans and the Semantic Web

July 16th, 2008

The idea that “the Semantic Web mainly serves machine agents” has dominated my thinking for many years. Now human users may also want to explore the huge mass of RDF data, and not just for debugging. Semantic Web user interaction is becoming an important part of the Semantic Web layer cake and a research direction at ISWC (see the SWUI workshops).

As a “web of data”, the Semantic Web, boosted by Linked Data efforts, presents Web users with a maze of RDF graphs containing billions of arcs (triples). To explore the maze, below are some HTML browser approaches I came across:

An alternative approach is the graphical browser, which seems more intuitive to end users. An interesting blog post, “Large-scale RDF Graph Visualization Tools”, covered a handful of useful resources, including some I had never encountered, and even links to 28 visualization software packages. Of course, the list missed some RDF visualization browsers such as FOAFnaut, Welkin, and self visualization. Notably, scalability still troubles most visualization approaches because of memory limits: my last experience was that “Otter had a hard time processing a graph with over 10,000 nodes”.
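One common workaround for the memory limits mentioned above is to render only a bounded neighborhood around a focus resource instead of the whole graph. A minimal Python sketch, assuming triples are already loaded as string tuples; `max_nodes` is a hypothetical display budget, and the approach is not taken from any of the tools named above.

```python
from collections import deque


def neighborhood(
    triples: list[tuple[str, str, str]], start: str, max_nodes: int
) -> tuple[set[str], list[tuple[str, str, str]]]:
    """Collect at most `max_nodes` nodes by breadth-first expansion from `start`,
    following arcs in either direction, and return them with the arcs among them."""
    adjacent: dict[str, set[str]] = {}
    for s, _p, o in triples:
        adjacent.setdefault(s, set()).add(o)
        adjacent.setdefault(o, set()).add(s)

    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        for nbr in sorted(adjacent.get(node, ())):
            if len(seen) >= max_nodes:
                break  # display budget exhausted
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)

    # keep only the arcs whose two ends were both selected
    arcs = [(s, p, o) for s, p, o in triples if s in seen and o in seen]
    return seen, arcs


# A hypothetical hub resource linked to 20,000 others: far too many to draw.
big_graph = [("ex:hub", "ex:links", f"ex:n{i}") for i in range(20_000)]
nodes, arcs = neighborhood(big_graph, "ex:hub", max_nodes=100)
print(len(nodes), len(arcs))  # → 100 99
```

The user can then re-center on any displayed node and fetch its neighborhood, so memory use depends on the budget rather than on the size of the whole graph.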

There are still many user interaction issues beyond browsers (e.g., search engines and semantic wikis), and a well-designed UI component is probably the key to the killer app of the Semantic Web.

Li Ding

Categories: Semantic Web

What leads to interoperability? Lessons learned from Dublin Core and DOI

July 15th, 2008

Interoperability is a desired feature when people access Web content, and there is a long way to go toward this dream. In general, interoperability on the Web can be abstracted as many users communicating with one another to share information. Two extremes are obvious: (i) a single language for all, at the cost that only minimal information can be exchanged, and (ii) a separate language for each pair of users, so that each pair can exchange information maximally. These two extremes may converge when the users are homogeneous, i.e., from the same community and hosting similar information.

While the simplicity and flexibility of Dublin Core (DC) have attracted many followers, they have also led to limited interoperability among DC applications. The comments in [2] made an interesting analogy: “Dublin Core applications are like snowflakes – no two are exactly the same”. For example, dc:date neither restricts the range of its value (which leaves no room for quality validation) nor offers sufficiently clear semantics for the property (it works more like a legal document that needs a lawyer's interpretation). Other researchers [1,3] have criticized DC on the grounds that such limited interoperability may restrict automated metadata processing and thus make DC useless.
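A quick count shows why the second extreme becomes costly as the number of parties grows: with n users, a single shared language is one agreement that everyone adopts, whereas a language for each pair needs one agreement per pair.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{10}{2} = 45, \qquad \binom{100}{2} = 4950
```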

Digital Object Identifier (DOI), on the other hand, has a fast-growing instance data space in the publishing industry. Unlike DC, DOI requires more agreement, including (i) more mandatory properties, (ii) more restrictions on property values, and (iii) a federated metadata registration mechanism. These features ensure better-structured and more interoperable DOI instance data.
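The first two kinds of agreement listed above, mandatory properties and restrictions on property values, can be illustrated with a toy validator. A minimal Python sketch under a hypothetical profile: the property names and the simplified DOI pattern are illustrative assumptions, not the DOI system's actual metadata rules, and the federated registration mechanism is not modeled.

```python
import re
from typing import Any

# Hypothetical metadata profile. These property names are illustrative only;
# they are NOT the DOI system's actual metadata schema.
MANDATORY = {"doi", "title", "publisher", "year"}
# Simplified DOI syntax check: "10." + registrant code + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def validate(record: dict[str, Any]) -> list[str]:
    """Return the problems found in a record; an empty list means it passes."""
    problems = [f"missing mandatory property: {p}" for p in sorted(MANDATORY - set(record))]
    doi = record.get("doi")
    if doi is not None and not DOI_PATTERN.match(str(doi)):
        problems.append(f"malformed doi: {doi!r}")
    year = record.get("year")
    if year is not None and not (isinstance(year, int) and 1000 <= year <= 9999):
        problems.append(f"year is not a four-digit integer: {year!r}")
    return problems


good = {"doi": "10.1234/example.001", "title": "Example article",
        "publisher": "Example Press", "year": 2008}
loose = {"title": "Some report", "date": "circa 2008"}  # DC-style: anything goes
print(validate(good))        # → []
print(len(validate(loose)))  # → 3
```

A record with a free-text date passes an unrestricted schema but fails here, which is the trade-off between low adoption cost and machine-checkable quality.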

From the above study, we may raise the following hypotheses:
1. simplicity and flexibility can lower adoption cost, but they should be applied carefully to avoid damaging interoperability
2. restrictions (e.g., the range of a property value) can ensure data quality and thus promote interoperability
3. making more information interoperable among systems is preferable to making all systems interoperate
4. interoperable metadata should support non-trivial automated data integration, such as reference resolution.

Further readings
[1] Beall, J. (2004), “Dublin Core: an obituary”, Library Hi Tech News, Vol. 21, No. 8, pp. 40-41.
[2] Hurst-Wahl, J. (2007), “Dublin Core?”, blog post (the comments are more interesting than the post), accessed July 15, 2008.
[3] Cho, A. (2008), “Dublin Core is Dead, Long Live MODS”, accessed July 15, 2008.

Li Ding

Categories: Web Science

Research challenges from TWINE

May 21st, 2008

An interesting interview by John Breslin revealed some interesting technology features behind Twine: privacy, data integration, and data storage. I have mixed feelings about the fact that none of the existing triple/quad stores is used and that TWINE developed its own. How do current Semantic Web technologies fit enterprise-level, small-group-level, and personal applications, and which triple store solution is ready to support such applications? The eight-element tuple is designed for efficiency, but will it become a common model for other social Semantic Web sites? As for privacy, are there any new benefits or new challenges brought by Semantic Web technologies, or are we still using the (user, group) access control mechanisms widely used in Web 2.0? Finally, data integration would be a very interesting challenge: do we have reasonably good automatic entity disambiguation tools? How can “collective intelligence” complement the automated tools? And how should the integration results be presented to end users without causing too much surprise? In general, the deployment of TWINE is promising, and it will pose more interesting and practical challenges to the research community.

Initially Radar had their own triple store, an LGPL one from the CALO project. They found that it didn’t scale towards web-scale applications, and it didn’t have the levels of transaction control you’d need from an enterprise application. They decided to go for a SQL database (PostgreSQL) with WebDAV. However, relational databases weren’t optimised for the “shape” of data that they were putting into it, so it needed to be tweaked. They’ve had no performance issues so far, but they may move to a federated model next year.

… Twine uses an eight-element tuple store (subject-predicate-object, provenance, time stamp, confidence value, and other statistics about the triple or item itself). They can do predicate inferencing across statements, access control, etc. …

… The key “secret sauce” is that everything in Twine is generated from an ontology. The entire site – user interface elements, sidebar, navbar, buttons, etc. – come from an application ontology…

Q: The first one was about privacy. What if you add something and then later you decide that you want to delete it – is it really deleted or does Twine keep it around?

A: Nova answered that currently, it is not really deleted, it goes into a non-visible triple. But they will be doing that (really deleting it) soon.

Q: As one imports information from various places, what exactly is there in Twine that will prevent a person having to merge any duplicate objects?

A: Nova said there is limited duplication detection at the moment, but this will be improved in a few months. Most people submit similar bookmarks and it is reasonably straightforward to identify these, e.g. when the same item is arrived at through different paths on a website and has different URLs.
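Nova's example, the same item reached through different paths with different URLs, suggests that some duplicates can be caught by normalizing URLs before comparing them. A minimal sketch using Python's standard `urllib.parse`; the specific normalization rules and the `normalize_url`/`group_duplicates` helpers are common heuristics chosen for illustration, not what Twine actually does, and they cannot catch the same page served under genuinely different paths.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply a few conservative normalizations so that trivially different
    spellings of the same URL compare equal. User info (user:pass@) is dropped."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # omit the port when it is the scheme's default
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    # sort query parameters so that their order does not matter
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # drop the fragment: it only addresses a position within the page
    return urlunsplit((scheme, netloc, path, query, ""))


def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group submitted URLs by their normalized form."""
    groups: dict[str, list[str]] = {}
    for u in urls:
        groups.setdefault(normalize_url(u), []).append(u)
    return groups


submitted = [
    "HTTP://Example.com:80/item?b=2&a=1#reviews",
    "http://example.com/item?a=1&b=2",
]
print(list(group_duplicates(submitted)))  # → ['http://example.com/item?a=1&b=2']
```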

Q: Why does Twine use tuple storage: why is it not using a quad?

A: Nova said it’s faster in their system, so for performance reasons they decided to avoid reification.
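The answer about avoiding reification can be made concrete by comparing one wide row with the standard RDF reification of the same annotated statement. A minimal Python sketch; the field names, the `ex:` annotation predicates, and the sample values are hypothetical, and this is not Twine's actual storage layout. The interview names only six of the eight fields, so the rest are folded into a hypothetical `extra` field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedStatement:
    """One row per statement. The interview names six of Twine's eight fields;
    the remaining two are unspecified, so `extra` is a hypothetical stand-in."""
    subject: str
    predicate: str
    obj: str
    provenance: str
    timestamp: str
    confidence: float
    extra: tuple = ()


def reify(st: AnnotatedStatement, stmt_id: str) -> list[tuple[str, str, str]]:
    """Express the same annotated statement as plain triples using standard
    RDF reification (rdf:Statement, rdf:subject, rdf:predicate, rdf:object)."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", st.subject),
        (stmt_id, "rdf:predicate", st.predicate),
        (stmt_id, "rdf:object", st.obj),
        # annotation predicates in the hypothetical ex: namespace
        (stmt_id, "ex:provenance", st.provenance),
        (stmt_id, "ex:timestamp", st.timestamp),
        (stmt_id, "ex:confidence", str(st.confidence)),
    ]


row = AnnotatedStatement("ex:alice", "ex:bookmarked", "ex:page1",
                         "ex:importer", "2008-05-21T12:00:00Z", 0.9)
print(len(reify(row, "ex:stmt1")))  # → 7
```

Reification turns one row into four structural triples plus one per annotation, and reading the statement back requires joining those triples on the statement identifier, which is the kind of overhead a wide tuple avoids.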

Li

Categories: Semantic Web, Web Science