Author Archive

Cuil, Semantic Search

August 13th, 2008

Last week, Cuil.com caught my eye. It made a very good impression in just 5 seconds (BTW, 10 seconds is the most any website gets to survive with me). First I tried, as many people probably do, my own name. It didn’t disappoint: it hit my pages quite precisely. I also love the grid-based layout. A few minutes later, I found its “Explore by Category” option. It looks like Cuil has some sort of ontology hierarchy for web pages.

A few Google searches suggest that Cuil may use some clustering technique to build such hierarchies. It is interesting to ask whether such hierarchies really improve the search experience. When I search for “Semantic Web”, Cuil recommends that I browse “Ontology (computer Science)” and some of its subcategories; it also suggests that I look at James Hendler’s homepage. I would say that this is very useful for exploring.

Building metadata with machine learning is a cool thing. On the other hand, I believe that human intervention is also critical. When Wikipedia knowledge is used in clustering, I expect some gain in recall or precision. Since “Ontology (computer Science)” is a Wikipedia page, I guess that Cuil may already be using Wikipedia information in its results.

Also, don’t forget the “network effect”. For a while now I have kept a prefix-based, syntactic Gmail label hierarchy. I would really like to share part of that hierarchy with my friends, so that when I send a mail labeled “party”, they don’t need to label it again. If millions of users could share their small hierarchies (not only on Gmail, but also on Flickr, YouTube, Twine, etc.), each connected somehow to the hierarchies of friends and family, we would eventually have a very large network of ontologies that might improve search far beyond what we can do now. Just a random thought.
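To make the prefix-based label idea concrete, here is a minimal sketch in Python. It is not how Gmail stores labels; the "/" separator, the example label names and the helper functions are all assumptions made up for illustration. It shows how flat labels with a shared prefix can be read as a small hierarchy, and how one branch of it could be picked out for sharing.

```python
from typing import Dict, Iterable, List

# A nested dict stands in for the hierarchy: {"social": {"party": {}}}
Tree = Dict[str, "Tree"]


def build_label_tree(labels: Iterable[str], sep: str = "/") -> Tree:
    """Turn flat, prefix-based labels into a nested tree.

    The separator is an assumption; any fixed prefix convention works the same way.
    """
    root: Tree = {}
    for label in labels:
        node = root
        for part in label.split(sep):
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(part, {})
    return root


def ancestors(label: str, sep: str = "/") -> List[str]:
    """List the broader labels implied by a label, from most general to most specific."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts))]


def share_subtree(labels: Iterable[str], prefix: str, sep: str = "/") -> List[str]:
    """Select the labels that belong to one branch, e.g. the part shared with friends."""
    return sorted(
        label for label in labels
        if label == prefix or label.startswith(prefix + sep)
    )


# Hypothetical labels, invented for this example.
MY_LABELS = ["social/party", "social/family", "work", "socialite"]
```

With these labels, share_subtree(MY_LABELS, "social") picks out only the two labels under "social" and skips "socialite", because the match is on a whole prefix segment rather than on raw string prefixes.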

P.S. I found one interesting thing. Cuil has a cached copy of my wiki page at Iowa State University. However, that page must have gone offline no later than May 2008, while Cuil officially launched only on July 28, 2008. It seems its crawler had been running for a while before launch.

Jie Bao

Author: Jie Bao Categories: Uncategorized Tags:

Captcha, Turing Test, and Semantic Web

August 6th, 2008

On the web nobody knows you are a dog ... or a human. That’s why there are programs on the web that check whether a visitor is a human (as opposed to a bot, a dog, a cat, ...). The most popular ones are captchas. A captcha rests on a simple assumption: no OCR agent so far is as smart as a human. To me, it looks like a super-simplified Turing test: an AI program has “real” intelligence, as a human has, if another human who asks both the same questions can’t tell which is the AI and which is the human.

I can’t help wondering what test we will use to identify a human on the web once OCR agents get smart enough to pass the captcha test (I strongly believe that day is not far away). Math? That will be easy for a good program. Scrabble? Maybe, but not that secure. Ask for a Shakespeare sonnet? Or the year World War II ended? That looks more likely to succeed. But there are two issues.

First, an agent may have access to a knowledge base. With projects like DBpedia, human knowledge is being turned into knowledge bases (“KBized”) at a speed never seen before. A query such as “the end year of World War II” may be answered by a Semantic Web agent fairly quickly. I can imagine that someday we will have to design increasingly hard questions (say, about art) to identify a human and fight spam.
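As a rough illustration of why such a question is easy for an agent, here is a toy sketch in Python. It does not use DBpedia or any real Semantic Web library; the in-memory triple set, the identifiers (World_War_II, endYear, ...) and the helper functions are all made up for this example. The point is only that once the fact sits in a knowledge base, answering is a single lookup.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy facts. The identifiers are invented for this sketch and are
# not DBpedia's actual vocabulary.
KB: Set[Triple] = {
    ("World_War_II", "type", "MilitaryConflict"),
    ("World_War_II", "startYear", "1939"),
    ("World_War_II", "endYear", "1945"),
}


def match(kb: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples that agree with the given pattern; None is a wildcard."""
    return sorted(
        t for t in kb
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def answer(kb: Set[Triple], subject: str, predicate: str) -> Optional[str]:
    """Answer a 'what is the <predicate> of <subject>' question, or None if unknown."""
    hits = match(kb, s=subject, p=predicate)
    return hits[0][2] if hits else None
```

Here answer(KB, "World_War_II", "endYear") returns the stored year, while a predicate that is not in the toy knowledge base returns None. A real agent would of course also have to map the natural-language question onto the right subject and predicate, which this sketch skips.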

The other issue is that a human may have NO access to a knowledge base. Many, many people in the world do not know “the end year of World War II”, even if they are knowledgeable about other things. They may not even know where to find such knowledge. Also, they can get bored of being asked such captcha questions again and again, and quit; technically, that means they failed the test and thus are not “human”. As captchas become increasingly hard (say, questions about art), more and more people may fail for one reason or another (including boredom). That will also lead to the failure of the identification system.

Will the Semantic Web help spammers by enabling smarter agents? :) Maybe. Let’s wait and see.

Jie Bao

Author: Jie Bao Categories: Uncategorized Tags:

Towards Webtop

July 25th, 2008

by Jie Bao

Some of our Tetherless World researchers, including me, have just written a short paper to sell the idea of building a “webtop” using semantic technologies. In short, a webtop is a desktop on the web that does the same kinds of jobs: managing files, word processing, managing contacts, scheduling tasks, emailing, etc. Please see some examples of webtops with pretty GUIs.

Almost a decade ago, the concept of the “network computer” was hot for a while. At that time, a network computer meant a low-end computer with limited storage and computational capacity that relied on the network for its power. The webtop idea reminds me of the network computer: while the two differ in many respects, they share the idea of empowering users with networked infrastructure. Ten years ago, this vision was tested with physical computers and largely failed. Today, with the advance of technology, it is being revived by letting users create virtual computers that exist only on the web. I have many reasons to believe that this time it will not only survive, but also prevail.

One reason comes from my personal experience. About two years ago, I stopped installing many programs that had been with me for years: Encarta has been replaced by Wikipedia, Outlook by Gmail, MS Street by Google Maps, MS Word by writing in a wiki, and PowerPoint by online LaTeX writing with the Beamer package, among a long list of other things. The browser is the application where I spend more than 80% of my time on my computers. I really do need a way to organize all these online applications and data; simple bookmarking is barely a solution. I need something that can organize them, give me quick access to them and, last but not least, look pretty and neat. A webtop does exactly those things.

How can semantic technologies help in providing a webtop? Actually, long before the term “ontology” became popular, users were already creating ontologies on a daily basis: classifying email, creating file folder trees, grouping contacts, or naming a photo “Wedding picture at Troy”. All those efforts create relations between things or attach a “meaning” to an entity. With semantic technologies, those relations and annotations can be made explicit so that data can be managed and queried more easily. For example, I may ask to “find all 2005 photos of my friends”, or to “show all meetings (even if they are not called meetings, e.g. “briefing”) in the past month”. A webtop based on semantic technologies will make this ability available to every application built on top of it.
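The two example queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design from our paper: the tiny taxonomy, the Item record, the sample data and the names in it are all hypothetical. What it shows is that once annotations (type, year, people) and a “briefing is a kind of meeting” relation are explicit, both queries become simple filters. The time window of the second query is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional

# Hypothetical mini-taxonomy: each type points to its broader type.
SUBCLASS_OF: Dict[str, str] = {
    "briefing": "meeting",
    "standup": "meeting",
    "photo": "file",
}


def is_a(kind: Optional[str], target: str) -> bool:
    """True if `kind` equals `target` or is a (transitive) subtype of it."""
    while kind is not None:
        if kind == target:
            return True
        kind = SUBCLASS_OF.get(kind)  # walk up; None when there is no parent
    return False


@dataclass(frozen=True)
class Item:
    """One annotated thing on the webtop (a photo, a calendar entry, ...)."""
    name: str
    kind: str
    year: int
    people: FrozenSet[str] = frozenset()


def find(items: Iterable[Item], kind: str,
         year: Optional[int] = None,
         people: Optional[FrozenSet[str]] = None) -> List[str]:
    """Names of items of the given type, optionally limited by year and by people shown."""
    return [
        item.name for item in items
        if is_a(item.kind, kind)
        and (year is None or item.year == year)
        and (people is None or item.people & people)
    ]


# Sample data, invented for this example.
FRIENDS = frozenset({"Alice", "Bob"})
ITEMS = [
    Item("Wedding picture at Troy", "photo", 2005, frozenset({"Alice"})),
    Item("Beach", "photo", 2006, frozenset({"Alice"})),
    Item("Conference hall", "photo", 2005),
    Item("Project briefing", "briefing", 2008),
]
```

With this data, find(ITEMS, "photo", year=2005, people=FRIENDS) returns only the wedding picture, and find(ITEMS, "meeting") returns the briefing even though it was never labeled “meeting”.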

There has been controversy about the Semantic Web ever since the term was coined. I think this is partly because the Semantic Web community as a whole has failed to provide enough end-user-friendly tools that do something helpful in daily life. I wish to see more tools that help with daily web activities: semantic email, semantic blogs, semantic calendars, semantic abstracts of news (a little more than RSS), tagging files (pictures, MP3s, ...) with a taxonomy, etc. Even more important, to survive, such an application should never ask users to learn RDF or anything else that takes more than 3 minutes to understand. Bring such applications together and you have a webtop. I believe something like this is one of the killer apps the community has long been waiting for.

Author: Jie Bao Categories: Uncategorized Tags:

OWL or OLD?

July 22nd, 2008

I just noticed the “OWL 2 Web Ontology Language: Requirements” document from the OWL Working Group. Interestingly, while the “W” in OWL stands for “Web”, I didn’t see any use case from web applications in the usual sense. Since the leading requirements come from the needs of domain knowledge bases, I would suggest naming the new language, instead of OWL 2, the Ontology Language of Domains (OLD). Just kidding. OWL is claimed to be needed by common web users, but such users are surprisingly under-represented in the specification process. We have already seen many specially designed, highly expressive, but narrowly applied languages in the old KR schools. Do we need to invent yet another one here?

Jie

Author: Jie Bao Categories: Semantic Web, owl Tags:

Grandma Gone Surfing

June 27th, 2008

Debbie Heisler has just sent me a link, “Internet overhaul wins approval”. One of the proposals mentioned there that caught my eye is that domain names written in Asian, Arabic or other scripts will be supported.

Although it may not be a new idea (for example, 3721.com, now part of Yahoo!, has provided a service supporting URLs in Chinese for years), allowing local names in scripts other than Roman characters is absolutely a good move. About 10 years ago, I was asked to teach one of my father’s colleagues how to use a computer. It was a hard job because she didn’t know how to use a keyboard, which in turn was because she didn’t know the characters “A”, “B”, “C”. My mom does better: she is now a daily web surfer and she knows Roman characters, but she can never remember English words like “Google”, not to mention google.com. What she does now is set a hub page as her browser’s homepage, with a Google link on it (in Chinese, of course). She uses baidu.com, a Chinese counterpart of Google, more often than Google, partly because the name “Bai Du”, which literally means “a hundred times”, is much easier for her to remember (Google’s local name “Guge”, on the other hand, is almost meaningless).

We people in academia are so used to our education (both in language and in technology) that we sometimes take many things for granted. Two weeks ago at the Tetherless World Grand Opening, Wendy Hall, the ACM President-elect, mentioned that during her recent visit to China for the WWW 2008 conference, she was surprised to learn that such a huge part of the web is only in Chinese. “Chinese may be the most popular language on the web in the future”, she said. This may or may not come true, but I agree that web technologies should be easier to use and should take internationalization even more seriously.

However, “ease” means different things to different people. When my mom learned to use a mouse, she had to use both hands to control it :), and the only reason she did not give up was that she wanted to use the computer to communicate with me. Last weekend, I tried to teach my father-in-law to use a computer, and he also had a hard time controlling the mouse: regular computer users have an _instinct_ to lift and reposition the mouse, so we never feel that “the line is too short”, but he has no such instinct.

I’m getting a little off topic. But what I want to say is that computers should be designed not only for the young, but also for seniors; not only for English-speaking people, but also for the other 3/4 of people in the world; not only for geeks, but also for grandmas.

As for the Semantic Web, we should also always keep our “users” in mind. Who is going to use the Semantic Web? What should be at the top of the list of things we support? I have long been thinking about this question: since most of our daily web activities are emailing, blogging, calendaring, searching, etc., why are there still no end-user-oriented semantic tools to help us with such activities? For example, I have tried many “semantic search engines”, e.g., Swoogle, SWSE and Sindice, and none of them can be considered end-user oriented: I could not explain most of their RDF results to my mom, for example. Google is a killer app because my mom can use it even though she cannot spell “Google” itself. We will need something like that.

Jie Bao

Author: Jie Bao Categories: Uncategorized Tags: