Author Archive

Will no CO2 emission come together with no global warming?

March 30th, 2011

OK, you have been fooled by the title. This post will not talk about environmental policy, as I have neither the courage nor the knowledge to take on either school of thought on global warming.

As part of my recent work on “semantic information theory”, I’m reading Compression Without a Common Prior: An Information-theoretic Justification for Ambiguity in Language, by Brendan Juba of Harvard. I had some nice conversations with Brendan on Universal Semantic Communication when he was at MIT. It’s nice to read another paper from him.

In his paper, Brendan uses this example:

For an English example, consider the example of sentence, You may step forward when your number is called. The implication is that you may not step forward before your number is called, for if that was not the intention, the sentence You may step forward at any time could have been used

Logically, the question is: if we know p → q, is ¬p → ¬q true?

We know this is not a correct inference (it is the Denying the Antecedent fallacy). But why do people so often fall for fallacies of this kind?

I tried to come up with a reasonable explanation using semantic information theory (SIT). First introduced by Carnap and Bar-Hillel, SIT studies the meanings carried by messages. If a sentence is less likely to be true, then it is more surprising. So “Today is hot, and tomorrow is also hot” means more than “Today is hot”. On the other hand, if we say “Today is hot, or today is not hot”, we give very little information.
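To make the “hot” examples concrete, here is a minimal sketch that counts possible worlds. It assumes two independent atomic statements with all four worlds equally likely, and it uses the Carnap and Bar-Hillel measure inf(s) = log2(1/m(s)); the names `logical_probability` and `information_bits` are made up for this illustration.

```python
from itertools import product
from math import log2

# Two atomic statements; each possible world assigns True/False to both.
# Assumption: all four worlds are equally likely.
ATOMS = ("hot_today", "hot_tomorrow")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]


def logical_probability(statement):
    """Fraction of possible worlds in which `statement` is true."""
    return sum(1 for world in WORLDS if statement(world)) / len(WORLDS)


def information_bits(statement):
    """Semantic information log2(1/m) in bits (undefined for contradictions)."""
    return log2(1 / logical_probability(statement))


def hot(world):
    return world["hot_today"]


def hot_both(world):
    return world["hot_today"] and world["hot_tomorrow"]


def tautology(world):
    return world["hot_today"] or not world["hot_today"]


for label, statement in [("hot today", hot),
                         ("hot today and tomorrow", hot_both),
                         ("hot or not hot", tautology)]:
    m = logical_probability(statement)
    print(f"{label}: m = {m}, inf = {information_bits(statement)} bits")
```

The conjunction holds in one world out of four and so carries 2 bits, the single statement carries 1 bit, and the tautology holds in every world and carries none.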

In classical information theory, the entropy of a message is determined by the statistical probability of the symbols appearing in it. In SIT, the entropy of a statement is determined by its logical probability, i.e., the likelihood of observing a possible world (model) in which this statement is true. To see the difference, let’s look at another example: the message “Rex is not a tyrannosaurus” (M1) is less “surprising” than “Rex is not a dog” (M2), not because the word “tyrannosaurus” is more common than “dog”, but because the individuals represented by “tyrannosaurus” (now considered extinct) are less common than the individuals represented by “dog”. Thus, M1 has less semantic information than M2, even if it may have more Shannon information based on the statistical distribution of English words.

Now back to (p → q) → (¬p → ¬q). We have the truth table:

p | q | p → q | ¬p → ¬q
T | T |   T   |    T
T | F |   F   |    T
F | T |   T   |    F
F | F |   T   |    T

As we are ignorant about the likelihood of p and q, let’s suppose all 4 situations in the truth table are equally likely. So the logical probability of ¬p→¬q is

m(¬p→¬q)=3/4

Now we know that p → q is true, so the second row in the table is ruled out. Then, the conditional logical probability is

m(¬p→¬q|p → q)=2/3 [lower probability: belief decreases]
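The two probabilities above can be checked by brute-force enumeration. This is only a sketch under the same assumption as the text (the four truth assignments are equally likely); the helper names `implies` and `m` are chosen for this example.

```python
from fractions import Fraction
from itertools import product

# The four equally likely worlds: every (p, q) truth assignment.
WORLDS = list(product([True, False], repeat=2))


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def m(statement, given=lambda p, q: True):
    """Logical probability of `statement` among the worlds where `given` holds."""
    allowed = [(p, q) for p, q in WORLDS if given(p, q)]
    favourable = [(p, q) for p, q in allowed if statement(p, q)]
    return Fraction(len(favourable), len(allowed))


def premise(p, q):
    return implies(p, q)          # p -> q


def inverse(p, q):
    return implies(not p, not q)  # not p -> not q


print(m(inverse))                 # 3/4
print(m(inverse, given=premise))  # 2/3
```

Conditioning simply discards the one world (p true, q false) where the premise fails, which is the row ruled out in the table.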

Thus, on hearing “You may step forward when your number is called”, it’s rational to revise one’s belief in “You may not step forward before your number is called” downwards. The first sentence, while not a logically sufficient condition for the second, carries some semantic mutual information about it.

Wait, is it the reverse of what we want to justify?

Maybe the real implication of “You may step forward when your number is called” is “No number called, no stepping forward”, i.e., instead of causation (¬p→¬q), we mean correlation (¬p∧¬q). If that is true, it is reasonable not to move before your number is called:

m(¬p∧¬q)=1/4

m(¬p∧¬q|p → q)=1/3 [belief increases!]
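Expressed as surprisal, and assuming the Carnap and Bar-Hillel measure inf(s) = −log2 m(s) (the measure is my assumption here; the numbers are just the four probabilities computed above), the comparison reads:

```latex
\begin{align*}
\mathrm{inf}(\neg p \to \neg q) &= -\log_2 \tfrac{3}{4} \approx 0.415 \text{ bits}, &
\mathrm{inf}(\neg p \to \neg q \mid p \to q) &= -\log_2 \tfrac{2}{3} \approx 0.585 \text{ bits},\\
\mathrm{inf}(\neg p \land \neg q) &= -\log_2 \tfrac{1}{4} = 2 \text{ bits}, &
\mathrm{inf}(\neg p \land \neg q \mid p \to q) &= -\log_2 \tfrac{1}{3} \approx 1.585 \text{ bits}.
\end{align*}
```

Learning p → q makes the implication reading slightly more surprising (0.415 to 0.585 bits) and the conjunction reading less surprising (2 to 1.585 bits), which matches the direction of the belief changes.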

Now return to the title. Assuming p is “CO2 emission” and q is “global warming”, and also assuming that the causation p → q stands, does ¬p∧¬q, i.e., no CO2 emission happening together with no global warming, make more sense? Well, based on the analysis above, it does. Logicians may disagree, but polar bears will certainly appreciate the argument.

References

[1] R. Carnap and Y. Bar-Hillel. An Outline of a Theory of Semantic Information. RLE Technical Reports 247, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, Oct. 1952.

[2] B. Juba, A. Kalai, S. Khanna, and M. Sudan. Compression Without a Common Prior: An Information-theoretic Justification for Ambiguity in Language. In 2nd Symposium on Innovations in Computer Science, Beijing, P.R. China, 2011.

Author: Categories: tetherless world Tags:

15 (and counting) Ways to Explore ISWC 2010 Data

November 8th, 2010

This year at ISWC, while working on the metadata, we set up a Data Consuming task force to develop tools that can browse and visualize the data in many different ways, e.g., a faceted browser, a filter browser and a mobile browser.

As soon as we had the basic dataset published, we immediately got feedback from people about off-the-shelf tools that can work with the data. The list is growing quickly. I collected screenshots of some working instances (including tools the metadata committee has built) in a slide deck. I have no doubt that the number “15” will have changed by the time the main conference begins … in 2.5 hours! So expect some updates very soon.

What strikes me is the number and diversity of data browsers currently available; many of them are clearly reaching the level of maturity needed for non-expert users to explore the data. That was not the case even one year ago. So much has changed for the Semantic Web in 2010!

Author: Categories: iswc Tags:

Quick Update on ISWC Twitter Data (1)

November 8th, 2010

At ISWC 2010, there are several ongoing efforts to leverage Twitter data. The ones I’m aware of are:

Joshua Shinavier has helped to build a triple store (powered by AllegroGraph) that contains tweets related to the conference, along with basic ISWC metadata. Here is an example of SPARQLing with the triple store (details about tweets with the tags #iswc2010 and #iswc). More examples and a guide on how to use the triple store will be out soon.

URL: http://flux.franz.com/catalogs/demos/repositories/iswc2010#query

Marian Dörk helped us to visualize tweets at ISWC. You can see the relative traffic over time, the distribution of buzzwords at the conference, and who is tweeting about what. Marian is looking into interviewing our attendees about the tool. If you have comments, let him (mdoerk@ucalgary.ca) or me (baojie@cs.rpi.edu) know.

URL: http://ilab51.cpsc.ucalgary.ca/iswc2010

To be continued.

Author: Categories: iswc, tetherless world, twitter, visualization Tags:

ISWC2010 Metadata is Online

November 3rd, 2010

Below is an announcement I just sent for the ISWC2010 Metadata.

========

Dear SWers and LODers

ISWC2010 is around the corner and we are very excited about the coming week!

As in previous years, ISWC 2010 provides its basic metadata in RDF. The dataset gives details about authors, organizers, papers, events (e.g., sessions and talks), and some mappings to other linked data. The data is freely available at http://data.semanticweb.org/conference/iswc/2010, and can be downloaded as a single RDF file. There is a SPARQL endpoint [1] for this dataset, as well as for some previous ISWC/ESWC/WWW conferences. For more details about access, please refer to [2].

You may view/use the data in many different ways. Any RDF-aware application should be able to access it, e.g., for browsing [3][4]. If you use an iPhone/iPad/iPod/Android device or Chrome/Safari, you can also look at a mobile browser at iswc.mobi [5] (provided by Alvaro Graves, RPI). Please also note that this year almost all pages on the ISWC 2010 website have some RDFa annotations that you can distill, e.g., with [6]. We are also working on other user interfaces and additional data, e.g., about workshops.

An initial list of tools, apps and visualizations for the ISWC 2010 metadata is on the W3C SW Wiki:

http://www.w3.org/2001/sw/wiki/ISWC_2010_Data_and_Demos

Feel free to expand the list if you know of other tools that can work with the dataset, or have developed mashups, visualizations or any other apps based on the dataset.

Please let me know if you notice missing information or errors in the dataset, or have any suggestions to improve the dataset.

The dataset is made possible by the work of the ISWC 2010 Metadata Committee and help from many members of the SW and LOD community. I would like to thank all of you who supported this work in one way or another.

I hope you will have fun playing with the data, as well as participating in the conference, either onsite or remotely!

Cheers
Jie

[1] http://data.semanticweb.org/sparql
[2] http://data.semanticweb.org/documentation/user/faq#get_data
[3] http://linkeddata.uriburner.com/about/html/http://data.semanticweb.org/conference/iswc/2010/paper/498
[4] http://iwb.fluidops.com/resource/semantic:person/ian-horrocks
[5] http://iswc.mobi
[6] http://www.w3.org/2007/08/pyRdfa/extract?uri=http://iswc2010.semanticweb.org/accepted-papers/123
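As a rough sketch of how the SPARQL endpoint mentioned in the announcement could be queried from a script: the endpoint URL comes from the announcement, but the query itself is only an illustration. It assumes that the conference URL also serves as the RDF resource identifier, it assumes nothing about the vocabulary used, and the endpoint may no longer be online. The function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the announcement; it may no longer be reachable.
ENDPOINT = "http://data.semanticweb.org/sparql"

# Illustrative query: list a few properties of the conference resource.
# Assumption: the conference URL doubles as the resource URI.
QUERY = """
SELECT ?property ?value
WHERE { <http://data.semanticweb.org/conference/iswc/2010> ?property ?value }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the standard `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=10):
    """Send the query and return the parsed SPARQL JSON results (needs network access)."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; call run_query(ENDPOINT, QUERY) to actually fetch results.
print(build_request_url(ENDPOINT, QUERY))
```

If the endpoint responds, each entry in `results["results"]["bindings"]` of the value returned by `run_query` holds one `?property`/`?value` pair.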

Author: Categories: iswc, tetherless world Tags:

ISWC 2010 Attendees by Country

November 3rd, 2010

This is a preliminary statistic of ISWC attendees by country. The data is from the registration list as of Oct 28, 2010, and the final data may be a little different. Also note that it’s different from the author/organizer data.

Unsurprisingly, the host country China has the most attendees (15.4%). The US, UK and Germany follow as the next biggest players (14.9%, 14.4% and 12.6%, respectively). By continent, we have:

Europe: 60.05%
Asia: 21.69%
Americas: 17.72%
Australia: 1.06%
Africa: 0.53%
Antarctica: 0% – not surprising, is it?

Clearly, Europe is still the most active in Semantic Web research. We also see this in various other statistics, e.g., organizations involved in recent semantic web events.

======== geek separation line ========

The following tools are used:

  • The original data is in a spreadsheet. Countries are given as ISO 3166-1 alpha-2 codes, e.g., Netherlands is NL, which many people are not familiar with. I found the code-to-name mapping from another source. However, as Excel can’t do a join, I imported the two CSV files into one RDF file using TopBraid Composer (here is how).
  • A SPARQL query is used to do the join, and the result is saved as a Google Docs spreadsheet.
  • The spreadsheet is visualized by Datapress.
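For comparison, the join in the steps above (country code to country name, then counting attendees) can also be sketched without RDF. This is not the pipeline I actually used; the two inline CSV snippets, their column names and the function name are hypothetical stand-ins for the real registration data.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-ins for the two CSV files; the real data is not shown here.
REGISTRATIONS_CSV = """attendee_id,country_code
1,CN
2,NL
3,US
4,CN
"""

COUNTRY_NAMES_CSV = """code,name
CN,China
NL,Netherlands
US,United States
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def attendees_by_country(registrations_csv, country_names_csv):
    """Join codes to names and return {country name: (count, percentage)}."""
    names = {row["code"]: row["name"] for row in read_rows(country_names_csv)}
    counts = Counter(
        # Fall back to the raw code if no name is known for it.
        names.get(row["country_code"], row["country_code"])
        for row in read_rows(registrations_csv)
    )
    total = sum(counts.values())
    return {name: (count, 100 * count / total)
            for name, count in counts.most_common()}


for name, (count, share) in attendees_by_country(
        REGISTRATIONS_CSV, COUNTRY_NAMES_CSV).items():
    print(f"{name}: {count} ({share:.1f}%)")
```

The dictionary lookup plays the role of the SPARQL join; the trade-off is that the result is a one-off script rather than linked data that other tools can reuse.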

From the end user’s point of view, while all the tools I used are wonderful, what I really wish for is an integrated environment in which I can do all of the above together, ideally all in the browser, and even more ideally without having to know that there are RDF and SPARQL underneath, just as I currently don’t have to worry about JavaScript, JSON or Google Charts since they are all hidden behind the interface.

It’s quite likely that I missed some tools that can do the job more easily, as semantic tools have mushroomed in recent years. I will keep looking for better ways to visualize ISWC data; in the end, I hope everyone, especially non-programmers, can do it. That will show the beauty of semantics at its best. I believe we are very close to that.

Author: Categories: iswc, tetherless world, visualization Tags: