Web Accessibility in an Educational Context

October 25th, 2009

I’m currently at the annual meeting for the Human Factors and Ergonomics Society and a few hours ago I was attending a set of presentations by the Internet TG on Input and Output on the Web. One of the talks that caught my attention was by Wayne Shebilske of Wright State University on the use of screen readers to help impaired students complete tasks within educational programs such as WebCT, student status management software, and the like. There seem to be many problems with such tools as Dr. Shebilske pointed out based on his study. I think this could be a prime candidate for SW technologies to step in to improve the end user experience. I foresee using such tools along with technologies like RDFa to give the user a better representation of the content being displayed and improving the overall quality of life for these individuals.

Evan

VN:F [1.2.0_562]
Rating: 0.0/10 (0 votes cast)
Author: Evan Categories: Semantic Web Tags: ,

Probing the SPARQL endpoint of data.gov.uk

October 23rd, 2009

We just ran across the preview SPARQL endpoint for UK’s Data.gov (powered by Talis) following Harry Metcalfe’s blog . In order to understand what data is hosted by the triple store, we use a series of SPARQL queries to probe the content in data.gov. We leverage a web service http://data-gov.tw.rpi.edu/ws/sparqlproxy.php to convert SPARQL/XMl result into HTML and JSON.

First, let’s do some warm up exercises

Q: show me some triples!

SPARQL:

SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 5

Result:

<http://www.london-gazette.co.uk/id/issues/58316/notices/240663> <http://www.gazettes-online.co.uk/ontology#hasPublicationDate> “2007-05-02″^^http://www.w3.org/2001/XMLSchema#date .

<http://www.london-gazette.co.uk/id/issues/58316/notices/240663> <http://xmlns.com/foaf/0.1/page> <http://www.london-gazette.co.uk/issues/58316/pages/6359> .

<http://www.london-gazette.co.uk/id/issues/58316/notices/240663> <http://purl.org/dc/terms/modified> “2007-05-02″^^http://www.w3.org/2001/XMLSchema#date .

A: ok, it has some gazette dataset about lond (see http://www.london-gazette.co.uk/), and it uses FOAF and DC vocabulary.

Q: show me some classes and their instances?

SPARQL:

SELECT DISTINCT ?c WHERE { [] a ?c. } LIMIT 5

Result:

<http://www.gazettes-online.co.uk/ontology/transport#RoadTrafficActsNotice>

<http://www.gazettes-online.co.uk/ontology#Notice>

<http://xmlns.com/foaf/0.1/Document>

<http://www.gazettes-online.co.uk/ontology#Issue>

<http://www.gazettes-online.co.uk/ontology#Edition>

A: same observation as above.

Q: Does it host any named graphs?

SPARQL:

SELECT ?g WHERE {GRAPH ?g { ?s ?p ?o } } LIMIT 10

Result: “0″ ^^<http://www.w3.org/2001/XMLSchema#integer>

A: no named graph found, and there is only one big default graph

Now let’s run several expensive aggregation queries (note aggregation queries are not part of the current SPARQL specification)

Q: How many triples?

SPARQL:

SELECT count(*) WHERE {?s ?p ?o}

Result: “5529380″^^<http://www.w3.org/2001/XMLSchema#integer>

A: alright! Aggregation query is support, and there are 5 million triples. Note “count” is a non-standard aggregation function, it may be support differently be different SPARQL endpoints.

Q: How many graphs (and the number of triples in each graph)?

SPARQL:

SELECT ?g count(*) WHERE {GRAPH ?g { ?s ?p ?o } }

Result: “0″ ^^<http://www.w3.org/2001/XMLSchema#integer>

A: no named graph.

Q: How many populated classes?

SPARQL:

SELECT count(distinct ?c) WHERE {[] a ?c}

Result: “99″^^<http://www.w3.org/2001/XMLSchema#integer>

A: there are 99 different classes having direct instances in this triple store.

Q: How many populated properties?

SPARQL:

SELECT count(distinct ?p) WHERE {[] ?p ?o}

Result: “86″ ^^<http://www.w3.org/2001/XMLSchema#integer>

A: There are 86 unique properties being used as predicate in this dataset. Each property has used by 64K (5529380/86) triples in average. There must be some very popular properties, and we will do that survey later.

Q: How many typed individuals?

SPARQL:

SELECT count(distinct ?s) WHERE {?s a ?c}

Result: 504 Gateway Time-out

A: Opps, really expensive. Let’s try something else

Q: How many defined classes?

SPARQL:

SELECT count(distinct ?s) WHERE {{?s a <http://www.w3.org/2002/07/owl#Class> } UNION {?s a <http://www.w3.org/2000/01/rdf-schema#Class>}}

Result: 0

A: no class defined, and the triple store is full of individuals

Q: How many individuals (again)?

SPARQL:

SELECT count(*) WHERE {[] a ?c}

Result: 995694

A: There are nearly 1 millions of typed individuals, so we can easily see every invidual has 5 (5M/1M) triples in average.

Now, let’s do some knowledge discovery

Q: Show me the 3 most/least used classes in this dataset?

SPARQL:

1) select ?c ( count(?s) AS ?count ) where {?s a ?c} group by ?c order by desc(?count) limit 3

2) select ?c ( count(?s) AS ?count ) where {?s a ?c} group by ?c order by ?count limit 3

Result:

1) most used classes

<http://www.gazettes-online.co.uk/ontology#Notice> “156452″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.w3.org/2006/vcard/ns#Address> “106934″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology/person#Person> “87798″ ^^<http://www.w3.org/2001/XMLSchema#integer>

2) least used classes

<http://www.gazettes-online.co.uk/ontology/transport#CycleTracksNotice> “1″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology/transport#PortsNotice> “1″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology/corp-insolvency#AdministrationOrder> “2″ ^^<http://www.w3.org/2001/XMLSchema#integer>

A: Again, the SPAQL query is “safe” because Talis support these SPARQL extensions (c.f. http://n2.talis.com/wiki/SPARQL_Extensions). The class that has the most number of instances in this dataset is http://www.gazettes-online.co.uk/ontology#Notice (which has 156,452 instances.)

Q: what about property usage (top 5)?

SPARQL:

1) select ?p ( count(?s) AS ?count ) where {?s ?p ?o} group by ?p order by DESC(?count) limit 5

2) select ?p ( count(?s) AS ?count ) where {?s ?p ?o} group by ?p order by ?count limit 5

Result:

1) 5 most used properties:

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> “995694″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://purl.org/dc/dcam/memberOf> “312476″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.w3.org/1999/02/22-rdf-syntax-ns#value> “310170″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology#hasPublicationDate> “181940″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology#hasNoticeCode> “181335″ ^^<http://www.w3.org/2001/XMLSchema#integer>

2) 5 least used properties:

<http://www.gazettes-online.co.uk/ontology/corp-insolvency#dateAdministrationOrderMade> “1″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology#hasAuthorisingOrganisation> “2″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology/personal-legal#isForNextOfKinOf> “2″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology/corp-insolvency#orderAdministrator> “3″ ^^<http://www.w3.org/2001/XMLSchema#integer>

<http://www.gazettes-online.co.uk/ontology#authorisingOrganisation> “49″ ^^<http://www.w3.org/2001/XMLSchema#integer>

A: Some properties are really used (e.g., http://www.gazettes-online.co.uk/ontology/corp-insolvency#dateAdministrationOrderMade has been used only once) while some are heavily used (like rdf:type, which is the most frequently used predicate).

Conclusion

We need to stop probing now. A number of complex queries ended up with a timeout error because (i) “LIMIT” only control the final results, so that we cannot just get statistical results on 1000 triples; (ii) “GROUP BY” may produce too many intermediate results, (iii) the statistics queries does not leverage the index structure of triple store, or index structure are not designed for handling such queries, and (iv) many other issues.

Updates

presented by Li Ding and Zhengning Shangguan

VN:F [1.2.0_562]
Rating: 10.0/10 (1 vote cast)
Author: li Categories: Semantic Web, linked data Tags:

OWL 2 Reference Card released

October 18th, 2009

We’re pleased to announce the OWL 2 Reference Card [1]. The Card is meant to be a “cheat sheet” of OWL 2 features printable on a single piece of paper (on both sides). It is based on the OWL 2 Quick Reference Guide [1], which is now a Proposed Recommendation [2] in the OWL 2 Web Ontology Language document set.

Background: OWL 2 [4] is an extension to OWL 1 with a few new functionalities. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including:

* keys;
* property chains;
* richer datatypes, data ranges;
* qualified cardinality restrictions;
* asymmetric, reflexive, and disjoint properties; and
* enhanced annotation capabilities

Comments and suggestions to the Card are welcome (please send to public-owl-comments@w3.org)

[1] http://www.w3.org/2007/OWL/refcard

[2] http://www.w3.org/2007/OWL/wiki/Quick_Reference_Guide

[3] http://www.w3.org/TR/2009/PR-owl2-quick-reference-20090922/

[4] http://www.w3.org/TR/owl2-overview/

Jie Bao

VN:F [1.2.0_562]
Rating: 10.0/10 (1 vote cast)
Author: Jie Bao Categories: tetherless world Tags:

My Personal (unofficial) Semantic Web FAQ — a pointer

September 1st, 2009

The joy of multiple blog sites is having to post pointers to one blog entry from another.

My blog at nature.com now has an entry entitled “The Semantic Web: My personal (unofficial) FAQ” which lives at http://network.nature.com/people/jhendler/blog/2009/08/03/the-semantic-web-my-personal-unofficial-faq. Comments, and especially your suggestions for Qs and As are more than welcome there or here (or anywhere else for that matter)

Cheers,

Jim H.

VN:F [1.2.0_562]
Rating: 0.0/10 (0 votes cast)

Current Issues in data.gov

July 31st, 2009

While translating data.gov data into RDF, we have discovered some issues with the published datasets. These issues can be roughly categorized as follows:

  • Duplicated Datasets- Some datasets are part of another dataset, e.g. Dataset 140 (2005 Toxics Release Inventory data for the state of California (Environmental Protection Agency)) is a subset of Dataset 191 (2005 Toxics Release Inventory National data file of all US States and Territories (Environmental Protection Agency)).
  • Formatting Issues - The format of some datasets is not friendly to machine processing. Not all datasets offer CSV format data, and parsing table data from them requires non-trivial efforts. Example: Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)). Some websites, meanwhile, have no data at all: Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)), for example, tells you how to order data from the government.
  • screen shot of the text file from dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases) by US Bureau of Reclamation

  • Access Point Issues - The access points for some datasets do not point to pages friendly to machine access. Instead of pointing to a downloadable file covering the entire dataset, some lead to an interactive website where only partial data can be returned by a web-based query. Example: Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)) and Dataset 96 (National Water Information System (NWIS) (US Geological Survey)).

    screen shot of the query interface for accessing dataset 330 (Local Area Unemployment Statistics) by US Bureau of Labor Statistics

For more details, please visit http://data-gov.tw.rpi.edu/wiki/Current_Issues_in_data.gov .

Sarah Magidson, Li Ding, Dominic DiFranzo, and Jim Hendler

VN:F [1.2.0_562]
Rating: 0.0/10 (0 votes cast)
Author: li Categories: linked data Tags: