<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Tetherless World Weblog</title>
	<atom:link href="http://tw.rpi.edu/weblog/feed/" rel="self" type="application/rss+xml" />
	<link>http://tw.rpi.edu/weblog</link>
	<description>Everything about Web Science</description>
	<lastBuildDate>Wed, 01 Feb 2012 17:01:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Get comments on the Water Quality Portal from AGU 2011</title>
		<link>http://tw.rpi.edu/weblog/2012/02/01/get-comments-on-the-water-quality-portal-from-agu-2011/</link>
		<comments>http://tw.rpi.edu/weblog/2012/02/01/get-comments-on-the-water-quality-portal-from-agu-2011/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 17:01:10 +0000</pubDate>
		<dc:creator>Ping Wang</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1758</guid>
		<description><![CDATA[AGU 2011 was the first conference I attended after coming to RPI. There were many interesting activities at the conference, and it was a very rewarding experience. I presented a poster on the semantic water quality portal and also helped a little with the RPI table in the academic section (by [...]]]></description>
			<content:encoded><![CDATA[<p>AGU 2011 was the first conference I attended after coming to RPI. There were many interesting activities at the conference, and it was a very rewarding experience. I presented a poster on the semantic water quality portal and also helped a little with the RPI table in the academic section (by just being there). I went to several talks and visited some exhibit booths. We had a group lunch at Chevy’s, which was a lot of fun!</p>
<p>Preparing the poster helped me rethink the water quality portal. Thanks to Evan for his ISWC 2011 poster, which was a very good starting point for my AGU poster! The poster session itself was even more fun and rewarding. I presented the portal to researchers from various fields and countries, and most of the people I talked to said it was a nice and interesting project. Some researchers gave me very helpful comments, such as:<br />
1. bring in crowdsourcing, e.g. let users report problems<br />
2. help farmers identify polluted wells<br />
3. we should have a way to pull new data from USGS and EPA, e.g. some kind of subscription<br />
4. user permission management (insert/upload/delete)<br />
5. consider allergy sufferers as a use case; possible conditions for an allergy alert: wind + time, a combination of pollutants</p>
<p>I went to several talks during AGU and learned about the cool projects that researchers from different organizations (EPA, Stanford, UMD, Google, NASA) have been working on. It was impressive to see how widely and deeply computer science is used in geophysical research. I also felt that scientists in the geophysical fields want more collaboration with people from computer science.</p>
<p>I went to the exhibits twice and spent quite some time there. I used the wired network provided by Google to do my assignments for the AI course. I also listened to the talk about Google Earth Engine, a very cool platform for geophysical scientists!</p>
<p>Attending a conference as large as AGU certainly takes some energy, but it is well worth it.</p>
<p>Tips about travel reimbursement that Carol gave me today:</p>
<p>1. Keep your boarding pass to show that you flew economy class<br />
2. check out at the hotel and get the folio to show how many nights you actually stayed there<br />
3. get itemized receipts at restaurants</p>
<p>Thanks, Carol!</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2012/02/01/get-comments-on-the-water-quality-portal-from-agu-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My First AGU Experience</title>
		<link>http://tw.rpi.edu/weblog/2012/01/30/my-first-agu-experience/</link>
		<comments>http://tw.rpi.edu/weblog/2012/01/30/my-first-agu-experience/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 15:14:39 +0000</pubDate>
		<dc:creator>Han Wang</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1743</guid>
		<description><![CDATA[(This post should have gone up a month ago, but I had trouble accessing the TW weblog while I was in China, so I am posting it now that I am back in Troy.) The AGU 2011 Fall Meeting was the first academic conference I attended. I was very [...]]]></description>
			<content:encoded><![CDATA[<p>(This post should have gone up a month ago, but I had trouble accessing the TW weblog while I was in China, so I am posting it now that I am back in Troy.)</p>
<p>The AGU 2011 Fall Meeting was the first academic conference I attended. I was very excited when I learned I had the opportunity. My goals were to present our poster, to see what such a conference is like, and to get an idea of what other people are doing in the Informatics area.</p>
<p>My poster was about joint work with <a href="http://tw.rpi.edu/web/person/EricRozell">Eric Rozell</a> on temporal metadata modeling in <a href="http://tw.rpi.edu/web/project/VSTO">VSTO</a>. I presented its motivation and methodology to several people, and it certainly drew some interest. Our approach was viewed as an effective way to deal with large amounts of data and to improve reasoning and search capabilities. It was suggested that a similar technique (recording the temporal range of a dataset at a granularity of days using time:DateTimeInterval) has been used at NASA for indexing data in relational databases. As for the presentation, I think putting our posters, publications, and demos on flash drives and handing them out was a very good idea. It greatly helped interested visitors understand our work better afterwards.</p>
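<p>To make the day-granularity idea a little more concrete, here is a minimal Python sketch; it is an illustration only, not the VSTO code. Each dataset&#8217;s temporal range is truncated to whole days, and a query returns the datasets whose day range overlaps the requested interval. The dataset names and timestamps are hypothetical, and plain Python dates stand in for time:DateTimeInterval resources.</p>

```python
from datetime import date, datetime

# Hypothetical datasets: name -> (start, end) timestamps of the data they cover.
DATASETS = {
    "dataset_a": (datetime(2011, 12, 5, 14, 30), datetime(2011, 12, 9, 8, 0)),
    "dataset_b": (datetime(2011, 12, 12, 0, 0), datetime(2011, 12, 16, 23, 59)),
}

def day_range(start, end):
    """Truncate a timestamp range to whole days, inclusive on both ends."""
    return start.date(), end.date()

def overlaps(a, b):
    """Inclusive day ranges overlap when each one starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]

def find_datasets(query_start, query_end, datasets=DATASETS):
    """Names of datasets whose day range overlaps the query's day range."""
    query = (query_start, query_end)
    return sorted(name for name, (start, end) in datasets.items()
                  if overlaps(day_range(start, end), query))

print(find_datasets(date(2011, 12, 9), date(2011, 12, 11)))  # ['dataset_a']
```

<p>Truncating to days makes the index coarser: dataset_a, which ends at 08:00 on 9 December, still matches any query that includes that day.</p>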
<p>There was a lot of other interesting work across several sessions. For example, <a href="http://trust.utep.edu/members/nick/">Nicholas Del Rios</a> and colleagues from the University of Texas at El Paso presented a semantic and provenance-aware visualization framework (<a href="http://trust.utep.edu/visko/">VisKo</a>) that links data with visualization processes. It has been used to visualize data on behalf of <a href="http://disc.sci.gsfc.nasa.gov/giovanni/overview/index.html">Giovanni</a>, and it captures both data processing provenance and visualization provenance in PML. Besides posters, I also went to several talks in different sessions. Though I could not connect most of them to my own research, it was nice to hear about what other people have been working on.</p>
<p>Another benefit for me was meeting people in the Earth Science and Informatics areas. Although I could remember only a limited number of names, what I saw was a group of people who are enthusiastic about their work. They believe in what they are doing and are confident in what their work will accomplish. I really look forward to working with many of them.</p>
<p>To sum up, this was a great experience for me at the beginning of my Ph.D. career. Next time I will try to meet and talk to more people, and get more feedback on my own work.</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2012/01/30/my-first-agu-experience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Greg</title>
		<link>http://tw.rpi.edu/weblog/2012/01/14/greg/</link>
		<comments>http://tw.rpi.edu/weblog/2012/01/14/greg/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 19:04:59 +0000</pubDate>
		<dc:creator>Stephan Zednik</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1746</guid>
		<description><![CDATA[It was with great sadness that I learned yesterday of the passing of my friend and colleague, Greg Leptoukh.  Greg was a physical scientist at the NASA Goddard Space Flight Center and a coordinator of the Earth Science Information Partners (ESIP) Federation Information Quality cluster.  Greg was dedicated to leveraging information systems to improve the [...]]]></description>
			<content:encoded><![CDATA[<p>It was with great sadness that I learned yesterday of the passing of my friend and colleague, Greg Leptoukh.  Greg was a physical scientist at the NASA Goddard Space Flight Center and a coordinator of the Earth Science Information Partners (ESIP) Federation Information Quality cluster.  Greg was dedicated to leveraging information systems to improve the usability of data for scientists: reducing technical barriers to data use and improving user comprehension of data generation and use.</p>
<p>I had the pleasure of working with Greg on the <a href="http://tw.rpi.edu/web/project/MDSA">Multi-Sensor Data Synergy Advisor</a> (MDSA), a prototype semantic extension to the already successful <a href="http://disc.sci.gsfc.nasa.gov/giovanni/overview/index.html">Giovanni online data analysis tool</a>.  Giovanni has proven to be a successful tool for reducing the technical barriers in science data processing, analysis, and visualization, and information provided through Giovanni has played a role in over 400 science publications to date.  With MDSA, Greg intended to show how Giovanni could be instrumented to provide provenance, quality, and expert knowledge about data to interested users.  Greg was extremely enthusiastic about the potential of semantic technologies to power these enhancements: ontologies to describe concepts important to data generation and use, and rules to expose and explain scenarios that may lead to misunderstood analysis results.  I will always admire Greg&#8217;s enthusiasm for what we had been able to accomplish, and what we would be able to accomplish in future projects.  Greg clearly saw what we were doing as a means to empower scientists, a noble goal if ever there was one.</p>
<p>I am thankful to have had the opportunity to work with Greg, and incredibly sad that he was not able to see this work come to fruition.</p>
<p>You will be missed, friend.</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2012/01/14/greg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Characterizing quality for science data products</title>
		<link>http://tw.rpi.edu/weblog/2011/12/30/characterizing-and-contrasting-quality-for-science-data-products/</link>
		<comments>http://tw.rpi.edu/weblog/2011/12/30/characterizing-and-contrasting-quality-for-science-data-products/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 21:43:40 +0000</pubDate>
		<dc:creator>Stephan Zednik</dc:creator>
				<category><![CDATA[tetherless world]]></category>
		<category><![CDATA[AGU]]></category>
		<category><![CDATA[product quality]]></category>
		<category><![CDATA[Tetherless World Constellation]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1713</guid>
		<description><![CDATA[Characterizing quality for a science data product is hard. We have been working on this issue in our Multi-Sensor Data Synergy Advisor (MDSA) project with Greg Leptoukh and Chris Lynnes from the NASA Goddard Space Flight Center (GSFC). The following is my opinion on what product quality means and how it can be characterized. This [...]]]></description>
			<content:encoded><![CDATA[<p>Characterizing quality for a science data product is hard. We have been working on this issue in our <a href="http://tw.rpi.edu/web/project/MDSA">Multi-Sensor Data Synergy Advisor (MDSA)</a> project with <a href="http://tw.rpi.edu/web/Person/GregLeptoukh">Greg Leptoukh</a> and Chris Lynnes from the <a href="http://www.nasa.gov/centers/goddard/home/index.html">NASA Goddard Space Flight Center (GSFC)</a>. The following is my opinion on what product quality means and how it can be characterized. This work was presented as a <a href="http://tw.rpi.edu/web/doc/AGUFM2011_IN21C-1438">poster</a> at the <a href="http://tw.rpi.edu/web/event/AGU/FM/2011">AGU FM 2011 meeting</a>.</p>
<p>Science product quality is hard to define, characterize, and act upon. Product quality reflects a comparison against standard products of a similar kind, but it also reflects the fitness-for-use of the product for the end-user. Users weigh quality characteristics (e.g. accuracy, completeness, coverage, consistency, representativeness) based on their intended use for the data, and therefore the quality of a product can differ with different users&#8217; needs and interests.  Despite the subjective nature of quality assertions and their sensitivity to each user&#8217;s fitness-for-use, most quality information is provided by the product producer, and the subjective criteria used to determine quality are opaque, if available at all.</p>
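<p>The claim that quality depends on the user can be illustrated with a small Python sketch. It assumes, purely for illustration, that each user expresses priorities as weights over quality indicators and scores a product by a weighted average; the products, indicator scores, and weights are all hypothetical. The same two products rank differently for a user who values accuracy and for one who values coverage.</p>

```python
# Hypothetical indicator scores on a 0-1 scale; not real product assessments.
PRODUCTS = {
    "product_x": {"accuracy": 0.9, "coverage": 0.5, "completeness": 0.6},
    "product_y": {"accuracy": 0.6, "coverage": 0.9, "completeness": 0.8},
}

def fitness(scores, weights):
    """Weighted average of indicator scores under one user's priorities (assumed model)."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total

def best_product(weights, products=PRODUCTS):
    """Pick the product with the highest fitness-for-use for these weights."""
    return max(products, key=lambda p: fitness(products[p], weights))

accuracy_first = {"accuracy": 3, "coverage": 1, "completeness": 1}
coverage_first = {"accuracy": 1, "coverage": 3, "completeness": 1}
print(best_product(accuracy_first))  # product_x
print(best_product(coverage_first))  # product_y
```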
<p>If users are given product quality information at all, this information usually comes in one of two forms:</p>
<ul>
<li>technical reports in which extensive statistical analysis is reported on very specific characteristics of the product</li>
<li>subjective, unexplained statements such as &#8216;good&#8217;, &#8216;marginal&#8217;, &#8216;bad&#8217;</li>
</ul>
<p>The first is information overload that the user cannot easily assess at a glance; the second is a near absence of the information a user needs to make their own subjective quality assessment.</p>
<p>Is there a similar scenario in everyday life where people are presented with quality information that they can readily understand and act upon?</p>
<p>There is, and you see it every day in the supermarket.</p>
<div class="wp-caption alignleft" style="width: 229px"><img class=" " style="margin-left: 10px;margin-right: 10px" src="http://tw.rpi.edu/images/nutritionIndex.gif" alt="" width="219" height="347" align="left" /><p class="wp-caption-text">a common application of information used to make subjective quality assessments</p></div>
<p>Nutrition Facts labels provide per-serving nutrition information (e.g. the amount of Total Fat, Total Carbohydrates, Protein) and show how the listed amounts per serving compare to a reference daily diet.</p>
<p>The comparison to a standard 2,000 calorie diet gives the consumer a simple tool for assessing the usefulness of a food item in their own diet. Quality assertions, such as whether this food is &#8216;good&#8217; or &#8216;bad&#8217; for the consumer&#8217;s diet, are left to the consumer, but they are relatively easy to make with the available information.</p>
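<p>The label&#8217;s arithmetic is simple: each % Daily Value is the amount per serving divided by a daily reference amount for the 2,000 calorie diet. A minimal Python sketch follows; the reference amounts (65 g of total fat, 300 g of total carbohydrate) and the serving are assumptions chosen for illustration, not values taken from the label pictured here.</p>

```python
# Assumed daily reference amounts (grams) for a 2,000 calorie diet.
DAILY_REFERENCE_G = {"Total Fat": 65.0, "Total Carbohydrate": 300.0}

def percent_daily_value(nutrient, grams_per_serving, reference=DAILY_REFERENCE_G):
    """Share of the daily reference amount supplied by one serving, rounded to a whole percent."""
    return round(100 * grams_per_serving / reference[nutrient])

serving = {"Total Fat": 13.0, "Total Carbohydrate": 31.0}  # hypothetical serving
for nutrient, grams in serving.items():
    print(f"{nutrient}: {grams:g} g, {percent_daily_value(nutrient, grams)}% DV")
# Total Fat: 13 g, 20% DV
# Total Carbohydrate: 31 g, 10% DV
```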
<p>A &#8216;quality facts&#8217; label for a scientific data product, showing computed values for community-recognized quality indicators, would go a long way towards enabling a nutrition label-like presentation of quality that is easy for science users to consume and act upon.</p>
<div class="wp-caption alignright" style="width: 260px"><img class=" " style="margin-left: 10px;margin-right: 10px" src="http://tw.rpi.edu/images/quality-facts.png" alt="" width="250" height="479" align="right" /><p class="wp-caption-text">an early mockup of a presentation of quality information for a science data product</p></div>
<p>We have begun working on mockups of what such a presentation of quality could look like, and have constructed a basic quality model that would allow us to express in RDF the information that would be used to construct a quality facts label.</p>
<p>Our <a href="http://tw.rpi.edu/web/project/MDSA/ProductQualityModelPrimer">quality model primer</a> presents our high-level quality model and its application to an aerosol satellite data product in detail.</p>
<p>Our poster was a hit at AGU, where we received a great deal of positive feedback.  This nutrition label-like presentation is immediately familiar, and supports the metaphor of science users &#8216;shopping&#8217; for the best data product to fit their needs.</p>
<p>We still have a long way to go on developing our presentation, but the feedback from discussions at AGU tells me that our message resonated with our intended audience.</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/12/30/characterizing-and-contrasting-quality-for-science-data-products/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>S2S Feedback at AGU Fall Meeting 2011</title>
		<link>http://tw.rpi.edu/weblog/2011/12/19/s2s-feedback-at-agu-fall-meeting-2011/</link>
		<comments>http://tw.rpi.edu/weblog/2011/12/19/s2s-feedback-at-agu-fall-meeting-2011/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 15:29:20 +0000</pubDate>
		<dc:creator>Eric Rozell</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1703</guid>
		<description><![CDATA[The AGU Fall Meeting 2011 was a busy meeting and, as usual, the Tetherless World Constellation (TWC) received quite a bit of attention in terms of best practices and tool support for Semantic eScience. I gave two poster presentations during the Semantic, Linked Data, and Drupal-based Solutions for Science (IN31B) poster session. I had one [...]]]></description>
			<content:encoded><![CDATA[<p>The AGU Fall Meeting 2011 was a busy meeting and, as usual, the Tetherless World Constellation (TWC) received quite a bit of attention in terms of best practices and tool support for <a href="http://tw.rpi.edu/web/ResearchAreas/SemanticeScience" title="Semantic eScience">Semantic eScience</a>.  I gave two poster presentations during the Semantic, Linked Data, and Drupal-based Solutions for Science (IN31B) poster session.  I had one poster in IN31B about creating linked data for AGU abstracts.  The second poster was in IN31A, a session about the Real Use of Open Standards and Technologies; however, it became apparent that I was more interested in talking about it as an IN31B poster.  It was a poster on S2S, and it drew a range of feedback, which I discuss in this post, including enthusiasts who wanted to implement it, skeptics who felt it was not an &#8220;interoperable&#8221; solution, and faceted browse developers who wanted to know why S2S needed so much complexity.</p>
<p>Addressing the first type of feedback is not difficult.  I want everyone to be able to deploy an S2S interface for their data.  However, I often have to hold myself back, because I know that the software is not yet at a point where it can be easily reused without a significant amount of hand-holding on my part.  The basic problems are documentation and the complexity of installation.  While the documentation problem can be easily fixed, the installation problem will remain until the S2S back-end architecture is updated.  The back-end architecture depends on a <a href="http://en.wikipedia.org/wiki/Triplestore" title="Triplestore">triplestore</a> deployed on one of TWC&#8217;s machines for indexing metadata about S2S services.  I plan to move the back-end to a linked data crawler approach next spring, removing the dependencies on TWC triplestores and enabling wider installation.</p>
<p>The second type of feedback was more interesting to address.  It&#8217;s always good to hear constructive criticism about a project.  The argument was that, because S2S uses its own vocabulary to describe, among other things, Web services, &#8220;widgets&#8221;, and parameters, it is not interoperable, since existing tools will not understand that vocabulary.  I have two primary defenses.  The first is that S2S allows you to define virtually any term so that it can be used by old tools and new tools alike.  For instance, S2S allows you to define each of the <a href="http://opensearch.org" title="OpenSearch">OpenSearch</a> vocabulary terms, including &#8220;results&#8221;, &#8220;searchTerms&#8221;, &#8220;startIndex&#8221;, and &#8220;count&#8221;.  Each of these has in fact been implemented by our OpenSearch services for S2S, so when a traditional OpenSearch tool finds an S2S OpenSearch service, it should still be able to use it.  The second defense is this: if you do not agree with the S2S vocabulary, find a vocabulary with as much tool support as S2S for developing faceted browse or advanced search interfaces.  At the time the S2S project started, we found no vocabularies for defining the &#8220;extensibility&#8221; aspects of OpenSearch (i.e., the fact that URIs can be used in place of any of the OpenSearch terms).  So we defined those vocabularies ourselves, and we specifically designed them for S2S&#8217;s purpose.  I&#8217;d be happy to collaborate with anyone who has a broader or different purpose from S2S to extend the vocabulary to their needs, or to map S2S terms to their terms.</p>
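<p>For readers unfamiliar with OpenSearch, terms such as searchTerms, startIndex, and count appear as {name} placeholders in a service&#8217;s URL template; a trailing ? marks a parameter as optional, and a name may carry a namespace prefix (e.g. geo:box), which is how OpenSearch extensions add parameters. The following minimal Python sketch of client-side template filling is an illustration only, not S2S code, and the endpoint URL is hypothetical.</p>

```python
import re
from urllib.parse import quote

# Matches template parameters such as {searchTerms}, {count?} or {geo:box?};
# group 1 is the (possibly prefixed) name, group 2 is "?" for optional ones.
PARAM = re.compile(r"\{([^}?]+)(\?)?\}")

def fill_template(template, values):
    """Substitute parameter values; optional parameters without a value become ''."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"required parameter {name!r} has no value")
    return PARAM.sub(replace, template)

# Hypothetical endpoint, not an actual S2S service URL.
TEMPLATE = ("http://example.org/search?q={searchTerms}"
            "&start={startIndex?}&n={count?}")
print(fill_template(TEMPLATE, {"searchTerms": "aerosol optical depth", "count": 10}))
# http://example.org/search?q=aerosol%20optical%20depth&start=&n=10
```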
<p>The last type of feedback was about why the S2S framework has so much complexity.  I&#8217;m not sure there is one good response to that inquiry, but I think the complexity is useful when you look at the big picture for S2S.  For one, S2S was never explicitly designed to be a framework for faceted browsing interfaces.  Rather, it was designed to develop configurable user interfaces, with a heavy emphasis on reusability of user interface components.  Faceted browsing became the focus because we had two use cases that were best implemented with faceted browse.  Another complexity issue was the number of queries made by an S2S faceted browser compared to something like <a href="http://lucene.apache.org/solr" title="Apache Solr">Apache Solr</a>.  For instance, a browser with 6 facets could potentially require 7 queries to populate the browser with data in S2S (1 per facet plus 1 for the results).  In Solr, a single query can return all facets and facet values.  The design decision in S2S was that a data manager may need to query a remote source to determine what its facet values are.  Alternatively, the data manager may have a single input that it does not wish to facet (say, for performance reasons).  In either case, we designed S2S to be as flexible as possible, which in some cases means it takes a little more effort to set up than something more rigid, such as Apache Solr.</p>
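<p>The query-count arithmetic can be sketched in Python by modelling request plans only; no real S2S or Solr calls are made. The facet names and the &#8220;remote&#8221; source below are hypothetical. A per-facet plan costs n + 1 requests for n facets, while a combined plan costs one.</p>

```python
# Hypothetical facet names and facet-to-source mapping.
FACETS = ["instrument", "parameter", "site", "start_date", "end_date", "provider"]
SOURCES = {"instrument": "remote"}  # e.g. one facet resolved by an external service

def per_facet_plan(facets, sources):
    """Plan modelled on the S2S description: one request per facet, each to its own source, plus one for results."""
    plan = [(sources.get(facet, "local"), facet) for facet in facets]
    plan.append(("local", "results"))
    return plan

def combined_plan(facets):
    """Plan modelled on Solr faceting: a single request returns results and every facet's values."""
    return [("local", ("results", *facets))]

print(len(per_facet_plan(FACETS, SOURCES)))  # 7
print(len(combined_plan(FACETS)))            # 1
```

<p>The extra requests are the price of letting each facet be answered by its own, possibly remote, service.</p>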
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/12/19/s2s-feedback-at-agu-fall-meeting-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Data Publication the right metaphor?</title>
		<link>http://tw.rpi.edu/weblog/2011/12/15/is-data-publication-the-right-metaphor/</link>
		<comments>http://tw.rpi.edu/weblog/2011/12/15/is-data-publication-the-right-metaphor/#comments</comments>
		<pubDate>Thu, 15 Dec 2011 17:43:23 +0000</pubDate>
		<dc:creator>pfox</dc:creator>
				<category><![CDATA[tetherless world]]></category>
		<category><![CDATA[citation]]></category>
		<category><![CDATA[data release]]></category>
		<category><![CDATA[metaphors]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1700</guid>
		<description><![CDATA[http://mp-datamatters.blogspot.com/2011/12/seeking-open-review-of-provocative-data.html]]></description>
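			<content:encoded><![CDATA[<p><a href="http://mp-datamatters.blogspot.com/2011/12/seeking-open-review-of-provocative-data.html">http://mp-datamatters.blogspot.com/2011/12/seeking-open-review-of-provocative-data.html</a></p>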
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/12/15/is-data-publication-the-right-metaphor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TWC Undergrads Visualize Linked Open Corporate Data</title>
		<link>http://tw.rpi.edu/weblog/2011/12/01/twc-undergrads-visualize-linked-open-corporate-data/</link>
		<comments>http://tw.rpi.edu/weblog/2011/12/01/twc-undergrads-visualize-linked-open-corporate-data/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 14:08:17 +0000</pubDate>
		<dc:creator>olyerickson</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1692</guid>
		<description><![CDATA[Two undergraduate members of the Tetherless World team, Alexei Bulazel and Bharath Santosh recently wrote great summaries of their work creating visualizations based on linked open corporate data aggregated through the ORGPedia project. In this post I&#8217;ll include snippets from their posts; I encourage you to check out their full posts and the demos they [...]]]></description>
			<content:encoded><![CDATA[<p>Two undergraduate members of the Tetherless World team, <a href="http://bit.ly/sYylsQ">Alexei Bulazel</a> and <a href="http://bit.ly/uF9t5I">Bharath Santosh</a> recently wrote great summaries of their work creating visualizations based on linked open corporate data aggregated through the <a href="http://dotank.nyls.edu/orgpedia/">ORGPedia project</a>. In this post I&#8217;ll include snippets from their posts; I encourage you to check out their full posts and the demos they link to!</p>
<p>First, a bit of context (from the ORGPedia site): </p>
<blockquote><p><a href="http://dotank.nyls.edu/orgpedia/">ORGPedia: The Open Organizational Data Project</a>, led by NYLS Professor and former United States Deputy CTO <a href="http://bit.ly/uNTIlU" title="Beth Noveck">Beth Noveck</a> (Project Lead) and TWC Senior Constellation Professor <a href="http://bit.ly/tKJZzg" title="Jim Hendler">Jim Hendler</a> (Tech Lead), explores how to create the legal, policy and technology framework for a data exchange to facilitate efficient comparison of organizational data across regulatory schemes as well as public reuse and annotation of that data. By designing a universal exchange rather than a new numbering scheme, OrgPedia aims to achieve goals like improving corporate transparency and efficiency, organizational performance, risk management, and data-driven regulatory policy–without having to wait until legislation is enacted for a single, legal entity identifier.</p></blockquote>
<p>To date, TWC&#8217;s contribution to ORGPedia has been to aggregate data from a variety of sources, develop an <a href="http://tw.rpi.edu/orgpedia/">experimental site</a> to serve as a platform for integrating the data and prototyping ORGPedia concepts, and develop data visualizations and mashups that demonstrate the potential of an open system of canonical identifiers for corporate entities. Led by TWC Ph.D. student <a href="http://bit.ly/w0ip9g">Xian Li</a>, undergrads <a href="http://bit.ly/sYylsQ">Alexei Bulazel</a> and <a href="http://bit.ly/uF9t5I">Bharath Santosh</a> teamed up to create interesting visualizations based on the aggregated data.</p>
<p><a href="http://bit.ly/uF9t5I">Bharath</a> first describes a visualization he created that allows users to analyze various financial properties of the <a href="http://bit.ly/uD1u9w">financial sectors in the US</a> using our aggregated data: </p>
<blockquote><p>
<a href="http://bit.ly/uF9t5I"><img alt="" src="http://bit.ly/ukFa0M" title="financial properties of the financial sectors in the US" class="alignleft" width="400" height="352" /></a>The visualization itself is built with Google Motion Charts, part of Google’s Visualization API. It is an interactive multidimensional graph of a dataset of sectors and the mean of various financial properties across each sector’s companies. The data shown above is represented in millions of USD. The Motion Chart allows for really neat temporal analysis of data in various forms. Clicking the play button shows the change in properties from 2008 to 2011. There are also three different styles in which you can view the data: bubbles (shown above), bar charts, and line graphs. These can be switched in the top corner.</p>
<p>The dataset behind the visualization was created in R. I wrote a SPARQL query that accesses ORGPedia’s datasets and pulls out the US sectors, along with the companies and their stock tickers within each sector. Then I took these companies, pulled in their income statements from Google Finance, and went through each sector, averaging various properties from the sector’s companies’ financial statements. The data manipulation in R took some getting used to, but now it’s very easy for me to transform data frames, matrices, and other objects in R. After the dataset was created and cleaned of missing values, it’s just a matter of defining the properties of the Motion Chart and running it. This generates an HTML file with the graph and data represented in JavaScript. All the data processing and manipulation takes around 15 minutes, mostly due to the large amount of data to be downloaded.
</p></blockquote>
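<p>Bharath did this step in R. Purely as an illustration, and in Python rather than R, the per-sector averaging with missing values dropped might look like the sketch below; the tickers and figures are made up. Each (sector, year) pair yields one row, which matches the entity-plus-time layout that a Motion Chart animates.</p>

```python
from statistics import mean

# Hypothetical rows: (sector, ticker, year, revenue in millions of USD); None = missing.
ROWS = [
    ("Technology", "AAA", 2010, 120.0),
    ("Technology", "BBB", 2010, None),
    ("Technology", "CCC", 2010, 80.0),
    ("Utilities", "DDD", 2010, 50.0),
]

def sector_means(rows):
    """Average one financial property per (sector, year), skipping missing values."""
    grouped = {}
    for sector, _ticker, year, value in rows:
        if value is not None:
            grouped.setdefault((sector, year), []).append(value)
    return {key: mean(values) for key, values in grouped.items()}

print(sector_means(ROWS))
# {('Technology', 2010): 100.0, ('Utilities', 2010): 50.0}
```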
<p>Bharath then goes on to describe the compelling visualization he and <a href="http://bit.ly/sYylsQ">Alexei</a> created of the &#8220;social network&#8221; of corporate board members: </p>
<blockquote><p>
&#8230;The visualization uses data from <a href="http://littlesis.org">LittleSis.org</a> about board members of various companies in the US and shows the members in a force-directed graph that reveals which board members sit on multiple boards (Board Members Network):</p>
<p>The graph visualization is done using the D3 visualization toolkit’s force-directed graph. Each node represents a board member. The clustered colored nodes are a group of members on the same board. The multicolored nodes represent board members who are on multiple boards. Mousing over a node shows you the member’s name and the companies they work for. Clicking a node takes you to their LittleSis.org page. The graph shows many interesting relationships between various companies and board members, especially Steven S Reinemund, who sits on 5 different boards.
</p></blockquote>
<p>On his blog, <a href="http://bit.ly/sYylsQ">Alexei</a> provides additional detail about the work they did to prepare the data for the visualization: </p>
<blockquote><p>
<a href="http://bit.ly/uF9t5I"><img alt="" src="http://bit.ly/sUJIyE" title="corporate board members social network" class="alignleft" width="400" height="316" /></a>The project involved creating an interactive graph visualization of connections between members of corporate boards (the final product can be found here). Given a list of a few hundred stock tickers and access to the LittleSis API, the goal was to ultimately produce a JSON file of board members that could be used by the D3.js force-directed graph framework. I started by looking up each ticker symbol, yielding a JSON file with a unique ID number for each company. My script then queried the API for the actual company page associated with that ID and stored the names, company associations, and URIs of each board member. Finally, it output a JSON file for the D3.js graph describing the ~2800 board members and the links between them.</p>
<p>While I had used Python a bit for command line scripting, I hadn’t really dug into it before this project. The work gave me a better taste for the language and its capabilities. I made extensive use of the “urllib” library for accessing web content, and worked with reading data from JSON files. Bharath helped me with the syntax of the program and some of the graph construction. While I was aware of Python’s reputation for ease of use and high-level abstraction, working with it let me experience this abstraction firsthand, and I was very impressed. The ease with which complex multistep operations could be completed let me focus more on the flow of the data through the process rather than the specifics of handling it. The project also gave me a bit more hands-on experience with JSON.
</p></blockquote>
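<p>A minimal Python sketch of the final step, assuming (our guess, not Alexei&#8217;s code) that two members are linked when they share a board: it turns board memberships into the nodes/links JSON shape used by D3&#8217;s force-layout examples and flags members who sit on more than one board. Company and member names are hypothetical, and the LittleSis API calls are omitted.</p>

```python
import json
from itertools import combinations

# Hypothetical board memberships: company -> board member names.
BOARDS = {
    "Company A": ["Alice", "Bob", "Carol"],
    "Company B": ["Carol", "Dave"],
}

def build_graph(boards):
    """Return a {"nodes": [...], "links": [...]} dict shaped like D3 force-layout input."""
    companies_of = {}  # member name -> companies whose board they sit on
    for company, names in boards.items():
        for name in names:
            companies_of.setdefault(name, []).append(company)
    names_sorted = sorted(companies_of)
    index = {name: i for i, name in enumerate(names_sorted)}
    nodes = [
        {"name": name,
         "companies": companies_of[name],
         # members on several boards would be drawn multicolored
         "multi_board": len(companies_of[name]) > 1}
        for name in names_sorted
    ]
    # Assumption: link every pair of members who share at least one board.
    pairs = set()
    for names in boards.values():
        for a, b in combinations(sorted(set(names)), 2):
            pairs.add((index[a], index[b]))
    links = [{"source": s, "target": t} for s, t in sorted(pairs)]
    return {"nodes": nodes, "links": links}

graph = build_graph(BOARDS)
print(json.dumps(graph, indent=2))
```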
<p>The reader is encouraged to read both <a href="http://bit.ly/sYylsQ">Alexei</a>&#8217;s and <a href="http://bit.ly/uF9t5I">Bharath</a>&#8217;s blogs for more details on these great contributions by a couple of our TWC undergrads!</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/12/01/twc-undergrads-visualize-linked-open-corporate-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two Misconceptions about the Semantic Web</title>
		<link>http://tw.rpi.edu/weblog/2011/11/18/two-misconceptions-about-the-semantic-web/</link>
		<comments>http://tw.rpi.edu/weblog/2011/11/18/two-misconceptions-about-the-semantic-web/#comments</comments>
		<pubDate>Sat, 19 Nov 2011 02:40:07 +0000</pubDate>
		<dc:creator>Jesse Weaver</dc:creator>
				<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1679</guid>
		<description><![CDATA[I recently presented at the Semantic Graph Database Processing BOF at SC2011, and I had the opportunity to discuss with others the needs for high-performance computing in web-scale computation and the benefits of Linked Data and ontologies on the World Wide Web. There was one participant there who was adamantly opposed to the semantic web.  [...]]]></description>
			<content:encoded><![CDATA[<p>I recently presented at the <a href="http://sc11.supercomputing.org/schedule/event_detail.php?evid=bof149">Semantic Graph Database Processing BOF at SC2011</a>, and I had the opportunity to discuss with others the need for high-performance computing in web-scale computation and the benefits of Linked Data and ontologies on the World Wide Web. One participant there was adamantly opposed to the semantic web. (I think his words outside of the presentation were something like &#8220;I do not believe in the semantic web&#8221; and &#8220;only the semantic web cares about the semantic web.&#8221;) As I tried to make my case with him, it became increasingly clear to me that this person had a few misconceptions about the semantic web. I want to address those misconceptions here.</p>
<p>Before I continue, though, allow me to disclaim a bit. I am not a representative of the entire semantic web community, although I do consider myself a member of it. Additionally, I am not officially associated with the <a href="http://www.w3.org/">W3C</a>. I write this blog entry simply in the capacity of a semantic web enthusiast (henceforth, semwebber), and not even as a member of the <a href="http://tw.rpi.edu/">Tetherless World Constellation</a>. I invite, nay, urge other semwebbers to contribute comments to this blog post in any capacity (agree, disagree, amend, etc.).</p>
<p><strong>1. &#8220;One ontology to rule them all&#8221;</strong></p>
<p>To my knowledge, nobody has ever claimed that there should be &#8220;one ontology to rule them all.&#8221; Instead, what is regularly promoted is ontology reuse and/or integration. For example, the <a href="http://xmlns.com/foaf/0.1/">FOAF ontology</a> is widely used in the semantic web to describe persons; why create your own ontology when you can reuse a well-established one? Integrating ontologies allows differing perspectives to be reconciled, so that data using those ontologies become meaningfully related. Admittedly, there are some rather large, comprehensive ontologies out there, and there are some very popular and pervasive ones, too. However, no standard or recommendation requires publishers of RDF data to comply with any particular ontology. You could even ignore the RDF vocabulary if you so choose (yes, even rdf:type).</p>
<p>The primary purpose of an ontology (in my view) is to attach explicit semantics to your data. Just as the participant had stated (although he meant it in contrast to the semantic web), there are many ontologies. They compete in the ecosystem of the World Wide Web and evolve accordingly (or become extinct).</p>
<p><strong>2. &#8220;Triples all the way down&#8221;</strong></p>
<p>(First, let me say, this is not an affront to <a href="http://planetrdf.com/">Planet RDF</a>.)</p>
<p>This is a bit of a pet peeve of mine, and perhaps what I say here will offend some semwebbers (I hope not). The semantic web (in my view) is not about &#8220;triples all the way down.&#8221; What do I mean by that? Let me explain.</p>
<p>RDF brings primarily two things to the table when it comes to publishing and integrating data on the web: names in the form of URIs, and a simple data model that is flexible enough for (arguably) nearly any kind of data. (I would like to add a third, meaningful links, but I will avoid that for now.) So when data is published to the web, publishing it as RDF allows you: (1) to identify the things in your data across the <em>World Wide</em> Web, and (2) to structurally (and possibly semantically) integrate your data with other data on the <em>World Wide</em> Web. (I emphasize &#8220;World Wide&#8221; here to bring to attention the vast scope of publication, identification, and integration that is being achieved.) Fantastic.</p>
<p>Does this mean that everything can be efficiently (or rather, ideally) represented in RDF? No. Then would you always want to handle your data as triples? Probably not. Let me explain.</p>
<p>RDF is meant to solve the problem of meaningfully publishing data (not just documents) on the <em>World Wide</em> Web. Beyond that, do what you want. More specifically, when you crawl and/or aggregate data from the <em>World Wide</em> Web, you don&#8217;t have to keep the RDF data as triples in your system. It is no longer on the global stage of the <em>World Wide</em> Web; rather, it is now in your system, where you are king. So optimize away! Store it or process it however you like! Relational databases? Sure! Rewrite URIs as shorter terms? Whatever floats your boat! Ignore the explicit semantics and treat it like an unlabeled graph? I wouldn&#8217;t recommend it, but you&#8217;re the king! Do whatever it takes to meet your use case, and if your use case has something to do with RDF data, then fine, leave it as triples if you want. My point is, it&#8217;s not <em>necessarily</em> &#8220;RDF all the way down,&#8221; but it is &#8220;RDF at the top,&#8221; where &#8220;top&#8221; is the place of publication, the <em>World Wide</em> Web. The universal naming mechanism of URIs and the generic data model enable data publishers to get data out there in a way that can be <em>explicitly</em> understood by machines (for example, when I say &#8220;Beast is furry,&#8221; am I talking about <a href="https://www.facebook.com/beast.the.dog">Mark Zuckerberg&#8217;s dog</a> or <a href="http://en.wikipedia.org/wiki/Beast_%28comics%29">the fictional X-Man Dr. Henry Philip &#8220;Hank&#8221; McCoy</a>?), but as the creator of that machine, it&#8217;s up to you how to utilize those explicit semantics.</p>
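<p>As one illustration of &#8220;rewrite URIs as shorter terms,&#8221; here is a minimal Python sketch of dictionary encoding, a common internal optimization in triple stores: each distinct URI or literal is mapped to a small integer once, and triples are then kept as integer tuples. The example.org URIs, the furry property, and the names <code>TermDictionary</code> and <code>encode_triples</code> are hypothetical, chosen to echo the two Beasts. Because the two subjects have different URIs, they receive different IDs, so the explicit distinction survives even though the strings are gone.</p>

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical data: two different resources that a human might both call "Beast".
TRIPLES = [
    ("http://example.org/id/beast-the-dog", "http://example.org/vocab#furry", '"true"'),
    ("http://example.org/id/beast-the-x-man", "http://example.org/vocab#furry", '"true"'),
    ("http://example.org/id/beast-the-dog", RDF_TYPE, "http://example.org/vocab#Dog"),
]


class TermDictionary:
    """Assigns each distinct RDF term (URI or literal string) a compact integer ID."""

    def __init__(self):
        self._ids = {}    # term -> ID
        self._terms = []  # ID -> term

    def encode(self, term):
        # Reuse the existing ID if the term was seen before; otherwise allocate the next one.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id):
        return self._terms[term_id]


def encode_triples(triples, dictionary):
    """Convert (subject, predicate, object) string triples into integer tuples."""
    return [tuple(dictionary.encode(term) for term in triple) for triple in triples]


if __name__ == "__main__":
    terms = TermDictionary()
    encoded = encode_triples(TRIPLES, terms)
    print(encoded)  # the two Beasts receive different subject IDs
    # Decoding restores the original URIs when the data leaves your system again.
    print([tuple(terms.decode(i) for i in t) for t in encoded])
```

<p>The dictionary is what keeps this an internal detail: inside your system you work with integers, and whenever you publish or exchange data again, you decode back to the full URIs.</p>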
<p><img src="https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash4/307630_289732801053580_202825023077692_1217568_1277133049_n.jpg" alt="Beast, Mark Zuckerberg's Dog" height="200" /><img src="http://upload.wikimedia.org/wikipedia/en/a/ad/Beastastonishing.jpg" alt="Beast, the fictional X-Man" height="200" /> (They both look furry to me.)</p>
<p>To be clear, though, I am promoting RDF as a way to publish structured, semantic data as opposed to <strong>not</strong> publishing structured, semantic data.  In the future, it is conceivable that there may exist other good ways to publish structured, semantic data, but RDF exists today and is widely used.</p>
<p>So I will leave it at that. Again, I invite comments, rebuttals, accolades, disparagements, etc.<br />
<a href="http://www.cs.rpi.edu/%7Eweavej3/index.xhtml">Jesse Weaver</a></p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/11/18/two-misconceptions-about-the-semantic-web/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Biomedical Semantics and the Cloud</title>
		<link>http://tw.rpi.edu/weblog/2011/11/18/biomedical-semantics-and-the-cloud/</link>
		<comments>http://tw.rpi.edu/weblog/2011/11/18/biomedical-semantics-and-the-cloud/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 21:04:18 +0000</pubDate>
		<dc:creator>Jim McCusker</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1673</guid>
		<description><![CDATA[I&#8217;ve been asked to give a 30-minute talk on biomedical semantics in the cloud at the Molecular Med Tri Con in the symposium on cloud computing. Here&#8217;s what I know about what&#8217;s going on in this area at the moment: LexEVS is available as an AMI: https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/Cloud_Computing_Services A Virtuoso Universal Server is available as an AMI: http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtInstallationEC2 [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been asked to give a 30-minute talk on biomedical semantics in the cloud at the <a href="http://www.triconference.com/cloud-computing/">Molecular Med Tri Con</a> in the symposium on cloud computing. Here&#8217;s what I know about what&#8217;s going on in this area at the moment:</p>
<ul>
<li>LexEVS is available as an Amazon Machine Image (AMI): <a href="https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/Cloud_Computing_Services">https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/Cloud_Computing_Services</a></li>
<li>A Virtuoso Universal Server is available as an AMI: <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtInstallationEC2">http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtInstallationEC2</a></li>
<li>I&#8217;m pretty sure NCBO is doing something here, but I can&#8217;t find any specifics.</li>
<li>Will Stardog be available as an AMI or similar?</li>
</ul>
<p>So that&#8217;s on the &#8220;semantics using the cloud&#8221; side, but I really think that there&#8217;s a lot of potential going the other way: using semantics to discover data and services in the cloud. <a href="http://sadiframework.org">SADI</a> has the ability to discover and link services through ontologies. It&#8217;s similar to SAWSDL (in fact, they wrap SAWSDL services), but they don&#8217;t bother with the extra layer and just let the service process RDF directly. When SADI services are deployed to the cloud, it&#8217;ll solve a big problem for people who want others to use their services/algorithms without the overhead of maintaining those servers themselves. In fact, with the <a href="http://aws.amazon.com/devpay/">Amazon DevPay</a> structure, it&#8217;s possible for small labs to release datasets, databases, and algorithms to the world without having to pay to support them.</p>
<p>I say when, not if, because my implementation of SADI in Python is almost ready for deployment through Google App Engine (which can be deployed in AWS or other systems using <a href="http://code.google.com/p/appscale/">AppScale</a>), and from what I hear, it won&#8217;t take much work to do the same with the Java implementation. Between this and the extreme portability of Python SADI services (each one is just a script), use in the cloud and redeployment to private clouds are going to be trivial.</p>
<p>So I&#8217;m asking folks, am I full of it? Also, what else is there out there? Please help me out so that we all get some good exposure!</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/11/18/biomedical-semantics-and-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My report on Open Government Data camp 2011</title>
		<link>http://tw.rpi.edu/weblog/2011/11/02/my-report-on-open-government-data-camp-2011/</link>
		<comments>http://tw.rpi.edu/weblog/2011/11/02/my-report-on-open-government-data-camp-2011/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 06:37:35 +0000</pubDate>
		<dc:creator>agraves</dc:creator>
				<category><![CDATA[linked data]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[personal ramblings]]></category>
		<category><![CDATA[tetherless world]]></category>

		<guid isPermaLink="false">http://tw.rpi.edu/weblog/?p=1649</guid>
		<description><![CDATA[A few days ago I (Alvaro Graves) participated in the Open Government Data Camp 2011 in Warsaw, Poland, where people from different groups, organizations and governments met to discuss issues related to Open Data at government level. Here are some of the most important issues found in these talks, in my opinion. The current state [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I (<a href="http://tw.rpi.edu/instances/AlvaroGraves">Alvaro Graves</a>) participated in the <a href="http://ogdcamp.org/">Open Government Data Camp 2011</a> in Warsaw, Poland, where people from different groups, organizations, and governments met to discuss issues related to Open Data at the government level. Here are, in my opinion, some of the most important issues raised in these talks.</p>
<h2>The current state of OGD</h2>
<p><a href="http://eaves.ca">David Eaves</a>, an activist who advises the city of Vancouver, Canada, on Open Data issues, gave a keynote in which he described his views on the current state of the Open Data movement. First, it is striking that the success stories are no longer just a few (such as <a href="http://data.gov">Data.gov</a> or <a href="http://data.gov.uk">Data.gov.uk</a>); there are now dozens (perhaps hundreds) at the national, regional, and local levels. Similarly, the term Open Government Data is becoming increasingly popular, which is good because it makes it easier to stop explaining the &#8216;what&#8217; and start focusing on the &#8216;how&#8217;.</p>
<p>Another interesting point is that the Open Government Data movement has already passed an inflection point: it is no longer seen as people making demands from the outside, and its members are increasingly being invited to help with these initiatives from within government. For many, this change in perspective can be confusing and may raise concerns that Open Data will be absorbed into a bureaucratic system that makes it impossible to implement Open Data initiatives. However, it is clear that for these changes to occur, the movement cannot refuse to collaborate with governments.</p>
<h2>Local initiatives, by locals</h2>
<p>A talk that I really liked was by <a href="http://www.zylstra.org/">Ton Zylstra</a>, who lives in Enschede, the Netherlands, a city of only 150,000 inhabitants. He wanted an Open Data initiative there, but it was difficult to convince the authorities, so he and a group of people decided to start working on their own. They invited a handful of hackers to a bar and created their first application, which used data from Twitter, Foursquare, and the venues of a local festival. Eventually they convinced the municipal government that the default option for local data ought to be open.</p>
<p>Ton drew several important lessons from this experience. First, you have to create something concrete, no matter how small: something that requires little funding (the first beers at the bar were free) and can be done in the short term (no more than a couple of weeks). Second, it does not matter whether it is original or not; there are some great ideas out there that deserve to be copied and are very useful for the local community.</p>
<h2>How Open Data died</h2>
<p>Another very interesting keynote was by <a href="http://countculture.wordpress.com/">Chris Taggart</a>, founder of <a href="http://opencorporates.com/">OpenCorporates</a>, who warned of the risks that the Open Data movement is facing today. His main concern is that Open Data has had little relevant impact on society. For example, he mentioned that so far no one&#8217;s business depends on Open Data (this is not strictly true, since there are a few out there, but I have to concede they are rare examples). In general, making data available is not enough; it needs to be used, whether in applications, by data journalists, or in other ways. Also, it is fundamental to link different Open Data sites to one another (something quite uncommon in the movement), so that people can find out more information. Finally, I liked his idea that if Open Data does not cause problems for incumbents, then it is not working.</p>
<h2>Redefining what is public</h2>
<p>Finally, another idea I found interesting came from Andrew Rasiej, founder of <a href="http://personaldemocracy.com/about-us">Personal Democracy</a>, and <a href="http://users.ecs.soton.ac.uk/nrs/">Nigel Shadbolt</a>, professor at the University of Southampton: redefining &#8220;the public&#8221; in terms of data that &#8220;is available on the Web in machine-processable formats.&#8221; That is, uploading a bunch of PDFs with scanned tables does not make that information public, because it is not easily accessible. This initiative raises the bar for what counts as public data, especially when compared to the <a href="http://en.wikipedia.org/wiki/Freedom_of_Information_Act_(United_States)">FOIA (Freedom of Information Act)</a>, which allows you to request information from the government. Note that this applies to all information, as Rasiej so <a href="http://twitter.com/jedmiller/status/127369701549547520">vehemently described it</a>.</p>
<h2>So&#8230; what did you talk about at OGDCamp?</h2>
<p>In my case, I presented a system for publishing Linked Data called <a href="http://lodspeakr.org">LODSPeaKr</a>, which can be used for the rapid publication of government data and to create applications based on Linked Data. In the near future I will be writing more about this framework, but for now you can see <a href="http://www.slideshare.net/alangrafu/publishing-linked-open-data-in-15-minutes">my presentation here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://tw.rpi.edu/weblog/2011/11/02/my-report-on-open-government-data-camp-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

