Possible Undergrad Research Projects

Fall 2011 Semester

  • Please use the template below when adding new projects...
  • Please specify the sponsoring project (e.g. "LOGD", or "Other" if not known)



0. Project Name (sponsoring project)

Contact

  • Project Contact

Description

  • Project Description

Required Skills

  • If Any



1. Update, complete, and add new features for RPI map (Other)

Contact

Description

  • The RPI map project was begun by Jin Guang Zheng (zhengj3 at rpi dot edu)
  • Need the ability to link to information about the buildings
  • Need to integrate this information with the information we have in our TW knowledge base
  • Make the endpoint accessible from outside the RPI map
  • Integrate the RPI map with Drupal or a simple web page, not just MediaWiki

Required Skills (if any)




2. Spicey Ideas Mobile App (Other)

Contact

  • Peter Fox

Description

  • Need standalone apps for the major devices, Android, iStuff, etc.
  • Will operate independently of the website.
  • Need to essentially mimic the website (www.spiceyideas.com)--similar structure, look and feel, but won't have a shopping cart.
  • Will have product descriptions and info, and just link the user to the website for purchase.
  • The image pointers, food lists and spice lists will be in a device-resident backend database--pretty small, currently 71KB as a SQLite database.
    • The db is 9 related tables. The largest is the food list, currently with 139 records. The spice list currently has 62 records. The others are smaller.

Required Skills (if any)




3. Mobile Recommendation Systems as exemplified in the Wine Agent (Other)

Contact

  • Evan Patton, Deborah McGuinness

Description

  • Work on the mobile versions (android, apple) of a semantic recommendation application for wine.

We hope to get this deployed this term so that we can experiment with how social aspects of application use and data collection come into play. This is one of the longest lived demos in the lab and has spanned numerous research topics as well as interesting application work. Work on all aspects of the demo as well as foundational work is welcome.
See http://tw.rpi.edu/web/project/Wineagent for more info.

Required Skills (if any)

Mobile development skills are very welcome, but as long as you are excited about learning, we are happy to have your help. OWL knowledge is a plus for certain aspects of the work but is not required for others.





4. SemantAqua (Other)

Contact

  • Evan Patton, Ping Wang, Deborah McGuinness

Description

  • Work on an enhanced version of what was originally a water quality portal. This is evolving to a general environmental portal. We would also like to enable citizen scientists to participate by providing data as well as evaluating data.

See http://tw.rpi.edu/web/project/SemantAQUA for more info.

Required Skills (if any)





5. Semantic Search (NS-CTA)

Here is what I think might be interesting for an undergraduate project:

Contact

  • Jie Bao

Description

The NS-CTA project has developed a semantic information theory for
measuring the quantity of "semantics", or knowledge, in a sentence. In
a setting where no formal logical representation of a sentence is
available, working approximations are needed.

The proposed work is targeted at applying the semantic information
theory in information retrieval from natural language texts, by
modeling keyword-based search relevance as mutual semantic information
between the search terms and the documents. A semantically extended
TF-IDF measure will be used as an approximation of semantic mutual
information.
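
The scoring idea above can be sketched roughly as follows. This is a minimal illustration, not the NS-CTA measure itself: it computes plain TF-IDF and stands in for the "semantic extension" with a hypothetical related_terms map that expands the query. The function names, the map, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real system would use an NLP tokenizer."""
    return text.lower().split()


def tf_idf_scores(query_terms, documents, related_terms=None):
    """Score each document against the query with TF-IDF.

    related_terms is a hypothetical {term: [related term, ...]} map that
    stands in for the semantic extension; it is an assumption for
    illustration only.
    """
    related_terms = related_terms or {}

    # Expand the query with related terms (the illustrative "semantic" step).
    expanded = set()
    for term in query_terms:
        expanded.add(term)
        expanded.update(related_terms.get(term, []))

    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in expanded:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "red wine pairs with steak",
        "white wine pairs with fish",
        "water quality data",
    ]
    # Without expansion "merlot" matches nothing; with it, the first document scores.
    print(tf_idf_scores(["merlot"], docs))
    print(tf_idf_scores(["merlot"], docs, {"merlot": ["red"]}))
```

A real prototype would presumably derive the related terms from an ontology or from the semantic information theory, rather than from a hand-written map.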

The output will be a demo prototype for semantic search.

Required Skills (if any)

Required

  • Experience with coding in any programming language
  • Passion to do research, not just implementation. You will work with other researchers on designing algorithms and finding/reading a few papers.

Preferred but not required

  • basic knowledge of probability theory and information theory
  • experience with natural language processing tools, e.g., GATE
  • basic knowledge of search and information retrieval (e.g., PageRank, TF-IDF)





6. Health on the Web (Other)

Contact

  • Deborah McGuinness, Joanne Luciano

Description

  • We want to put up demos showing what can be done with health data on the web. All ideas are welcome, especially ones around personal health. Possible directions include taking healthscore to the next step or extending a popscigrid-style demo. We have started to use the same style to represent physical education requirements and obesity rates by geographic area. We also have some data on nutrition rules (such as not selling cola in schools) and obesity. These are just a few examples. See http://tw.rpi.edu/web/project/HealthOnTheWeb for more info.

Required Skills (if any)





7. Context Mining and Modeling from Mobile Devices (Other)

Contact

  • Possible collaboration with Samsung, Qualcomm or Pitney Bowes? (Ping Li, Shangguan or Jie, and keep Deborah informed of any project on this path.)

Description

Understanding users' situations and intentions from their context is critical in the "tetherless" world of mobile networks. This is now an important research topic for many device makers and service providers (including the companies mentioned above, as well as RIM, Huawei, etc.).

Potential work includes

  • What ontology is needed to model mobile users' contexts and situations?
  • Mining context information from users' historical and sensor data.
  • Applying semantic rule languages (e.g. SPARQL, RIF, or logic programs) to context inference

Output

  • An Android or iPhone app demonstrating the usefulness of context-awareness

Also see

  • Tasker or AutomateIt on Android Market

Required Skills

Not really required at the beginning, but the candidate should be eager to learn

  • basic semantic web notions: ontologies, RDF, rules
  • basic machine learning notions and tools: naive Bayes, Weka, etc.
  • Webkit development





8. FUSE (IARPA)

Contact

  • Deborah McGuinness, Jim Hendler, Joanne Luciano

Description

  • Big new IARPA project on Foresight and Understanding from Scientific Exposition.

Many different kinds of projects are possible, but initial projects will center on getting some quick demos up from sample text markups using our Google visualization and converter tools in LOGD. These projects will focus on showing the potential of a linked data approach and of maintaining and exposing provenance.

The project focuses on finding the emergence of topics. Our focus is on deciding what is emerging, and on encoding and presenting why we think something is emerging.

See http://www.iarpa.gov/solicitations_fuse.html for more information.

Required Skills (if any)





10. Population Sciences Grid (NSF)

Contact

  • Jim McCusker, Deborah McGuinness

Description

  • We provided a demonstration of how we can help policy makers look at health data by combining information about smoking prevalence, tobacco taxes, and smoking bans.

We would like to expand this demo or use the foundation for similar demonstrations in other health domains.

See http://tw.rpi.edu/web/project/PopSciGrid for more information.

Required Skills (if any)





11. Converter Portal (LOGD, Other)

https://github.com/timrdf/csv2rdf4lod-automation/wiki

Contact

Description

  • Build a front-end for Tim's converter. This would be a very big deal: if done right, we would probably have several government users in short order, along with many universities, and a way to promote our instance hub, which would be high priority. The undergraduate would assist a graduate student in the development of this front-end.
  • Tasks
    • Become familiar with the converter (on the command line)
    • Analyze existing conversion parameters to identify the most frequently used ones (developed in conjunction with a manual list from Tim)
    • Choose an example dataset, convert it on the command line, and enhance it fully
    • Mock up the UI use case and UI mechanisms.
    • Build an enhancement recommender
    • Review the conversion ontology
    • Build a UI for creating SPARQL test cases over converted results.

Required Skills

(not all required, but would be useful)

  • bash
  • svn
  • perhaps Java
  • web development - server and client side
  • web UI development
  • SPARQL
  • tomcat, J2EE
  • SADI
  • Google Refine with the RDF export extension, Socrata, and other similar publish/convert tools.
  • Information modeling, UML, ERD, RDFS, OWL
  • true REST design (it's more than URLs to get stuff...)



12. ORGPedia

Contact

Description

You will explore the opportunities of approaching challenging problems in the domains of Finance and Economics by mashing up large amounts of data from different sources.

  • Tasks
    • generate RDF for key financial data, such as financial statements (XBRL), insider transaction filings, corporate ownership, or any other data you can gather, and link the results
    • build novel mashups about stocks, companies, management, investors, government agencies and so on with linked data technologies and analytical tools

Required Skills

  • familiarity with Web programming languages such as PHP or Python and statistical languages such as R; knowledge of Drupal is preferred



13. AIR to RIF-PRD

Contact

Description

  • AIR is a web rule language. RIF is a standard format for interchanging rules described in different languages. We have defined an AIR to RIF-PRD translation so that AIR rules can also be exchanged in RIF. We need help implementing this translation, which takes any set of AIR rules and produces its RIF representation. Python is preferred, as the rest of the code base (e.g. the AIR reasoner) is in Python.

Required Skills

  • Python

What can be gained

  • Initiation into the logic-based programming paradigm. If you're already familiar with it, a chance to refine your skills.
  • Familiarization with standardization efforts in the Semantic Web domain, especially those conducted via the W3C.



14. Temporal reasoning in AIR

Contact

Description

  • AIR is a web rule language. It accepts data in Notation 3 (N3) format. N3 extends the RDF format, which is the W3C recommended format for sharing data on the Web. Statements (or facts) can be conveniently time stamped (with time points or time intervals) in N3 format. We would like to build a temporal reasoning infrastructure on top of the AIR reasoning capabilities. For now we are only looking at simple computations over intervals. Python is preferred, as the rest of the code base (e.g. the AIR reasoner) is in Python.
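
To give a feel for what "simple computations over intervals" might mean, here is a minimal sketch in plain Python. It is not part of AIR and does not read N3; the Interval class and its method names are hypothetical, and treating intervals as closed (endpoints included) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """A closed time interval; a time point is an interval with start == end."""
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("interval end precedes start")

    def before(self, other: "Interval") -> bool:
        """True if this interval ends strictly before the other starts."""
        return self.end < other.start

    def overlaps(self, other: "Interval") -> bool:
        """True if the two intervals share at least one time point."""
        return self.start <= other.end and other.start <= self.end

    def contains(self, other: "Interval") -> bool:
        """True if the other interval lies entirely within this one."""
        return self.start <= other.start and other.end <= self.end

    def intersection(self, other: "Interval") -> Optional["Interval"]:
        """Return the shared sub-interval, or None if there is none."""
        if not self.overlaps(other):
            return None
        return Interval(max(self.start, other.start), min(self.end, other.end))


if __name__ == "__main__":
    september = Interval(datetime(2011, 9, 1), datetime(2011, 9, 30))
    midterm = Interval(datetime(2011, 9, 15), datetime(2011, 10, 15))
    print(september.overlaps(midterm))
    print(september.intersection(midterm))
```

In the actual project, checks like these would presumably be exposed to rules as built-in predicates over time-stamped N3 statements.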

Required Skills

  • Python

What can be gained

  • Initiation into the logic-based programming paradigm. If you're already familiar with it, a chance to refine your skills.
  • Initiation into temporal reasoning.



15. Algorithms for modeling social network data (Other)

Contact

Description

  • Implement algorithms for extracting latent topics from social network streams

Required Skills

  • C++/Python



16. S2S Widget Development (SeSF)

Contact

Description

  • Some widgets are out of date. Some more intricate widgets are not yet completed (e.g., the bounding box widget, the OpenSearch results widget, etc.).
  • Opportunities to hack on the S2S UI core if interested.
  • A lot of return for a little effort: most widgets can be developed in an afternoon!
  • Good chance to learn about application frameworks, science data services, etc.

Required Skills

  • JavaScript



17. S2S SAWSDL Extension (SeSF)

Contact

Description

  • The S2S architecture is capable of extension beyond OpenSearch. We are interested in developing an "adapter" for SAWSDL services.
  • Opportunity to hack on the S2S Server core if interested.
  • Good chance to learn about Semantic Web services, SPARQL, etc.

Required Skills

  • PHP



18. Linked BioPAX Web Service

Contact

Description

The BioPAX format is a widely adopted application of OWL/RDF to express biological regulatory networks. Currently, it is used in a non-linked manner, where identifiers of genes, proteins, etc. are only used as identifiers, but there is no relation to actual URIs for those entities. LinkedBioPAX is an attempt at showing how BioPAX can be adjusted to use Linked Data URIs for biological entities. The work mostly consists of finding URI templates to implement, and implementing them. The current source is here and the deployed service is here.

Required Skills

  • Python



19. Extending the SADI Python Implementation

Contact

Description

The SADI Semantic Web Services Framework now has a Python implementation, but it could use some work. We would like to enable its use in mod_python and Google App Engine, support asynchronous services, and allow data file attachments. This is an extremely simple code base and offers the ability to learn semantic web and web services concepts in a very easy-to-use environment.

Required Skills

  • Python




20. Converter Unit Testing

https://github.com/timrdf/csv2rdf4lod-automation/wiki

Contact

Description

Unit tests for conversions can be used to catch bugs before they are found in applications and demonstrations. Unit tests can be embodied by SPARQL queries. Tim already has some design and automation to test conversions, but it is incomplete and underutilized.
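
As a rough illustration of the idea (not Tim's existing design), a conversion test can be written as a SPARQL ASK query paired with the answer it should produce. The sketch below assumes an endpoint that speaks the standard SPARQL protocol and returns JSON results; the function names and the example queries are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def ask_endpoint(endpoint_url, query):
    """Send a SPARQL ASK query over HTTP GET and return the boolean result."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["boolean"]


def run_conversion_tests(test_cases, ask):
    """Run (name, query, expected) cases and return the names of failing cases.

    ask is any callable that takes a query string and returns a bool, so the
    harness can be exercised without a live endpoint.
    """
    failures = []
    for name, query, expected in test_cases:
        if ask(query) != expected:
            failures.append(name)
    return failures


# Hypothetical test cases for a converted dataset.
EXAMPLE_CASES = [
    ("dataset is not empty", "ASK { ?s ?p ?o }", True),
    ("no empty-string literals", 'ASK { ?s ?p "" }', False),
]
```

Against a live endpoint, the cases could be run with something like run_conversion_tests(EXAMPLE_CASES, lambda q: ask_endpoint(endpoint_url, q)), where endpoint_url is whatever SPARQL endpoint holds the converted data.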

Required Skills

  • bash
  • SPARQL
  • svn
  • TDB, Virtuoso




21. Converter Bug Fixing

https://github.com/timrdf/csv2rdf4lod-automation/wiki

Contact

Description

There are 144 issues that need to be clarified, prioritized, and fixed: https://github.com/timrdf/csv2rdf4lod-automation/issues

Required Skills

(not all required)

  • bash
  • SPARQL
  • svn
  • TDB, Virtuoso
  • Java




22. Converter dataset listing pages

https://github.com/timrdf/csv2rdf4lod-automation/wiki

Contact

Description

There is a lot of metadata for datasets, and datasets contain a lot of data. How best to brag about that on a single web page devoted to each dataset? This would involve writing code that generates web pages by querying a SPARQL endpoint and rendering the results as HTML. Exploring and determining what queries to build would be part of the problem to solve. Help us answer, "what's all there?".
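
The rendering half of that pipeline might look something like the sketch below. It assumes the endpoint returns the standard SPARQL JSON results format and turns one result set into an HTML table. The project itself lists PHP and JavaScript as skills; Python is used here only for illustration, and the function name is hypothetical.

```python
from html import escape


def results_to_html_table(results):
    """Render a SPARQL JSON result set (head.vars, results.bindings) as an HTML table."""
    variables = results["head"]["vars"]
    header = "".join(f"<th>{escape(v)}</th>" for v in variables)
    rows = [f"<tr>{header}</tr>"]
    for binding in results["results"]["bindings"]:
        cells = []
        for v in variables:
            # A variable may be unbound in a given row; show an empty cell.
            value = binding.get(v, {}).get("value", "")
            cells.append(f"<td>{escape(value)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Deciding which queries feed a page like this (triple counts, predicates used, source provenance, and so on) is the open part of the project.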

Required Skills

  • SPARQL
  • svn
  • PHP
  • JavaScript