Repurposing Drugs with Semantics (ReDrugs) Project Documentation Page

Our work is also included in the video Rensselaer: Embodying The New Polytechnic

Table of Contents

  1. ReDrugs System Overview
  2. ReDrugs UI Installation Guide
    1. Ubuntu or Mac
    2. Windows
    3. Serve UI online
  3. Bigdata© Nanosparqlserver Installation and Configuration Guide
    1. Using Jetty
    2. Using Tomcat
    3. Deploy on a Server/Virtual Server with Apache2
    4. Start Bigdata© Nanosparqlserver using jetty in command line
  4. Export data from Bigdata© Nanosparqlserver
  5. MISC and historical Documentations

  1. ReDrugs System Overview

    updated on Feb 11, 2015 by Rui Yan

    Repurposing Drugs with Semantics, or ReDrugs, is a project that aims to find treatments for diseases among existing drugs using semantic methods. This documentation does not cover much of the project's motivation; instead, it focuses on the technical side, including the installation and configuration of the system's UI and triplestore.

    The system consists of two major parts: UI and triplestore, as illustrated in the following diagram.

    The UI contains a front end, which allows users to interact with the system, and a back end, which provides 4 services that retrieve data from the triplestore and display graphs. The triplestore contains data from a diverse set of sources, including iRefIndex, OMIM and DrugBank (more info on the bio2rdf site). The data are stored in a format called nanopublication.

    There are currently two virtual machines hosting two instances of the ReDrugs system. The first is our production server, where everything is stable and working; the second is a development server, where most new features are tested before they go to production.

  2. ReDrugs UI Installation Guide

    updated on Mar 10, 2015 by Rui Yan

    You can install the ReDrugs UI directly if you are on Ubuntu or Mac. A virtual machine is needed if you want to install it on Windows.

    1. Ubuntu or Mac

      This installation has been tested several times on Ubuntu 14.04 LTS and once on a Mac. If you install it on Ubuntu 10.**, problems will occur, and no solutions have been found so far, so Ubuntu 14.04 LTS is recommended.

      1. open up your terminal and type:
        git clone
      2. skip this step if easy_install is already installed. You can check by typing:
        which easy_install
        If easy_install is installed, its path is printed; if nothing is printed, you need to install it. On Ubuntu, you can install it using:

        wget -O - | sudo python

        See official reference for more details.

      3. skip this step if virtualenv is already installed:
        sudo easy_install virtualenv
      4. navigate to redrugs folder:
        cd redrugs
      5. create a virtual environment inside:
        virtualenv venv
      6. install TurboGears (one of the ReDrugs Sys dependencies):
        venv/bin/pip install tg.devtools
      7. activate your virtualenv:
        source venv/bin/activate
      8. install other dependencies:
        sudo apt-get install python-pip python-dev build-essential

        pip install numpy

        sudo apt-get install gfortran libopenblas-dev liblapack-dev
      9. run python setup tool and install:
        python develop
        pip install -e .
      10. if necessary, change the listening port in the development.ini file; the default port is 8085
      11. to start the redrugs sys:
        gearbox serve
        then go to http://localhost:8085/redrugs.html in your browser
      12. While developing you may want the server to reload after changes in package files (or its dependencies) are saved. This can be achieved easily by adding the --reload option:
        gearbox serve --reload
      13. You can press "ctrl+c" to stop the running system
      14. You can type "deactivate" to exit the virtualenv
      15. To start the system again after fully quitting, repeat steps 4, 7 and 11.
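
Before starting, it can help to check which of the command-line tools used in the steps above are already installed. The following is a minimal sketch, not part of ReDrugs itself. It assumes a Python 3 interpreter is available for the check (`shutil.which` does not exist in Python 2.7), and the name `find_missing_tools` is made up for illustration.

```python
import shutil

# Tools the steps above expect to find on PATH.
REQUIRED_TOOLS = ["git", "easy_install", "virtualenv"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Any tool reported as missing corresponds to one of the install steps above (step 2 for easy_install, step 3 for virtualenv).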

    2. Windows

      1. install a virtual machine such as vmware
      2. install Ubuntu 14.04 LTS in your vmware
      3. follow Ubuntu or Mac section steps

    3. Serve UI online

      1. install apache2: here

      2. install apache2 mod_wsgi: here

      3. install redrugsUI according to the above tutorial

      4. open your /etc/apache2/sites-enabled/000-default.conf file and replace it with this file:

        	WSGIPythonPath /home/redrugsUI/redrugs/venv/lib/python2.7 #your virtualenv python lib path
                # The ServerName directive sets the request scheme, hostname and port that
                # the server uses to identify itself. This is used when creating
                # redirection URLs. In the context of virtual hosts, the ServerName
                # specifies what hostname must appear in the request's Host: header to
                # match this virtual host. For the default virtual host (this file) this
                # value is not decisive as it is used as a last resort host regardless.
                # However, you must set it for any further virtual host explicitly.
                ServerAdmin webmaster@localhost
                DocumentRoot /var/www/html
                # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
                # error, crit, alert, emerg.
                # It is also possible to configure the loglevel for particular
                # modules, e.g.
                #LogLevel info ssl:warn
                ErrorLog ${APACHE_LOG_DIR}/error.log
                CustomLog ${APACHE_LOG_DIR}/access.log combined
                # For most configuration files from conf-available/, which are
                # enabled or disabled at a global level, it is possible to
                # include a line for only one particular virtual host. For example the
                # following line enables the CGI configuration for this host only
                # after it has been globally disabled with "a2disconf".
                #Include conf-available/serve-cgi-bin.conf
        ### Added by Rui Yan
        # to access bigdata at    
            RewriteEngine On
            RewriteRule ^/bigdata$ /bigdata/ [R,L]
            ProxyTimeout 1800
            ProxyRequests Off
            <Location /bigdata>
                Order allow,deny
                Allow from All
                ProxyPass http://localhost:8080/bigdata
                ProxyPassReverse /
                SetOutputFilter proxy-html
            </Location>
        # end of bigdata configuration
        # to access redrugsUI at
            WSGIDaemonProcess user=www-data group=www-data threads=4 python-path=/home/redrugsUI/redrugs/venv/lib/python2.7/site-packages
            WSGIScriptAlias / /home/redrugsUI/redrugs/redrugs.wsgi
            # serve static files directly without TurboGears
            Alias /images /home/redrugsUI/redrugs/redrugs/public/images
            Alias /css /home/redrugsUI/redrugs/redrugs/public/css
            Alias /js /home/redrugsUI/redrugs/redrugs/public/javascript 
            CustomLog /etc/apache2/logs/redrugs-access.log common
            Errorlog /etc/apache2/logs/redrugs-error.log
        #give apache2 access to the redrugs directory    
                Options All
                AllowOverride All
                Require all granted

      5. You need to include redrugs.wsgi and production.ini file under /home/redrugsUI/redrugs/ directory

      6. redrugs.wsgi file:

        APP_CONFIG = "/home/redrugsUI/redrugs/production.ini"
        # Set up logging from the logging sections of the ini file
        import logging.config
        logging.config.fileConfig(APP_CONFIG)
        # Load the application
        from paste.deploy import loadapp
        application = loadapp('config:%s' % APP_CONFIG)

      7. production.ini file:

        # redrugs - Pylons development environment configuration
        # The %(here)s variable will be replaced with the parent directory of this file
        # This file is for deployment specific config options -- other configuration
        # that is always required for the app is done in the config directory,
        # and generally should not be modified by end users.
        [DEFAULT]
        debug = true
        # Uncomment and replace with the address which should receive any error reports
        #email_to =
        smtp_server = localhost
        error_email_from = paste@localhost
        [server:main]
        use = egg:Paste#http
        host =
        port = 8085
        [sa_auth]
        cookie_secret = c976a731-b2bf-4d84-a2fa-279c21405732
        [app:main]
        use = egg:redrugs
        full_stack = true
        #lang = ru
        cache_dir = %(here)s/data
        beaker.session.key = redrugs
        beaker.session.secret = c976a731-b2bf-4d84-a2fa-279c21405732
        #By default session is store in cookies to avoid the overhead
        #of having to manage a session storage. On production you might
        #want to switch to a better session storage.
        beaker.session.type = cookie
        beaker.session.validate_key = c976a731-b2bf-4d84-a2fa-279c21405732
        # Disable template autoreload to boost performances in production
        # WARNING: if you want to deploy your application using a zipped egg
        # (ie: if your application's defines zip-safe=True, then you
        # MUST put "false" for the production environment because there will
        # be no disk and real files to compare time with.
        #auto_reload_templates = false
        # If you'd like to fine-tune the individual locations of the cache data dirs
        # for the Cache data, or the Session saves, un-comment the desired settings
        # here:
        #beaker.cache.data_dir = %(here)s/data/cache
        #beaker.session.data_dir = %(here)s/data/sessions
        # pick the form for your database
        # %(here) may include a ':' character on Windows environments; this can
        # invalidate the URI when specifying a SQLite db via path name
        # sqlalchemy.url=postgres://username:password@hostname:port/databasename
        # sqlalchemy.url=mysql://username:password@hostname:port/databasename
        # If you have sqlite, here's a simple default to get you started
        # in development
        sqlalchemy.url = sqlite:///%(here)s/devdata.db
        #echo shouldn't be used together with the logging module.
        sqlalchemy.echo = false
        sqlalchemy.echo_pool = false
        sqlalchemy.pool_recycle = 3600
        # This line ensures that Genshi will render xhtml when sending the
        # output. Change to html or xml, as desired.
        templating.genshi.method = xhtml
        templating.genshi.doctype = html5
        # the compiled template dir is a directory that must be readable and writable
        # by your webserver. It will be used to store the resulting templates once
        # compiled by the TemplateLookup system.
        # During development you generally don't need this option since paste's HTTP
        # server will have access to you development directories, but in production
        # you'll most certainly want to have apache or nginx to write in a directory
        # that does not contain any source code in any form for obvious security
        # reasons.  If disabled, None, False, or not writable, it will fall back
        # to an in-memory cache.
        templating.mako.compiled_templates_dir = %(here)s/data/templates
        # Debug mode will enable the interactive debugging tool, allowing ANYONE to
        # execute malicious code after an exception is raised.
        #set debug = false
        # Logging configuration
        # Add additional loggers, handlers, formatters here
        # Uses python's logging config file format
        #turn this setting to "min" if you would like tw to produce minified
        #javascript files (if your library supports that)
        [loggers]
        keys = root, redrugs, sqlalchemy
        [handlers]
        keys = console
        [formatters]
        keys = generic
        # If you create additional loggers, add them as a key to [loggers]
        [logger_root]
        level = INFO
        handlers = console
        [logger_redrugs]
        level = DEBUG
        handlers =
        qualname = redrugs
        [logger_sqlalchemy]
        level = INFO
        handlers =
        qualname = sqlalchemy.engine
        # "level = INFO" logs SQL queries.
        # "level = DEBUG" logs SQL queries and results.
        # "level = WARN" logs neither.  (Recommended for production systems.)
        # If you create additional handlers, add them as a key to [handlers]
        [handler_console]
        class = StreamHandler
        args = (sys.stderr,)
        level = NOTSET
        formatter = generic
        # If you create additional formatters, add them as a key to [formatters]
        [formatter_generic]
        format = %(asctime)s,%(msecs)03d %(levelname)-5.5s [%(name)s] %(message)s
        datefmt = %H:%M:%S

      8. you also need to change the user and group for virtualenv folder:

        	sudo chown -R www-data:www-data /home/redrugsUI/redrugs/venv/

      9. open your browser and go to your server's address; the UI should now be online.

      10. If you are using aquarius, just go to your allocated IP address and it's there
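
When the deployed UI does not answer on the expected port, one quick check is to read the port back from production.ini. The sketch below is only an illustration: it assumes the file follows the standard PasteDeploy layout with a `[server:main]` section, it uses a trimmed-down inline sample rather than the real file, and `read_server_port` is a made-up helper name.

```python
import configparser

# Trimmed-down sample in the standard PasteDeploy layout (an assumption;
# replace it with the text of your own production.ini).
SAMPLE_INI = """
[server:main]
use = egg:Paste#http
port = 8085

[app:main]
use = egg:redrugs
cache_dir = %(here)s/data
"""


def read_server_port(ini_text):
    """Return the port from the [server:main] section as an int."""
    # interpolation=None keeps Paste-style %(here)s values from raising.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.getint("server:main", "port")


print(read_server_port(SAMPLE_INI))
```

Disabling interpolation matters here because `%(here)s` is filled in by PasteDeploy at load time, not by `configparser`.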

  3. Bigdata© Nanosparqlserver Installation and Configuration Guide

    updated on Feb 23, 2015 by Rui Yan

    Bigdata© Nanosparqlserver is very easy to deploy and use. One thing I don't like, though, is that anyone who can access the endpoint UI is able to change the data if they want to. There may be an account mechanism for Bigdata© Nanosparqlserver, but so far I have not found any information about one.

    According to its official documentation, there are many ways to deploy Bigdata© Nanosparqlserver. This documentation gives a detailed, step-by-step guide to deploying it with both jetty and tomcat.

    1. Using Jetty

      1. Make sure you have installed and configured Java
      2. Download the bigdata.war file from here
      3. Download jetty from here. Any version is fine; my version is Stable 9.2.7.v20150116. You can place it in any directory you like; I placed it on the Desktop.
      4. Unzip your bigdata.war file into a folder called bigdata (a .war file is a zip archive, so any unzip tool that accepts .war files will do).
      5. Unzip your jetty zip file into jetty folder
      6. Place bigdata folder into jetty/webapps directory
      7. Inside webapps/bigdata/WEB-INF, find the file called web.xml. This file records the location of your nanosparqlserver triplestore property file.
      8. Open this web.xml file and change <param-value>../webapps/bigdata/WEB-INF/</param-value> to the full path, in my case <param-value>/home/rui/Desktop/jetty/webapps/bigdata/WEB-INF/</param-value>, or to <param-value>./webapps/bigdata/WEB-INF/</param-value>, whichever you like.
      9. You are also welcome to customize the file that contains the triplestore properties, such as the data file location, but it is fine to leave it untouched for the moment.
      10. Open your terminal and cd to the jetty directory: cd /home/rui/Desktop/jetty
      11. Start Jetty by typing: java -jar start.jar
      12. Open up your browser and go to http://localhost:8080/bigdata/#query, you should see bigdata UI and you are ready to go!
      13. By default, the bigdata.jnl file is generated in the jetty/bin/ directory. This file contains all your data, so make sure you back it up. Its initial size is 209.8 MB. One good thing is that data migration does not require exporting the data and importing them into another nanosparqlserver instance: you can copy the source bigdata.jnl over the destination bigdata.jnl, then switch namespace by clicking the namespace tab in the UI and clicking the "use" button listed after the namespace you want. You might need to restart jetty.
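
Steps 4 and 6 above (unzipping bigdata.war and placing the result under webapps) can also be scripted. This is a minimal sketch, assuming Python 3 is available; `unpack_war` is a hypothetical helper name, and the paths in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def unpack_war(war_path, webapps_dir):
    """Extract a .war archive into <webapps_dir>/<archive name without .war>."""
    war_path = Path(war_path)
    target = Path(webapps_dir) / war_path.stem
    target.mkdir(parents=True, exist_ok=True)
    # A .war file is an ordinary zip archive, so zipfile can read it.
    with zipfile.ZipFile(war_path) as archive:
        archive.extractall(target)
    return target
```

For example, `unpack_war("bigdata.war", "jetty/webapps")` would create jetty/webapps/bigdata with WEB-INF inside it.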

    2. Using Tomcat

      1. Make sure you have installed and configured Java
      2. Install and configure tomcat, see here for details.
      3. You might need to install an older version of tomcat (such as tomcat 6 or 7), since newer versions are probably not compatible with nanosparqlserver; you are welcome to test newer versions anyway. I use Tomcat 6, installed at /usr/local/tomcat6/
      4. Download the bigdata.war file from here
      5. Unzip your bigdata.war file into a folder called bigdata (a .war file is a zip archive, so any unzip tool that accepts .war files will do).
      6. Copy the bigdata folder into the tomcat/webapps/ directory, in my case /usr/local/tomcat6/webapps/
      7. You can customize web.xml and the other files under the webapps/bigdata/WEB-INF directory. Open web.xml and change <param-value>../webapps/bigdata/WEB-INF/</param-value> to the full path of that directory in your installation, or to <param-value>./webapps/bigdata/WEB-INF/</param-value>, whichever you like. The file that contains the triplestore properties, such as the data file location, can also be customized, but it is fine to leave it untouched for the moment.
      8. In your terminal, cd to the tomcat bin folder, in my case: cd /usr/local/tomcat6/bin
      9. continue to type: sh ./
      10. Your tomcat will start at http://localhost:8080/
      11. You can then access the nanosparqlserver UI at http://localhost:8080/bigdata/
      12. By default, the bigdata.jnl file is generated in the tomcat/bin/ directory. This file contains all your data, so make sure you back it up. Its initial size is 209.8 MB. One good thing is that data migration does not require exporting the data and importing them into another nanosparqlserver instance: you can copy the source bigdata.jnl over the destination bigdata.jnl, then switch namespace by clicking the namespace tab in the UI and clicking the "use" button listed after the namespace you want. You might need to restart tomcat.
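
Since bigdata.jnl holds all the data, a timestamped copy is a simple way to back it up. The sketch below is one possible approach, not an official bigdata tool. It assumes Python 3, and it is probably safest to run it while the server is stopped (that precaution is an assumption here, borrowed from the export procedure). `backup_journal` is a made-up name.

```python
import shutil
import time
from pathlib import Path


def backup_journal(journal_path, backup_dir):
    """Copy the journal file into backup_dir under a timestamped name."""
    journal_path = Path(journal_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{journal_path.stem}-{stamp}{journal_path.suffix}"
    # copy2 also preserves the modification time of the original file.
    shutil.copy2(journal_path, target)
    return target
```

Calling `backup_journal("tomcat/bin/bigdata.jnl", "backups")` would leave a file such as backups/bigdata-20150223-101500.jnl, with the timestamp taken from the local clock.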

    3. Deploy on a Server/Virtual Server with Apache2

      Not satisfied with accessing Bigdata© Nanosparqlserver only at your localhost? This section is written specifically for you!

      1. First, you need an Apache2-enabled virtual server. Ask Patrick if you need one
      2. Second, you also need sudo privileges on your virtual machine.
      3. Our virtual server is located at
      4. We would like to have access to Bigdata© Nanosparqlserver UI at
      5. We are going to use Tomcat to host bigdata.war, please refer to here for tomcat and bigdata installation guide.
      6. After you have tomcat and bigdata ready, and now tomcat is running at localhost:8080 in your virtual machine, you are ready to do the following work
      7. Enable apache2 mod_rewrite, mod_proxy, mod_proxy_http, mod_proxy_html:
        sudo a2enmod rewrite proxy proxy_http proxy_html
      8. Go to /etc/apache2/sites-enabled/ directory and open 000-default file. Then add the following between </Directory> and </VirtualHost>:
            RewriteEngine On
            RewriteRule ^/bigdata$ /bigdata/ [R,L]
            ProxyTimeout 1800
            ProxyRequests Off
            <Location /bigdata>
                Order allow,deny
                Allow from All
                ProxyPass http://localhost:8080/bigdata
                ProxyPassReverse /
                SetOutputFilter proxy-html
            </Location>
      9. You need to restart your apache2 service: sudo service apache2 restart
      10. You might also need to start tomcat automatically after Patrick reboots the virtual server. All you need is to write a tomcat init.d script and put it in the /etc/init.d/ directory. A detailed guide on how to write this script can be found here
      11. Now, you can access your Bigdata© Nanosparqlserver publicly at
    4. Start Bigdata© Nanosparqlserver using jetty in command line

      Another way to start Bigdata© Nanosparqlserver is from the jetty command line; you will see the difference and the convenience. We also write the output to a log file so that you can track what happened while the server was running.

      1. Customize your XXX/webapps/bigdata/WEB-INF/web.xml and XXX/webapps/bigdata/WEB-INF/ files. Normally, web.xml records the location of the properties file, and the properties file records where the data file is. Open web.xml and find <param-value>../webapps/bigdata/WEB-INF/</param-value>; it is recommended to change this value into a full path, in my case <param-value>/home/rui/Desktop/jetty-distribution-7.6.16.v20140903/webapps/bigdata/WEB-INF/</param-value>. Save and close the file. You might also change the namespace: find <param-value>kb</param-value> and change kb into whatever you like; I changed it to test. You can then see this namespace at http://localhost:8080/bigdata/#namespace once your bigdata server is running.
      2. Open the properties file and find com.bigdata.journal.AbstractJournal.file=bigdata.jnl. Create an empty folder called data at XXX/webapps/bigdata/data, and replace bigdata.jnl with a full path inside it, in my case /home/rui/Desktop/jetty-distribution-7.6.16.v20140903/webapps/bigdata/data/testdata.jnl . You can call your data file whatever you like: the default name is bigdata.jnl, but as you can see, I named mine testdata.jnl.
      3. You can create an empty file, name it, and copy the following code:
        # Directory searched for .jar files (customize: your jetty directory)
        LIBDIR=/home/jetty-distribution-9.2.7.v20150116/
        # Build the classpath from every .jar under LIBDIR, plus the current directory
        CLASSPATH=`find $LIBDIR -name \*.jar -exec echo -n {}: ';'`.
        set -- "${@}"
        java -cp $CLASSPATH -server -Dlog4j.configuration=file:/home/jetty-distribution-9.2.7.v20150116/webapps/bigdata/WEB-INF/classes/ com.bigdata.rdf.sail.webapp.NanoSparqlServer $@
      4. You need to customize the following: first, for LIBDIR= ... , you need to put your jetty directory here. Mine is /home/jetty-distribution-9.2.7.v20150116/ , so I put it after the equal sign. Second, you need to replace the directory after -Dlog4j.configuration= ...
      5. Save your file, and in your terminal type:
        sudo chmod +x ./
        to make it executable.
      6. Your command should follow the pattern: <port> <namespace> < file> < /dev/null ><your log file> Your log file will record all the bigdata output (including errors)
      7. In your terminal type:
        sudo sh ./ 8080 test /home/jetty-distribution-9.2.7.v20150116/webapps/bigdata/WEB-INF/ < /dev/null >test.log
        Change the file path in this command if needed.
      8. If you want to leave it running constantly after you log out your session, you can do
        nohup sudo sh ./ 8080 test /home/jetty-distribution-9.2.7.v20150116/webapps/bigdata/WEB-INF/ < /dev/null &>test.log &
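
Step 2 above edits a single key in the properties file by hand. The same change can be sketched in a few lines of Python, shown here on plain text so that it does not depend on any particular file name. `set_property` is a hypothetical helper; it only understands simple `key=value` lines, which is a simplification of the full Java properties syntax.

```python
def set_property(properties_text, key, value):
    """Return properties_text with `key` set to `value`.

    An existing `key=...` line is replaced in place; if the key is absent,
    a new line is appended. Comment lines (starting with # or !) are kept.
    """
    lines = properties_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[index] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

To apply it, read the properties file into a string, call `set_property(text, "com.bigdata.journal.AbstractJournal.file", "/full/path/to/testdata.jnl")` with your own path, and write the result back.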

  4. Export data from Bigdata© Nanosparqlserver

    updated on Feb 18, 2015 by Rui Yan

    You have to shut down Bigdata© Nanosparqlserver before you export the data; this is required. See here

    1. Shut down your Bigdata© Nanosparqlserver instance.
    2. Depending on how you host bigdata, go to jetty/webapps/bigdata/WEB-INF/ (jetty) or tomcat/webapps/bigdata/WEB-INF/ (tomcat)
    3. Open your properties file and set com.bigdata.journal.AbstractJournal.file= to the full path of your bigdata.jnl. In my case: com.bigdata.journal.AbstractJournal.file=/home/rui/Desktop/jetty-distribution-7.6.16.v20140903/bigdata.jnl
    4. Also
    5. Then save the file
    6. Open a new empty file, and paste the following arguments:
      java -cp $CP/bigdata-1.3.1.jar:$CP/openrdf-sesame-2.6.10-onejar.jar:$CP/log4j-1.2.17.jar:$CP/high-scale-lib-v1.1.2.jar:$CP/fastutil-stripped.jar:$CP/icu4j-4.8.jar:$CP/slf4j-api-1.6.1.jar -Dlog4j.configuration=file:$CF/log4j_export.xml com.bigdata.rdf.sail.ExportKB $CPP/ $*

      Where CP is the lib directory: if you use jetty, it is jetty/webapps/bigdata/WEB-INF/lib, and likewise for tomcat, but the full path is needed here. Create a log4j_export.xml file (an empty file) on your desktop, and set CF to your desktop directory. CPP is the directory where your properties file is located.

    7. Save this file as a .sh script on your desktop.
    8. In your terminal, cd to your desktop
    9. In your terminal, change the mode of this .sh file: chmod +x ./
    10. Then you can run this script: sh ./
    11. This script generates a folder with the same name as the namespace in your bigdata triplestore; the data are compressed into a file with a name like "data.xml.gz". Wait for the export process to finish before you touch this data file. You are welcome to unzip the data. The data format is RDF/XML
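
Once the export has finished, a quick sanity check is to count the resources in the compressed file without unzipping it first. The sketch below assumes Python 3 and assumes the export writes resources as `rdf:Description` elements, which may not hold for every export; `count_descriptions` is a made-up name.

```python
import gzip
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def count_descriptions(gz_path):
    """Count rdf:Description elements in a gzipped RDF/XML file."""
    count = 0
    with gzip.open(gz_path, "rb") as stream:
        # iterparse streams the file, so large exports need not fit in memory.
        for _event, element in ET.iterparse(stream, events=("end",)):
            if element.tag == f"{{{RDF_NS}}}Description":
                count += 1
            # Free the element's children once it has been processed.
            element.clear()
    return count
```

A count of zero on a non-empty namespace would suggest either an incomplete export or a different RDF/XML serialization style.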

  5. MISC and historical Documentations

    updated on Feb 11, 2015 by Rui Yan

    The following contents are deprecated and only for archive


    How to run ReDrugs systems on your local virtual machine:

    1. Running Environment:

    Rui is running the ReDrugs system on Ubuntu-12.04.4-desktop-i386 in VMware. You can use any other system, but please keep in mind that the command lines used in this how-to are specific to Ubuntu-like operating systems.

    Do a clean install of Ubuntu in your virtual machine; no other dependencies are needed before you set up the running environment by following this how-to.

    2. System configuration:

    • a) go to this link and hit the download zip at the right bottom corner, save this zip file into any your local directory (Rui saved this file in ~/Downloads/)
    • b) unzip "" file into any directory you like (Rui unzipped this file under ~/Desktop/ )
    • c) download this rdflib file, unzip it under any directory you like (Rui unzipped this file under ~/Desktop/ )
    • d) (ignore this if you have already installed virtualenv) install virtualenv : a good reference is here and here. You can open up your terminal and type
      sudo easy_install virtualenv


    • e) create a virtualenv folder called venv: assume you are under ~/Desktop/ directory, you can open up the terminal and type
      virtualenv --no-site-packages venv

      so that venv folder is created under ~/Desktop/venv

    • f) activate the virtualenv by typing in your terminal the following commands
      cd venv/
      source bin/activate
    • g) install rdflib: type the following command in your terminal
      cd rdflib
      python install
    • h) under melagrid/redrugs directory, run
      python develop

      , it should be successful. If not, please contact Rui Yan

    • i) install the following dependencies
      pip install pylons
      pip install webhelpers
      pip install gearbox
      pip install Werkzeug
    • j) in your terminal type:
      sudo apt-get install libtidy-dev
    • k) if necessary, change your listening port by opening the development.ini file under melagrid/redrugs (the default port in development.ini is 8080, but if this port is occupied by another program such as tomcat, you can manually change it to another one; Rui changed it to 8085)
    • l) in your terminal run:
      gearbox serve --debug

      You will see in your terminal that it's running

    • m) open your browser, type "localhost:8085/redrugs" , you will see that it's running (8085 is the port, you can change it according to step k)

    Then, you are good to go!

    trouble shooting:

    If you typed

    gearbox serve --debug

    , and you see problems like

    pkg_resources.DistributionNotFound: XXX

    , you need to install the XXX package. Rui went through these instructions on 7/14/2014 and found that the following packages need to be installed between Step j and Step k above:

    pip install pytidylib
    pip install python-dateutil
    pip install tw2.core
    pip install transaction
    pip install crank
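
Rather than discovering missing packages one `DistributionNotFound` error at a time, the list can be checked in one pass. This is a sketch under a loud assumption: it uses `importlib.metadata`, which requires Python 3.8 or later, whereas the instructions above were written for an older Python. `find_missing_packages` is a made-up name.

```python
from importlib import metadata


def find_missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    wanted = ["pytidylib", "python-dateutil", "tw2.core", "transaction", "crank"]
    print(find_missing_packages(wanted))
```

Run it inside the activated virtualenv so that it inspects the same environment that gearbox uses.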


    How to manually find diseases associated with certain gene in Gene Expression Atlas

    OK, let's start from CA2, the one connected with Topiramate.

    • 1. Go to this link and type CA2
    • 2. Then you will have a list of result like this
    • 3. Make sure you find a human gene (you can check that by looking at gene description)
    • 4. Copy the gene ID, in this case, it's 760
    • 5. OK, then copy this url: and paste 760 at the end of it
    • 6. Paste into your browser, and hit enter
    • 7. You are led to an Atlas Linked Data page. Scroll down and click "more" under the "Related Form" section
    • 8. Click each link that begins with "A-AFFY-", which denotes Affymetrix probe set IDs; for example, let's click A-AFFY-1/40095_at
    • 9. You will be led to another page. Scroll down to the "Related Form" section, where you can see something like "CA2 up in ..." or "CA2 down in ..."; the disease names come after "in"

    Diseases found:

    Gene     Human Gene ID   Disease
    CA2      760
    CA4      762
    SCN1A    6323
    GRK1     6011
    GABRA1   2554

    How to execute SPARQL queries against the Gene Expression Atlas endpoint

    1. What information should we get from the endpoint?

    Before I address this question, note that we are going to use the nanopublication format, which includes an assertion, an attribute and a provenance. That means three pieces of information are needed from the endpoint: a gene-disease association as the assertion, the probability of that gene-disease association as the attribute, and the experiment method as the provenance.

    2. Explore the information we need manually

    1. We can first explore the data manually in the Gene Expression Atlas linked data web application. Take "CA2 DOWN in Burkitts lymphoma" as an example. Go to the gene-disease table above and click the "DOWN in Burkitts lymphoma" link, which will lead you to this.
    Scroll down to "Related to"; has factor value, isMeasurementOf and pValue are the three important things. In this case, "CA2 DOWN in Burkitts lymphoma" has factor value (Factor value) disease/Burkitts lymphoma.
    The box at the top shows Type: Burkitts lymphoma. Click on it and you will see that it is a subClassOf lymphoma; following the subClassOf links in turn leads from lymphoma to lymphoid neoplasm, cancer, neoplasm and finally disease, which means that Burkitts lymphoma is a disease.

    2. The above step shows you how to find a disease. Now let's find the competing hypotheses. What are competing hypotheses? Say that gene1 - disease1, gene2 - disease1, gene3 - disease1 ... gene-n - disease1: there are n genes associated with the same disease, and these n gene-disease associations are competing hypotheses. We do not consider the situation in which multiple genes are jointly associated with one disease, because in a biomedical experiment researchers concentrate on only one gene.
    So we can click on "differential analysis on E-GEOD-1880 / A-AFFY-1", which is the object of "hasExpressionValue". This leads you to another page that has a predicate hasExpressionValue, which lists all the competing hypotheses. We will use "differential analysis on E-GEOD-1880 / A-AFFY-1" as an experiment provenance. Another provenance is, in this case, CA2 DOWN in Burkitts lymphoma

    3. The third thing is to calculate the probability, and the math is as follows:

    Let x = 1/n, where n is the total number of competing hypotheses, y = 1 - pValue, and z = pValue. The probability is then xy / (xy + z(1 - x)), which can be included in the nanopublication as an attribute.
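
The probability calculation above is easy to get wrong by hand, so here is a direct transcription into Python. It is only a sketch: the function name `association_probability` and the input checks are additions for illustration, and the numbers in the usage note are invented, not taken from the Atlas.

```python
def association_probability(p_value, n_hypotheses):
    """Probability of a gene-disease association.

    With x = 1/n (n competing hypotheses), y = 1 - pValue and z = pValue,
    the result is xy / (xy + z(1 - x)).
    """
    if n_hypotheses < 1:
        raise ValueError("n_hypotheses must be at least 1")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    x = 1.0 / n_hypotheses
    y = 1.0 - p_value
    z = p_value
    denominator = x * y + z * (1.0 - x)
    if denominator == 0.0:
        # Happens only when n = 1 and pValue = 1.
        raise ValueError("probability is undefined for these inputs")
    return x * y / denominator
```

For instance, with an illustrative pValue of 0.05 and 4 competing hypotheses, x = 0.25, y = 0.95 and z = 0.05, so the probability is 0.2375 / 0.275, roughly 0.864.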

    3. Let's explore the endpoint

    We currently don't know anything about the endpoint: we don't know the namespaces, the predicates or the proper names used in the database. What we do have is the information we explored manually above, which is a good hint for exploring a black-box database. This part is not limited to the Gene Expression Atlas endpoint; it is also a guide, based on experience, for exploring other black-box SPARQL endpoints.

    So the sparql query is as follows:

    PREFIX rdf: <>
    PREFIX rdfs: <>
    PREFIX owl: <>
    PREFIX dcterms: <>
    PREFIX obo: <>
    PREFIX sio: <>
    PREFIX efo: <>
    PREFIX atlas: <>
    PREFIX atlasterms: <>
    PREFIX xsd: <>
    PREFIX identifiers:<>
    SELECT distinct ?value ?diffValue ?factor ?factorValue ?probe ?pvalue  
    WHERE {             
    ?expUri atlasterms:hasAnalysis ?analysis .       
    ?analysis atlasterms:hasExpressionValue ?value .   
    ?value rdfs:label ?diffValue .       
    ?value atlasterms:hasFactorValue ?factor .  
    ?factor atlasterms:propertyType "disease"^^xsd:string .
    ?factor rdfs:label ?factorValue .
    ?value atlasterms:pValue ?pvalue .      
    ?value atlasterms:isMeasurementOf ?probe .    
    ?probe atlasterms:dbXref ?iden .
    ?iden rdfs:label "CA2"^^xsd:string .
    }

    You can change "CA2" to another gene symbol to find that gene's related information. ?value is the URL of the hasExpressionValue object, ?diffValue is a label such as "CA2 Down in ...", ?factor is the factor's URL, ?factorValue is the label of ?factor, ?pvalue is the pValue, and ?probe is the experiment method, like "A-AFFY-...".
    One can see the result here
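
    Swapping "CA2" for another gene can be scripted. The following is a minimal sketch, assuming you keep the PREFIX lines in a separate string and pass them in (their URIs are not reproduced here). The function name, the __GENE__ placeholder and the example gene symbol are hypothetical.

```python
# Body of the query from this section; __GENE__ marks where the gene symbol goes.
QUERY_BODY = """SELECT distinct ?value ?diffValue ?factor ?factorValue ?probe ?pvalue
WHERE {
?expUri atlasterms:hasAnalysis ?analysis .
?analysis atlasterms:hasExpressionValue ?value .
?value rdfs:label ?diffValue .
?value atlasterms:hasFactorValue ?factor .
?factor atlasterms:propertyType "disease"^^xsd:string .
?factor rdfs:label ?factorValue .
?value atlasterms:pValue ?pvalue .
?value atlasterms:isMeasurementOf ?probe .
?probe atlasterms:dbXref ?iden .
?iden rdfs:label "__GENE__"^^xsd:string .
}"""


def gene_query(gene_symbol: str, prefixes: str = "") -> str:
    """Return the query text for one gene, with the caller's PREFIX lines on top."""
    # Escape backslashes and double quotes so the symbol stays inside the literal.
    escaped = gene_symbol.replace("\\", "\\\\").replace('"', '\\"')
    return prefixes + QUERY_BODY.replace("__GENE__", escaped)


if __name__ == "__main__":
    # "TP53" is only an example symbol; any gene label in the endpoint works.
    print(gene_query("TP53"))
```

    The resulting string can be pasted into the endpoint's query form or sent by any SPARQL client.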

    How to setup/deploy a nanosparqlserver instance in your local machine

    The ReDrugs system is based on the bigdata graph database, which is standards-based, high-performance, scalable and open-source. NanoSparqlServer provides a lightweight REST API for RDF. This section shows how to deploy an instance of NanoSparqlServer on your local machine.
    I am on a Windows 7 machine. The deployment on other operating systems might differ from this. I will mark the possible differences in the following:

    • There are 3 ways to deploy NanoSparqlServer: from the command line using jetty, embedded using jetty, or in a servlet container like Tomcat. The easiest way is through a servlet container
    • We will be using Tomcat
    • Which version of Tomcat? Choose Tomcat 6; don't try Tomcat 7 or 8, because there were compatibility problems between NanoSparqlServer and Tomcat 7 or 8 at the time these instructions were written (06/25/2014).
    • Installation of Tomcat - might differ across operating systems
    • Rui installed his Tomcat in this directory: C:\Program Files\Apache Software Foundation\Tomcat 6.0
    • Download the bigdata.war file from here: link
    • Put this .war file under: C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps\
    • Start Tomcat by double-clicking Tomcat6.exe under C:\Program Files\Apache Software Foundation\Tomcat 6.0\bin
    • Tomcat 6 will automatically deploy this bigdata.war file by extracting it into a folder called bigdata
    • Type localhost:port/bigdata in your browser and it will run
    • If you didn't change the port during your Tomcat 6 installation, it's 8080 by default
    • Alternatively, you can extract this .war file into a bigdata folder yourself, and then copy this folder into the webapps directory before you start Tomcat 6

    If you are on Ubuntu, here is the right way to deploy a bigdata instance:

    • install Tomcat 7
    • don't use apt-get to install tomcat7, it will cause problems when deploying bigdata
    • go directly to the official Tomcat 7 site and download the .tar.gz file: here is the
    • click this link to download the bigdata.war file
    • put your bigdata.war file into your Tomcat 7 webapps directory: in Rui's instance, it's /home/rui/Desktop/apache-tomcat-7.0.55/webapps
    • go to your terminal, cd Desktop/apache-tomcat-7.0.55/bin, and type "sudo ./" to start Tomcat
    • go to localhost:8080/bigdata, and it should be OK now. (8080 is the Tomcat 7 default; you can change this port by editing the conf/server.xml file.)
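
    Instead of opening a browser, the deployment can also be checked from a script. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the host and port defaults are the Tomcat defaults mentioned above.

```python
import http.client
import urllib.request


def bigdata_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the address of the deployed bigdata web application."""
    return "http://%s:%d/bigdata" % (host, port)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a success status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply, etc.
        return False


if __name__ == "__main__":
    url = bigdata_url()
    print(url, "is up" if is_reachable(url) else "is not reachable")
```

    If you changed the port in conf/server.xml, pass it explicitly, for example bigdata_url(port=8880).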

    How to write an init.d file for the redrugs system so that it restarts automatically when aquarius restarts?

    open up your text editor, create a file called, and type the following command:

    cd /home/jimmccusker/bigdata
    nohup ./ 8880 redrugs < /dev/null &>redrugs.log &
    service apache restart
    service apache2 restart
    cd ../prizms/melagrid/redrugs/
    source ../venv/bin/activate
    nohup gearbox serve &> redrugs.log < /dev/null &

    Save it on your desktop, then in your terminal type:

    sudo chown -R root /home/YOURUSERNAME/Desktop/
    sudo chgrp -R root /home/YOURUSERNAME/Desktop/

    to change the owner and group of the file.

    create another new file, name it redrugsAutoStart and copy & paste the following:

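    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          Redrugs
    # Required-Start:    $local_fs $network $syslog
    # Required-Stop:     $local_fs $network $syslog
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Starts the servlet of Dataone-Linkipedia
    # Description:       This init.d script is used to start Dataone-Linkipedia
    ### END INIT INFO
    . /lib/lsb/init-functions
    # Placeholders: replace both values before installing the script.
    PROGRAM=YOURSCRIPTNAME          # name of the startup script
    APPROOT=/path/to/YOURSCRIPTDIR  # directory that contains the startup script
    export APPROOT
    # Assumption: the pid file is named after $PROGRAM.
    PIDFILE=/var/run/$PROGRAM.pid
    pidofme() {
        # print the recorded pid if there is one, else look the process up by name
        if [ -s "$PIDFILE" ]; then
            cat "$PIDFILE"
        else
            PID=`ps aux | grep "$PROGRAM" | grep -v grep | awk '{print $2}'`
            echo $PID
        fi
        return 0
    }
    start() {
        touch "$PIDFILE"
        # if your app runs under a dedicated user, uncomment and substitute it here
        #chown user:user "$PIDFILE"
        echo -n "Starting $PROGRAM: "
        # if your app runs under a dedicated user, sudo to that user here
        #sudo -u user nohup $APPROOT/$PROGRAM # include arguments here
        # else
        cd $APPROOT; nohup $APPROOT/$PROGRAM & # include arguments here
        # endif
        sleep 1
        PID=`pidofme`
        if [ "${PID}" ]; then
            echo "$PID" > "$PIDFILE"
            log_success_msg
            RETVAL=0
        else
            log_failure_msg
            RETVAL=1
        fi
        return $RETVAL
    }
    stop() {
        echo -n "Stopping $PROGRAM: "
        PID=`pidofme`
        if [ "$PID" ]; then
            kill $PID
            sleep 5
        fi
        if [ "$PID" ] && kill -0 $PID 2>/dev/null; then
            # attempt SIGKILL
            kill -9 $PID
            sleep 5
        fi
        if [ "$PID" ] && kill -0 $PID 2>/dev/null; then
            log_failure_msg
            RETVAL=1
        else
            rm -f "$PIDFILE"
            log_success_msg
            RETVAL=0
        fi
        return $RETVAL
    }
    case $1 in
        start)
            start
            ;;
        stop)
            stop
            ;;
        restart)
            if stop; then
                sleep 2
                start
            fi
            ;;
        *)
            echo "Usage: /etc/init.d/`basename $0` {start|stop|restart}"
            exit 1
            ;;
    esac
    exit $RETVAL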

    In this script, APPROOT is the directory where you put the startup script created above, and PROGRAM should be the name of that startup script.

    Run chown and chgrp on this file as well.
    Put redrugsAutoStart in the /etc/init.d folder, and put the startup script in /home/jimmccusker/prizms/melagrid/.

    That should be all set. However, since the server hasn't been restarted yet, I don't know whether this init.d script will work.

    Current Problems
    1. How to restart public site?

    sudo su - jimmccusker
    cd ~jimmccusker/bigdata
    nohup ./ 8880 redrugs < /dev/null &>redrugs.log &
    cd ../prizms/melagrid/redrugs/
    source ../venv/bin/activate
    nohup gearbox serve &> redrugs.log < /dev/null &
    Bigdata's webapp should be transitioned over to an OS-installed jetty or tomcat instance.
    The turbogears app can be deployed directly to apache using these instructions:

    2. Which directory should I go to in order to replace redrugs.js/redrugs.html/style.css?
    Go to /home/jimmccusker/prizms/melagrid/redrugs
    3. Where is the SPARQL endpoint of the database? Rui would like to explore the data.
    4. How to upload the new TRiG data into the database in aquarius?
    (on ubuntu): open up the terminal, and type

    curl -X POST -H '' --data-binary '' http://localhost:8080/bigdata/sparql (or
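
    The same upload can be prepared from Python with the standard library. This is a minimal sketch: the function name is hypothetical, and the Content-Type value is left to the caller because the header used in the curl command is not shown above.

```python
import urllib.request


def build_upload_request(endpoint: str, trig_bytes: bytes, content_type: str) -> urllib.request.Request:
    """Prepare a POST request that carries TriG data as the raw request body."""
    return urllib.request.Request(
        endpoint,
        data=trig_bytes,                         # same role as curl's --data-binary
        headers={"Content-Type": content_type},  # same role as curl's -H
        method="POST",
    )


if __name__ == "__main__":
    import sys

    # Usage: python upload.py FILE.trig CONTENT_TYPE
    with open(sys.argv[1], "rb") as handle:
        request = build_upload_request(
            "http://localhost:8080/bigdata/sparql", handle.read(), sys.argv[2]
        )
    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode("utf-8", "replace"))
```

    Building the request separately from sending it makes it easy to inspect the method, headers and body before anything is written to the database.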

    5. How to connect your local UI with your local database?
    Go to your local redrugs instance (Rui has it in Document/melagrid-master/redrugs/redrugs/model), open file, and replace the endpoint with your local nanosparqlserver instance, something like localhost:xxxx/bigdata/sparql.