SESF Workgroup - Distributed Data Networks
From Semantic Portal Wiki
Goal
The goal of this working group is to investigate methods for distributing data across many different systems in order to provide redundancy, give remote institutions access to data, and support data discovery among research groups interested in similar phenomena.
Use Cases
Data Discovery
Before starting a new research project or grant proposal, a student or researcher wants to find information in related areas. For example, she would like to know of health studies conducted on individuals living in areas with high levels of particulates. She first configures a new data repository to hold the metadata, along with any copies of data she would like to keep locally. She can then search for terms (that is, classes and properties) that relate to particulates and health. When she constructs and executes a query, it is sent simultaneously to the repository's known peers, which can in turn pass it along to their own peers. Responses might include health studies in populated areas near congested roadways, near areas of recent volcanic activity, etc. She can then choose datasets of interest, access the metadata, papers, etc. associated with them, and make copies in her local repository.
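The query forwarding described above behaves like a flooding search over a peer graph. The sketch below is a minimal illustration under stated assumptions, not the framework's actual protocol: the `Peer` class, the term-set matching rule, the hop limit `ttl`, and all peer and dataset names are hypothetical. Each peer answers from its own metadata and forwards the query to neighbors that have not yet seen it, so cycles in the network do not cause repeated answers.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical repository node: dataset id -> set of metadata terms."""
    name: str
    records: dict[str, set[str]] = field(default_factory=dict)
    neighbors: list["Peer"] = field(default_factory=list)

    def match(self, terms: set[str]) -> list[str]:
        # Assumed matching rule: a dataset matches when tagged with every term.
        return sorted(ds for ds, tags in self.records.items() if terms <= tags)


def flood_query(origin: Peer, terms: set[str], ttl: int = 3) -> dict[str, list[str]]:
    """Query `origin` and every peer within `ttl` hops of it (breadth-first).

    Returns {peer name: matching dataset ids} for peers with at least one hit.
    """
    results = {}
    seen = {origin.name}              # peers that have already received the query
    frontier = [origin]
    hops = 0
    while frontier and hops <= ttl:
        next_frontier = []
        for peer in frontier:
            hits = peer.match(terms)
            if hits:
                results[peer.name] = hits
            for neighbor in peer.neighbors:
                if neighbor.name not in seen:   # never forward to a peer twice
                    seen.add(neighbor.name)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        hops += 1
    return results


# Illustrative network: local -> univ-b -> univ-c -> univ-d
local = Peer("local")
univ_b = Peer("univ-b", {"roadway-health": {"particulates", "health", "roadway"}})
univ_c = Peer("univ-c", {"volcanic-health": {"particulates", "health", "volcanic"}})
univ_d = Peer("univ-d", {"ocean-temp": {"temperature"}})
local.neighbors = [univ_b]
univ_b.neighbors = [local, univ_c]
univ_c.neighbors = [univ_b, univ_d]

print(flood_query(local, {"particulates", "health"}))
# {'univ-b': ['roadway-health'], 'univ-c': ['volcanic-health']}
```

The hop limit is one common way to keep a flooded query from reaching the entire network; lowering `ttl` to 1 in this example returns only the `univ-b` result.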
Data Recovery
A server hosting data suffers a hard drive failure and loses gigabytes of data. Because the data system is distributed, small chunks of the dataset have already been sent to many different locations around the world. When a replacement machine is installed, the application can be configured to retrieve those chunks from the data network and reconstruct the lost dataset.
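One way to picture the recovery scenario is chunk replication with integrity checks. The following is a minimal sketch, assuming fixed-size chunks, round-robin placement of each chunk on `replicas` distinct peers, and SHA-256 digests for verification; none of these choices come from the source, and the peer names are hypothetical. It also assumes the manifest (the ordered list of chunk digests and their holders) survives the failure, for example because it is itself replicated.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the dataset into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks: list[bytes], peers: list[str], replicas: int = 2):
    """Store each chunk on `replicas` distinct peers using round-robin placement.

    Returns (manifest, store): the manifest lists (sha256 digest, holders) per
    chunk in order; store maps peer name -> {digest: chunk bytes}.
    """
    if not 1 <= replicas <= len(peers):
        raise ValueError("replicas must be between 1 and the number of peers")
    store = {peer: {} for peer in peers}
    manifest = []
    for i, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        holders = [peers[(i + r) % len(peers)] for r in range(replicas)]
        for peer in holders:
            store[peer][digest] = chunk
        manifest.append((digest, holders))
    return manifest, store


def reconstruct(manifest, store, unavailable=frozenset()) -> bytes:
    """Rebuild the dataset from any surviving, verified replica of each chunk."""
    parts = []
    for digest, holders in manifest:
        for peer in holders:
            if peer in unavailable:
                continue
            chunk = store[peer].get(digest)
            # Check the copy against the manifest digest before trusting it.
            if chunk is not None and hashlib.sha256(chunk).hexdigest() == digest:
                parts.append(chunk)
                break
        else:
            raise LookupError(f"no surviving replica of chunk {digest[:12]}")
    return b"".join(parts)


# Illustrative run: 1000 bytes stand in for the lost dataset.
dataset = bytes(i % 251 for i in range(1000))
peers = ["site-a", "site-b", "site-c", "site-d"]
manifest, store = place_chunks(split_into_chunks(dataset, 64), peers, replicas=2)
print(reconstruct(manifest, store, unavailable={"site-a"}) == dataset)  # True
```

With two replicas per chunk, any single site can be lost without data loss; losing two sites that share a chunk (here `site-a` and `site-b`) makes reconstruction fail, which is why the replica count is a trade-off between storage cost and fault tolerance.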
Data Subscription
Researchers at different institutes who study the same phenomenon want access to new datasets so they can validate their theories. They can have their data endpoints subscribe to keywords (terms) or to other researchers (papers, etc.). The system propagates each subscription across the network, so that when a matching dataset becomes available anywhere, the subscribers can gain access to it and validate their theories.
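The subscription use case resembles a publish/subscribe system in which subscriptions are spread to every node. The sketch below is a hypothetical illustration, not the framework's design: the `Subscription` and `Node` classes, the rule that a subscription matches on any shared term or author, and all endpoint and dataset names are assumptions. A subscription is recorded at every reachable node, so a dataset published at any node can be matched locally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: `endpoint` wants datasets with these terms/authors."""
    endpoint: str
    terms: frozenset = frozenset()
    authors: frozenset = frozenset()


@dataclass
class Node:
    name: str
    neighbors: list = field(default_factory=list)
    subscriptions: set = field(default_factory=set)

    def subscribe(self, sub: Subscription) -> None:
        """Record `sub` here and propagate it to every reachable node."""
        pending = [self]
        while pending:
            node = pending.pop()
            if sub in node.subscriptions:
                continue              # already known here; also breaks cycles
            node.subscriptions.add(sub)
            pending.extend(node.neighbors)

    def publish(self, dataset_id: str, terms, authors) -> list:
        """Return sorted (endpoint, dataset_id) pairs for matching subscriptions."""
        terms, authors = set(terms), set(authors)
        return sorted({
            (sub.endpoint, dataset_id)
            for sub in self.subscriptions
            if sub.terms & terms or sub.authors & authors   # any overlap matches
        })


inst_x, inst_y, inst_z = Node("inst-x"), Node("inst-y"), Node("inst-z")
inst_x.neighbors = [inst_y]
inst_y.neighbors = [inst_x, inst_z]
inst_z.neighbors = [inst_y]

inst_x.subscribe(Subscription("x-endpoint", terms=frozenset({"aerosol"})))
inst_y.subscribe(Subscription("y-endpoint", authors=frozenset({"researcher-a"})))

print(inst_z.publish("dataset-42", {"aerosol", "lidar"}, {"researcher-a"}))
# [('x-endpoint', 'dataset-42'), ('y-endpoint', 'dataset-42')]
```

Copying every subscription to every node keeps matching simple but grows with network size; a real deployment might instead route subscriptions only toward nodes likely to publish matching data.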
Data Sharing
Collaborators at multiple institutions are working together to solve a common problem. Data recorded at one institute can be split into smaller chunks and pushed out (seeded) to a number of nearby peers, which can then share those chunks with all of the other institutes. Because the originating institute no longer has to send the full dataset to every collaborator, the effective throughput of the system increases.
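The throughput argument can be made concrete with a toy, round-based simulation in the spirit of swarm-style file sharing. This is a sketch under assumptions that are not in the source: the origin sends each chunk exactly once to one of the seed peers, and in each round every peer can fetch at most `per_round` chunks it lacks from what other peers held at the start of the round. The function and peer names are hypothetical.

```python
def seed_and_share(num_chunks: int, seeds: list[str], others: list[str],
                   per_round: int = 1):
    """Simulate seeding a dataset to nearby peers, then peer-to-peer sharing.

    The origin uploads each chunk exactly once, round-robin across `seeds`.
    In every round, each peer copies up to `per_round` chunks it lacks, chosen
    from chunks other peers held at the start of that round.
    Returns (have, rounds, origin_uploads).
    """
    if not seeds or per_round < 1:
        raise ValueError("need at least one seed and per_round >= 1")
    peers = seeds + others
    everything = set(range(num_chunks))
    have = {peer: set() for peer in peers}
    for chunk in range(num_chunks):
        have[seeds[chunk % len(seeds)]].add(chunk)
    origin_uploads = num_chunks       # the origin never sends a chunk twice

    rounds = 0
    while any(have[peer] != everything for peer in peers):
        snapshot = {peer: frozenset(held) for peer, held in have.items()}
        for peer in peers:
            offered = set().union(*(snapshot[q] for q in peers if q != peer))
            missing = sorted(offered - snapshot[peer])
            have[peer].update(missing[:per_round])
        rounds += 1
    return have, rounds, origin_uploads


have, rounds, uploads = seed_and_share(8, ["near-1", "near-2"],
                                       ["far-1", "far-2", "far-3"])
print(rounds, uploads)   # 8 8
print(8 * len(have))     # 40: uploads the origin would need to serve everyone itself
```

In this model the origin uploads 8 chunks instead of 40, and the remaining transfers are spread across the peers, which is the sense in which seeding raises the system's effective throughput.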

