SESF Workgroup - Distributed Data Networks

From Semantic Portal Wiki


Goal

The goal of this working group is to investigate methods of distributing data across many different systems for three purposes: redundancy, access to data from remote institutions, and data discovery among research groups interested in similar phenomena.

Use Cases

Data Discovery

Before starting a new research project or grant proposal, a student or researcher wants to find information in related areas. For example, she would like to know of health studies of individuals living in areas with high levels of particulates. She configures a new data repository to maintain the metadata and any copies of data she would like to keep locally. She can then search for terms (that is, classes and properties) that relate to particulates and health. When she constructs and executes a query, it is sent simultaneously to her repository's known peers, which can in turn pass it along further. Responses might include health studies in populated areas near congested roadways, near areas of recent volcanic activity, and so on. She can then choose interesting datasets, access the metadata, papers, and other material associated with them, and make copies in her local repository.
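The query forwarding described above can be sketched as a breadth-first flood with duplicate suppression. This is a minimal illustration under stated assumptions, not the SESF design: `Peer`, `flood_query`, and the `max_hops` time-to-live are hypothetical names, peers are processed in one process rather than over a network, and matching is a plain set intersection between query terms and each dataset's annotated terms.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical repository node (illustrative only, not an SESF API)."""
    name: str
    # Local metadata: dataset id -> terms (classes/properties) it is annotated with
    catalog: dict[str, set[str]] = field(default_factory=dict)
    neighbors: list["Peer"] = field(default_factory=list)


def flood_query(origin: Peer, terms: set[str], max_hops: int = 3) -> dict[str, str]:
    """Breadth-first flood of a term query; returns {dataset_id: peer_name}.

    Each peer handles the query at most once, and the query stops
    spreading after max_hops forwards (an assumed time-to-live).
    """
    results: dict[str, str] = {}
    seen = {origin.name}
    frontier = deque([(origin, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        # Match the query against this peer's local metadata
        for dataset_id, dataset_terms in peer.catalog.items():
            if terms & dataset_terms:
                results.setdefault(dataset_id, peer.name)
        # Forward to neighbors that have not yet received the query
        if hops < max_hops:
            for nxt in peer.neighbors:
                if nxt.name not in seen:
                    seen.add(nxt.name)
                    frontier.append((nxt, hops + 1))
    return results


# Usage: the local repository knows one peer, which knows two more
me = Peer("local")
a = Peer("univ-a", {"roadway-asthma": {"Particulate", "RespiratoryHealth"}})
b = Peer("univ-b", {"volcano-lung": {"VolcanicAsh", "Particulate"}})
c = Peer("univ-c", {"ocean-temp": {"SeaSurfaceTemperature"}})
me.neighbors = [a]
a.neighbors = [b, c]
print(flood_query(me, {"Particulate"}))
# {'roadway-asthma': 'univ-a', 'volcano-lung': 'univ-b'}
```

Breadth-first order means each peer is reached by its shortest path, so the `seen` set never blocks a peer that a longer path would have reached with fewer hops remaining.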

Data Recovery

A server hosting data has a hard drive failure and loses gigabytes of data. Because the data system is distributed, small chunks of the dataset have already been sent to many different locations around the world. When a replacement machine is installed, the application can be configured to retrieve those chunks and reconstruct the lost dataset from the data network.
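A minimal sketch of chunked replication and recovery, assuming fixed-size chunks, SHA-256 content hashes, and rendezvous (highest-random-weight) hashing to choose which peers hold each chunk. None of these choices come from the page; the function names are hypothetical. The sketch also assumes the manifest (the ordered list of chunk hashes) survives the failure, for example because it is itself replicated.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a dataset into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunk(chunk_id: str, peers: list[str], replicas: int) -> list[str]:
    """Pick `replicas` distinct peers for a chunk via rendezvous hashing."""
    def weight(peer: str) -> int:
        digest = hashlib.sha256(f"{peer}:{chunk_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(peers, key=weight, reverse=True)[:replicas]


def distribute(data: bytes, peers: list[str], chunk_size: int = 4, replicas: int = 2):
    """Return (manifest of chunk hashes in order, per-peer chunk stores)."""
    stores = {p: {} for p in peers}
    manifest = []
    for chunk in split_into_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        for peer in place_chunk(digest, peers, replicas):
            stores[peer][digest] = chunk
    return manifest, stores


def recover(manifest: list[str], stores: dict[str, dict[str, bytes]]) -> bytes:
    """Rebuild the dataset from whichever surviving peers still hold each chunk."""
    parts = []
    for digest in manifest:
        for store in stores.values():
            chunk = store.get(digest)
            # Accept a replica only if its content still matches the recorded hash
            if chunk is not None and hashlib.sha256(chunk).hexdigest() == digest:
                parts.append(chunk)
                break
        else:
            raise LookupError(f"chunk {digest[:12]} is not held by any surviving peer")
    return b"".join(parts)


# Usage: replicate, lose one peer on top of the failed server, then rebuild
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
original = b"example dataset contents for recovery"
manifest, stores = distribute(original, peers)
del stores["peer-b"]  # a peer outage in addition to the failed server
assert recover(manifest, stores) == original
```

With two replicas per chunk on distinct peers, any single peer outage still leaves one copy of every chunk; tolerating more simultaneous losses would need a higher `replicas` value or an erasure code.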

Data Subscription

Researchers at different institutes who study the same phenomenon want access to new datasets so they can validate their theories. They can have their data endpoint subscribe to keywords (terms) or to other researchers (papers, etc.). The system propagates the subscription across the network so that when a new dataset becomes available, they can gain access to it and validate their theories.
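One way to realize subscription propagation is to copy each subscription to every reachable endpoint, so that whichever endpoint receives a new dataset can match it locally. The sketch below assumes that approach, plus a hypothetical `directory` that lets a matching endpoint contact the subscriber directly by name; `Node`, `subscribe`, and `publish` are illustrative names, not part of any SESF interface. Subscriptions to researchers or papers are modeled as ordinary terms such as `"author:Smith"`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical data endpoint (illustrative only)."""
    name: str
    neighbors: list["Node"] = field(default_factory=list)
    # Subscriptions this node has learned: subscriber name -> terms of interest
    subscriptions: dict[str, set[str]] = field(default_factory=dict)
    inbox: list[str] = field(default_factory=list)


def subscribe(origin: Node, terms: set[str]) -> None:
    """Propagate origin's subscription to every reachable node, visiting each once."""
    stack, seen = [origin], {origin.name}
    while stack:
        node = stack.pop()
        node.subscriptions[origin.name] = set(terms)
        for nxt in node.neighbors:
            if nxt.name not in seen:
                seen.add(nxt.name)
                stack.append(nxt)


def publish(where: Node, dataset_id: str, terms: set[str], directory: dict[str, Node]) -> None:
    """Announce a new dataset at `where`; notify subscribers whose terms overlap."""
    for subscriber, wanted in where.subscriptions.items():
        if wanted & terms:
            directory[subscriber].inbox.append(dataset_id)


# Usage: lab-a subscribes; a matching dataset later appears at lab-c
a, b, c = Node("lab-a"), Node("lab-b"), Node("lab-c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
directory = {n.name: n for n in (a, b, c)}
subscribe(a, {"Aerosol", "author:Smith"})
publish(c, "aerosol-obs-42", {"Aerosol", "Lidar"}, directory)
print(a.inbox)  # ['aerosol-obs-42']
```

Flooding every subscription everywhere is simple but grows with the number of subscribers; content-based routing schemes that aggregate interests along reverse paths are a common alternative for larger networks.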

Data Sharing

Collaborators at multiple institutions are working together to solve a common problem. Data recorded at one institute can be pushed out (seeded) in smaller chunks to a number of nearby peers, which can then share these chunks with all of the institutes, increasing the effective throughput of the system.
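The seeding idea can be illustrated with a small load simulation in the style of BitTorrent. The rarest-first chunk selection and least-loaded source selection are assumptions borrowed from that style of protocol, not something this page specifies, and `seed_and_share` is a hypothetical name. The point it shows is that the originating institute uploads each chunk only once, while the remaining transfers are spread across the seeds and the institutes themselves.

```python
from collections import Counter


def seed_and_share(chunks: list[str], seeds: list[str], institutes: list[str]) -> Counter:
    """Simulate seeding plus peer-to-peer sharing; return uploads per node."""
    uploads = Counter()
    have = {p: set() for p in seeds + institutes}

    # Phase 1 (seeding): the origin pushes each chunk once, round-robin over seed peers
    for i, chunk in enumerate(chunks):
        have[seeds[i % len(seeds)]].add(chunk)
        uploads["origin"] += 1

    # Phase 2 (sharing): each round, every incomplete institute pulls one missing
    # chunk, rarest first, from the holder that has uploaded the least so far
    full = set(chunks)
    while any(have[p] != full for p in institutes):
        held_count = Counter(c for held in have.values() for c in held)
        for peer in institutes:
            missing = full - have[peer]
            if not missing:
                continue
            chunk = min(missing, key=lambda c: (held_count[c], c))
            holders = [p for p, held in have.items() if chunk in held]
            source = min(holders, key=lambda p: (uploads[p], p))
            have[peer].add(chunk)
            uploads[source] += 1
    return uploads


# Usage: six chunks, two nearby seed peers, three collaborating institutes
chunks = [f"chunk-{i}" for i in range(6)]
load = seed_and_share(chunks, seeds=["seed-1", "seed-2"], institutes=["inst-a", "inst-b", "inst-c"])
print(load["origin"])      # 6, versus 18 if the origin served all three institutes directly
print(sum(load.values()))  # 24: 6 seeding uploads + 18 peer-to-peer transfers
```

The simulation counts transfers rather than modeling bandwidth, so it shows how the upload load is redistributed away from the origin, which is the mechanism behind the throughput gain, without estimating the gain itself.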
