Professor Peter Fox and Professor Deborah McGuinness and collaborators over the last 7-8 years (May 29, 2013) with replicable success in more than a dozen application areas and projects of various sizes and diversity (Benedict et al. 2007; Fox et al. 2012). In addition to being vetted and matured in project settings, the methodology has been taught by Tetherless World Constellation faculty in the Semantic eScience course (developed by Professor Peter Fox and Professor Deborah L. McGuinness). In this course, each element of the method (see Fig. F) is taught in detail. Students are evaluated on a variety of learning objectives, in addition to a final group project that executes at least one iteration of the methodology.
Use Case: Use cases are used to identify questions to be asked, resources to be used to answer the questions, and methods to be used to determine the answer. We utilize a facilitation and scoping approach to generate “good” use cases to allow the domain communities to describe operational and understandable use cases that support data discovery, access, and use. Some notions of good use cases include data discovery and generation to find or generate data that is of acceptable quality and is relevant to the question and methods being used. The use case is captured in a template.
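As a minimal sketch of what such a template might hold, the Python fragment below records the three things a use case is meant to identify (questions, resources, and methods) and reports which sections are still empty. The class name, field names, and sample entries are hypothetical and are not taken from the actual template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    """Hypothetical use case template; the field names are illustrative only."""
    name: str
    questions: List[str] = field(default_factory=list)  # questions to be asked
    resources: List[str] = field(default_factory=list)  # resources used to answer them
    methods: List[str] = field(default_factory=list)    # methods used to determine the answer

    def missing_sections(self) -> List[str]:
        """Return the names of template sections that have no entries yet."""
        sections = {
            "questions": self.questions,
            "resources": self.resources,
            "methods": self.methods,
        }
        return [label for label, entries in sections.items() if not entries]


# Invented example content, used only to exercise the template.
draft = UseCase(
    name="Find sea surface temperature data",
    questions=["Which datasets cover the region and period of interest?"],
    resources=["Hypothetical data catalog"],
)
print(draft.missing_sections())
```

A check like this could support scoping discussions by making visible which parts of a use case still need input from the domain community.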
Small (Leadership) Team, Mixed Skills: A variety of skills are critical to successfully addressing data discovery problems. An experienced Facilitator (e.g., Peter Fox, Deborah McGuinness) is needed to set and monitor direction and to provide guidance for scoping the use case and deciding what can be implemented and when. Team formation is important, and success often depends on good team design and recruitment. Roles include: domain experts (i.e., domain researchers), including those with data and information product knowledge; knowledge representation and information modeling; software engineering; and someone to act as scribe.
Analysis: This (ongoing and iterative) step includes review (and scoping when applicable) of the use case narrative and normal flow, the actors, and the data and related resources, as well as generation of use case and activity diagrams. The goal of this analysis is to cover a wide range of key attributes, for example data quality, provenance, completeness, and fitness for use. These attributes are captured in the use case document.
Develop Model/Ontology: We utilize the ANSI approach to information modeling using three layers (conceptual, logical, physical), starting with the conceptual information model (IM), i.e., as a precursor to any ontology formalization. In our early application of this method, much of the model development was directed at ontologies, but we have found that integration and iteration/evolution of models are more effective at the conceptual IM level (versus trying to evolve an ontology encoded in the Web Ontology Language (OWL), for example). Ultimately, though, the ontology is a computational encoding of the meaning of terms that resulting systems will use, as well as a crucial interlingua for non-specialists. Currently, ontology generation is as much an art as a science, which drives the team composition. A variety of tools exist to (1) allow semantics/meaning capture by a broad range of users, (2) support identification of starting points for relevant terminology, (3) evaluate ontologies for a given application/purpose, and (4) support browsing and dissemination to gather desirable community discovery, comment, and evolution suggestions.
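To illustrate the idea of keeping the conceptual model separate from its ontology encoding, the following sketch holds a few concepts in plain Python and only afterwards serializes them as OWL classes in Turtle syntax. The concept names, the `Concept` class, the `to_turtle` helper, and the `http://example.org/model#` namespace are all assumptions made for illustration; this is not the tooling used in the projects described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Concept:
    """One concept in a conceptual information model (illustrative only)."""
    name: str                     # must be usable as a Turtle local name (no spaces)
    parent: Optional[str] = None  # name of a broader concept, if any


def to_turtle(concepts: List[Concept], base: str = "http://example.org/model#") -> str:
    """Serialize the concepts as OWL classes in Turtle syntax."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for concept in concepts:
        if concept.parent is None:
            lines.append(f":{concept.name} a owl:Class .")
        else:
            lines.append(f":{concept.name} a owl:Class ;")
            lines.append(f"    rdfs:subClassOf :{concept.parent} .")
    return "\n".join(lines)


# Hypothetical conceptual model: edit this list, then regenerate the encoding.
model = [
    Concept("Dataset"),
    Concept("Instrument"),
    Concept("TimeSeries", parent="Dataset"),
]
print(to_turtle(model))
```

Under this arrangement, iteration happens on the small concept list, and the OWL encoding is regenerated rather than edited by hand.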
Use tools: The approach advocates finding and using relevant tools including, for example, ontology editors/browsers, evolution environments, and validators/checkers. These tools support data and schema discovery and include a new generation of provenance- and context-aware evolution and evaluation tools.
Review and Iteration: As the models are developed, the current developments (i.e., results of the analysis, the models, and their representations) are reviewed in a broader, i.e., community, setting, which allows an initial evaluation of what technical developments can and should initially follow. The review checks for syntactic and some semantic consistency. The review process ranges from choosing those with the right knowledge and interest to suggesting how to review, capture, track, and act on modification suggestions. In the initial phase of our expedition, review dimensions and reviewer characteristics will be identified and used to suggest review teams and processes. To scale further and automate more capabilities, we will need automated tools that find and form the right reviewer communities, check and log provenance concerning data and embedded assumptions, review for coverage and identify gaps, identify inconsistencies and make suggestions for modifications, etc.
Adopt technology & technical infrastructure: While using this approach, we rely on participants who are familiar with any existing technical infrastructure as well as with the technical expertise of available personnel who may contribute to an initial prototype. We leverage as much technology as possible to evaluate results from each iteration of the full methodology. An example candidate semantic technology is S2S.
Rapid prototype: Our existing applications have relied on computer scientists to make any adaptations required to tooling, to glue the components together, and to connect them to interfaces and visualization tools. Further, this prototyping often needs to be done at scale. Beyond the initial prototyping, however, later stages of the prototype must pay increasing attention to non-functional aspects of the use case, such as scalability and reliability.
Evolve, iterate, and evaluate: This step is based on input from many of the stages of the methodology and on the developed prototypes. In general, we utilize a combination of formative and summative evaluation using structured and less-structured means (details omitted here). The evaluation is also used to prioritize the following iterations. These iterations may be to the use case, to the models/ontologies, etc. We also promote the open-world aspects of semantic web approaches, i.e., not all knowledge is known or encoded, and what is encoded may be wrong, incomplete, and/or evolving. For this phase of the project, we will utilize a less-structured formative evaluation approach that involves team members and selected stakeholders.
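The open-world aspect mentioned above can be made concrete with a small sketch: a statement that has not been recorded is treated as unknown rather than false, and earlier statements can be revised as knowledge evolves. The `OpenWorldStore` class and the sample facts are hypothetical and stand in for whatever triple store or reasoner a project actually adopts.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


class OpenWorldStore:
    """Toy fact store in which unrecorded statements are unknown, not false."""

    def __init__(self) -> None:
        self.asserted: Set[Fact] = set()  # statements recorded as true
        self.denied: Set[Fact] = set()    # statements recorded as false

    def assert_fact(self, fact: Fact) -> None:
        """Record a statement as true, revising any earlier denial."""
        self.denied.discard(fact)
        self.asserted.add(fact)

    def deny_fact(self, fact: Fact) -> None:
        """Record a statement as false, revising any earlier assertion."""
        self.asserted.discard(fact)
        self.denied.add(fact)

    def holds(self, fact: Fact) -> Optional[bool]:
        """Return True or False if recorded, or None when nothing is known."""
        if fact in self.asserted:
            return True
        if fact in self.denied:
            return False
        return None


store = OpenWorldStore()
store.assert_fact(("dataset42", "hasInstrument", "radiometer"))
print(store.holds(("dataset42", "hasInstrument", "radiometer")))  # recorded as true
print(store.holds(("dataset42", "hasInstrument", "lidar")))       # never recorded: unknown
```

A closed-world system would answer the second query with "false"; returning `None` instead leaves room for the knowledge to be completed or corrected in a later iteration.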
A key attribute of use case driven development, and of the use of semantic approaches to project goals, is that it defines the appropriate metadata about each entity, how that metadata will need to be populated in the application environment, and what the critical gaps in metadata are.
Lead Professor: Peter Fox
Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure, and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or on general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring the emerging trends in semantic web technologies. There is an urgent need to develop a multi-disciplinary field that fosters the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.