DQSS Ontology Primer

Introduction

This is the primer for the Data Quality Screening Service (DQSS) ontology.

We provide a quick overview of the model and its core classes and then dive into an example of how to use the ontology to encode a quality view in RDF.


Intuitive Overview of Data Quality Screening Service Model

The DQSS ontology can be broken down into 3 major components:

  • Data Fields and their bindings to Data Variables
  • Data Field types and their semantic interpretation
  • Quality Views and Assertion Expressions

note: we should probably actually split the DQSS ontology into 3 modules based on this breakdown of functionality. -- Stephan

An Overview of Data Fields and their bindings to Data Variables

...

Fields

Fields describe how a parameter is represented in a variable. Fields have dimensions, just like variables, but fields should not have sampling dimensions because fields only contain one parameter, whereas variables may be used to represent several parameters simultaneously.

  • Data fields are a specialization of Field that are used to contain data measurements.
  • Screening fields are a specialization of Field that are used to contain quality parameters, such as retrieval conditions, surface type, confidence levels, etc.

Datasets

...

Variables

Variables describe distinct objects within a data file that contain data information.

  • A DataVariable is a specialization of Variable that contains data measurements. In HDF, this is usually an SDS (Scientific Data Set) array.
  • A ScreeningVariable is a specialization of Variable that contains quality information.

Dimensions

A dimension is an extent over which a parameter is represented and that is independent of the parameter itself.

  • Coordinate dimensions describe a dimension over which a parameter is measured (e.g. cell_across_swath). Only coordinate dimensions may be bound to a field.
  • Sampling dimensions describe a dimension over which multiple parameters are represented (e.g. qa_byte_land). Sampling dimensions may not be bound to a field.

ValueTypeEncoding

The ValueTypeEncoding describes how a field is encoded in a variable.

  • dqss:BitwiseEncoding - Describes how a ValueType is encoded in a variable byte. The start bit and stop bit designate which bits of the byte should be extracted to determine the field value. Endianness of the variable can also be specified.
  • dqss:InArrayEncoding - Describes which layers of the sampling dimension are used to represent the field.
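
As a minimal sketch of what a bitwise encoding implies for a reader of the data, the following Python function pulls a field value out of a single byte. The function name extract_bits is hypothetical, and the sketch assumes that bits are numbered from the least significant bit (bit 0) and that the stop bit is inclusive; the numbering convention and endianness handling intended by the ontology may differ.

```python
def extract_bits(byte_value: int, start_bit: int, stop_bit: int) -> int:
    """Return the unsigned integer stored in bits start_bit..stop_bit (inclusive).

    Assumption: bit 0 is the least significant bit of the byte.
    """
    if not 0 <= start_bit <= stop_bit <= 7:
        raise ValueError("bit indices must satisfy 0 <= start <= stop <= 7")
    width = stop_bit - start_bit + 1
    mask = (1 << width) - 1          # e.g. width 3 -> 0b111
    return (byte_value >> start_bit) & mask


# Bits 5-7 of 0b01100000 hold the value 0b011.
field_value = extract_bits(0b01100000, 5, 7)
```

If the real convention counts bits from the most significant end, only the shift amount changes; the mask-and-shift idea stays the same.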

note - should we change ValueType to FieldType? That would make this FieldTypeEncoding. We could also change this to simply FieldEncoding for simplicity.

note 2 - should we change dqss:InArrayEncoding to dqss:SamplingDimensionEncoding?

An Overview of Data Field Types and their Semantic Interpretation

...

ValueType

A ValueType describes the data type of the data field.

EnumeratedValueType

An EnumeratedValueType is a specialization of ValueType that describes an enumerated type. Valid values for the enumeration are described by the EnumeratedValue class and are referenced by the hasMember property.

For example, from the MODIS Atmosphere QA documentation, the Aerosol Parameters Confidence Flag describes a Data Field whose type is an enumeration with 4 defined elements, which correspond to the quality interpretations "Very Good", "Good", "Marginal", and "No Confidence".

ValueTypeEnumerator

A ValueTypeEnumerator is an element of an EnumeratedValueType. A common specialization of ValueTypeEnumerator is ValueTypeMapping, which describes a mapping between a ValueInterpretation (semantic concept) and the field encoding for that interpretation.

For example, from the MODIS QA documentation, the Aerosol Parameters Confidence Flag has 4 possible bit values (0-3), which correspond in order to the confidence definitions/interpretations "No Confidence", "Marginal", "Good", and "Very Good".
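
The mapping idea can be sketched in a few lines of Python. This is only an illustration: the dictionary and the interpret helper are hypothetical names, and the value-to-label pairs are the ones quoted above for the confidence flag.

```python
# Encoded field value -> interpretation label, as quoted above for the
# Aerosol Parameters Confidence Flag.
CONFIDENCE_MAPPING = {
    0: "No Confidence",
    1: "Marginal",
    2: "Good",
    3: "Very Good",
}


def interpret(value, mapping):
    """Return the interpretation for an encoded value, or None if the
    value is not a member of the enumeration."""
    return mapping.get(value)


label = interpret(2, CONFIDENCE_MAPPING)
```

Returning None for unknown values is a design choice of this sketch; a stricter reader might raise an error instead.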

Value Interpretation

Value interpretations are a semantic description of the information contained in the data fields.

Example value interpretations include:

  • QualityLevel (was QualityInterpretation in dqss v2)
  • SurfaceType
  • RetrievalCondition

An Overview of Quality Views and Assertion Expressions

...

Quality Views

A quality view references a screening assertion that should be used to determine whether a corresponding pixel in a data field should be screened.

note - should we have the quality view explicitly reference an expression (it currently explicitly references a screening assertion), so that screening assertions are only the leaves in the conditional expression?

Screening Assertions

A screening assertion describes a condition that represents one factor that is used to help determine whether a pixel in the quality view's data field should be screened. Screening assertions describe a constraint against a screening field that can be resolved to be true or false.

Examples of screening assertion constraints are:

  • hasMinimumQualityLevel - does the screening field pixel interpretation meet or exceed the specified quality level?
  • hasSurfaceType - does the screening field pixel interpretation equal this surface type?
  • has[Minimum|Maximum]Threshold - does the screening field pixel value fall within the defined threshold?
  • hasRetrievalCondition - does the screening field pixel interpretation equal this retrieval condition?
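
Each of these constraints resolves to true or false for a single pixel. The Python sketch below shows one possible reading of three of them. All function and parameter names are hypothetical, values are compared by their encoded integers, and the improving_in_positive_direction flag is an assumption modelled on the dqss:improvingQualityInPositiveDirection property used in the worked example.

```python
def has_minimum_quality_level(pixel_value, minimum_value,
                              improving_in_positive_direction=True):
    """True if the pixel's quality level meets or exceeds the minimum."""
    if improving_in_positive_direction:
        return pixel_value >= minimum_value
    return pixel_value <= minimum_value


def has_surface_type(pixel_value, surface_type_value):
    """True if the pixel's surface type equals the requested one."""
    return pixel_value == surface_type_value


def within_threshold(pixel_value, minimum=None, maximum=None):
    """True if the pixel value lies inside the optional bounds (inclusive)."""
    if minimum is not None and pixel_value < minimum:
        return False
    if maximum is not None and pixel_value > maximum:
        return False
    return True
```

A retrieval condition test would have the same shape as has_surface_type, so it is omitted here.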

note - should we change has[Minimum|Maximum]Threshold to meets[Minimum|Maximum]Threshold in v3?

Expressions

Screening assertions may be combined in a compound conditional expression that allows screening decisions to be based on a combination of complex factors.

if((assertion1 && assertion2) || (assertion3 && assertion4) || ... ) : SCREEN

Worked Example

The following worked examples use the Turtle encoding for RDF to describe a quality view and its related assertions, fields, data variables, dimension bindings, value types, value type encodings, and value interpretations according to our data quality screening service ontology, which can be found at https://scm.escience.rpi.edu/svn/public/ontologies/DQSS/trunk/dqss.owl. We use dqss as the namespace for this ontology in the RDF examples.

For this example, we use mod04_L2 as the namespace of the MODIS Terra Collection 005 and 051 QA information.

Data Field Encoding

First, we define dimensions that will be associated with the variable.


mod04_L2:cell_along_swath a dqss:CoordinateDimension ;
	dqss:identifier "cell_along_swath"^^xsd:string ;
	dqss:spatialResolution "10km"^^xsd:string ;
	dqss:length "204"^^xsd:nonNegativeInteger .

mod04_L2:cell_across_swath a dqss:CoordinateDimension ;
	dqss:identifier "cell_across_swath"^^xsd:string ;
	dqss:spatialResolution "10km"^^xsd:string ;
	dqss:length "135"^^xsd:nonNegativeInteger .

mod04_L2:qa_byte_land a dqss:SamplingDimension ;
	dqss:identifier "qa_byte_land"^^xsd:string ;
	dqss:length "5"^^xsd:nonNegativeInteger .

Next, we define the data variables that contain fields that will be referenced in our quality view.

First, the data variable that our quality view will describe:


mod04_L2:optical_depth_land_and_ocean a dqss:DataVariable ;
	dqss:identifier "Optical_Depth_Land_And_Ocean"^^xsd:string ;
	rdfs:label "Aerosol Optical Depth 550 nm"^^xsd:string ;
	dqss:spatialResolution "10x10km"^^xsd:string ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
	] ; 
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
	] .

Then we define the data variables that contain our screening fields.


mod04_L2:quality_assurance_land a dqss:ScreeningVariable ;
	dqss:identifier "Quality_Assurance_Land"^^xsd:string ;
	dqss:spatialResolution "10x10km grid"^^xsd:string ;
	dqss:processingMode "Daytime only"^^xsd:string ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
	] ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
	] ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:qa_byte_land ;
		dqss:index "2"^^xsd:nonNegativeInteger ;
	] .

mod04_L2:cloud_mask_qa a dqss:ScreeningVariable ;
	dqss:identifier "Cloud_Mask_QA"^^xsd:string ;
	dqss:spatialResolution "10x10km"^^xsd:string ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
	] ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
	] .

We define the dqss:DataField for Aerosol Optical Thickness:

note: The mod04_L2:standard_field_value_type_encoding means that no subsetting/masking of the data variable is required to extract the field content. All bits and dimensions of the data variable represent field content.


mod04_L2:optical_depth_land_and_ocean_aerosol_optical_depth a dqss:DataField ;
	rdfs:label "Aerosol Optical Depth 550 nm"^^xsd:string ;
	dqss:spatialResolution "10x10km"^^xsd:string ;
	dqss:isRepresentedInVariable mod04_L2:optical_depth_land_and_ocean ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
	] ; 
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
	] ;
	dqss:hasFieldValueTypeEncoding mod04_L2:standard_field_value_type_encoding .

Now we define two screening fields. The dqss:FieldValueTypeEncoding describes how the field is encoded within the variable.

  • The dqss:InArrayEncoding class specifies that there is a sampling dimension that is orthogonal to the data field dimensions and the start and stop layer index specify which indices of the sampling dimension are used to encode the field.
  • The dqss:BitwiseEncoding class describes which bits are used to encode the field and the endianness of the bitwise encoding.

The 0.66µm Aerosol Optical Thickness Confidence screening field from the Quality_Assurance_Land screening variable for MODIS Collection 005 and 051:


mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field a dqss:ScreeningField ;
	rdfs:label "Quality_Assurance_Land - 0.66µm Aerosol Optical Thickness Confidence"^^xsd:string ;
	dqss:spatialResolution "10x10km"^^xsd:string ;
	dqss:isRepresentedInVariable mod04_L2:quality_assurance_land ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
	] ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
	] ;
	dqss:hasFieldValueTypeEncoding [
		a dqss:InArrayEncoding , dqss:BitwiseEncoding ;
		dqss:hasSamplingDimension mod04_L2:qa_byte_land ;
		dqss:startLayerIndex "0"^^xsd:nonNegativeInteger ;
		dqss:stopLayerIndex "0"^^xsd:nonNegativeInteger ;
		dqss:startBitIndex "5"^^xsd:nonNegativeInteger ;
		dqss:stopBitIndex "7"^^xsd:nonNegativeInteger ;
		dqss:isBigEndian "true"^^xsd:boolean ;
	] .

mod04_L2:cloud_mask_qa_surface_type_field a dqss:ScreeningField ;
	rdfs:label "Cloud_Mask_QA - Surface Type"^^xsd:string ;
	dqss:isRepresentedInVariable mod04_L2:cloud_mask_qa ;
	dqss:spatialResolution "10x10km"^^xsd:string ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_along_swath ;
		dqss:index "0"^^xsd:nonNegativeInteger ;
	] ;
	dqss:boundDimensionBinding [
		a dqss:DimensionBinding ;
		dqss:boundDimension mod04_L2:cell_across_swath ;
		dqss:index "1"^^xsd:nonNegativeInteger ;
	] ;
	dqss:hasFieldValueTypeEncoding [
		a dqss:BitwiseEncoding ;
		dqss:startBitIndex "5"^^xsd:nonNegativeInteger ;
		dqss:stopBitIndex "6"^^xsd:nonNegativeInteger ;
		dqss:isBigEndian "true"^^xsd:boolean ;
	] .
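
To make the combination of the two encodings concrete, here is a hedged Python sketch of how a reader might turn a three-dimensional screening variable into a two-dimensional screening field. The function extract_field is hypothetical, the variable is modelled as nested lists indexed [along][across][layer], the start and stop layer are assumed to coincide (as they do in the confidence field above), and bit 0 is assumed to be the least significant bit.

```python
def extract_field(variable, layer_index, start_bit, stop_bit):
    """Select one layer of the sampling dimension, then extract the bits
    start_bit..stop_bit (inclusive) from every byte in that layer.

    variable: nested lists indexed [along][across][layer] of byte values.
    Returns a 2-D nested list indexed [along][across].
    """
    mask = (1 << (stop_bit - start_bit + 1)) - 1
    return [
        [(cell[layer_index] >> start_bit) & mask for cell in row]
        for row in variable
    ]


# A tiny 1 x 2 swath with a 5-layer sampling dimension (made-up bytes).
qa_land = [[[0b10100000, 0, 0, 0, 0], [0b01000000, 0, 0, 0, 0]]]
confidence = extract_field(qa_land, layer_index=0, start_bit=5, stop_bit=7)
```

The result has only the two coordinate dimensions, which matches the rule that a field is not bound to a sampling dimension.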

Data Field Semantics

Here we will define the field types for our data and screening fields, describing their elements and how element values map to semantic concepts such as surface type, quality interpretations (e.g. very good, good, marginal, no confidence), and retrieval conditions.

We define an enumeration for confidence and specify valid mappings between element values and their interpretation (e.g. "0" → "No Confidence").


mod04_L2:quality_assurance_land_confidence_enumeration a dqss:QualityLevelEnumeration ;
	dqss:improvingQualityInPositiveDirection "true"^^xsd:boolean ;
	dqss:hasMember [
		a dqss:QualityLevelMapping ;
		dqss:hasInterpretation mod04_L2:marginal ;
		dqss:valueEncoding "1"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:QualityLevelMapping ;
		dqss:hasInterpretation mod04_L2:very_good ;
		dqss:valueEncoding "3"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:QualityLevelMapping ;
		dqss:hasInterpretation mod04_L2:good ;
		dqss:valueEncoding "2"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:QualityLevelMapping ;
		dqss:hasInterpretation mod04_L2:no_confidence ;
		dqss:valueEncoding "0"^^xsd:integer ;
	] .

We define an enumeration for surface type and specify valid mappings between element values and their interpretation (e.g. "0" → "Ocean").


mod04_L2:cloud_mask_qa_surface_type_enumeration a dqss:SurfaceTypeEnumeration ;
	dqss:hasMember [
		a dqss:SurfaceTypeMapping ;
		rdfs:comment ">= 90% ocean or deep lakes and rivers"^^xsd:string ;
		dqss:hasInterpretation mod04_L2:ocean ;
		dqss:valueEncoding "0"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:SurfaceTypeMapping ;
		rdfs:comment "Coast (other criteria not met)"^^xsd:string ;
		dqss:hasInterpretation mod04_L2:coast ;
		dqss:valueEncoding "1"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:SurfaceTypeMapping ;
		rdfs:comment "Desert (100% desert)"^^xsd:string ;
		dqss:hasInterpretation mod04_L2:desert ;
		dqss:valueEncoding "2"^^xsd:integer ;
	] ;
	dqss:hasMember [
		a dqss:SurfaceTypeMapping ;
		rdfs:comment "Land (100% and < 100% desert)"^^xsd:string ;
		dqss:hasInterpretation mod04_L2:land ;
		dqss:valueEncoding "3"^^xsd:integer ;
	] .

We define the 4 quality levels (specializations of value interpretation) used in the Quality_Assurance_Land confidence field.


mod04_L2:no_confidence a dqss:QualityLevel ;
	rdfs:label "No Confidence"^^xsd:string .

mod04_L2:marginal a dqss:QualityLevel ;
	rdfs:label "Marginal"^^xsd:string .

mod04_L2:good a dqss:QualityLevel ;
	rdfs:label "Good"^^xsd:string .

mod04_L2:very_good a dqss:QualityLevel ;
	rdfs:label "Very Good"^^xsd:string .

We define the 4 surface types (specializations of value interpretation) used in the Cloud_Mask_QA surface type field.


mod04_L2:ocean a dqss:SurfaceType ;
	rdfs:label "Ocean"^^xsd:string .

mod04_L2:desert a dqss:SurfaceType ;
	rdfs:label "Desert"^^xsd:string .

mod04_L2:coast a dqss:SurfaceType ;
	rdfs:label "Coast"^^xsd:string .

mod04_L2:land a dqss:SurfaceType ;
	rdfs:label "Land"^^xsd:string .

The field value type descriptions are associated with the appropriate screening fields.


mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field
	dqss:hasFieldTypeValue mod04_L2:quality_assurance_land_confidence_enumeration .

mod04_L2:cloud_mask_qa_surface_type_field
	dqss:hasFieldTypeValue mod04_L2:cloud_mask_qa_surface_type_enumeration .

Quality Views

Quality Views describe a combination of screening tests (called screening assertions) that should be applied to determine if a pixel in a data field should be screened. The tests described by the assertions in a quality view are applied on a per-pixel basis; as a result, all fields (data and screening) referenced in a quality view should have the same dimensional bindings.
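
The requirement that all fields share the same dimensional bindings can be checked mechanically. The following Python sketch is one way to do so. It assumes each field's bindings have already been read into a collection of (index, dimension name) pairs; the function name and this in-memory form are hypothetical.

```python
def have_same_bindings(*binding_sets):
    """True if every field binds the same dimensions at the same indices."""
    sets = [frozenset(bindings) for bindings in binding_sets]
    return all(s == sets[0] for s in sets[1:])


data_field = [(0, "cell_along_swath"), (1, "cell_across_swath")]
screening_field = [(1, "cell_across_swath"), (0, "cell_along_swath")]
screening_variable = data_field + [(2, "qa_byte_land")]

fields_match = have_same_bindings(data_field, screening_field)
variable_matches = have_same_bindings(data_field, screening_variable)
```

The order in which bindings are listed does not matter here, because each pair carries its own index.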

Screening assertions describe a test on a screening field. A screening field may be referenced by multiple screening assertions in a quality view, but only one screening field should be referenced in any screening assertion.

There are currently 4 specialized classes of screening assertion, as mandated by the current DQSS use cases:

  • dqss:MinimumQualityInterpretationScreeningAssertion - The assertion's screening field should have a dqss:QualityLevelEnumeration for the field value type. The assertion test is that the pixel values in the screening field should correspond to a quality interpretation that matches or improves on a designated minimum quality interpretation.
  • dqss:InvalidThresholdScreeningAssertion - TODO
  • dqss:SurfaceTypeScreeningAssertion - The assertion's screening field should have a dqss:SurfaceTypeEnumeration for the field value type. The assertion test is that the pixel values in the screening field should correspond to a surface type interpretation designated in the assertion.
  • dqss:RetrievalConditionScreeningAssertion - The assertion's screening field should have a dqss:RetrievalConditionEnumeration for the field value type. The assertion test is that the pixel values in the screening field should correspond to a retrieval condition interpretation designated in the assertion.

Screening assertions are combined into a single conditional expression using the AND (&&) and OR (||) conditional logic operators.

if((assertion1 && assertion2) || (assertion3 && assertion4) || ... ) : SCREEN

To facilitate encoding of complex conditional logic expressions, we have developed a simple conditional logic expression ontology, which is available at https://scm.escience.rpi.edu/svn/public/ontologies/DQSS/trunk/expr.owl and which we reference with the namespace expr.
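
As a rough sketch of how such an expression could be evaluated once it has been read from RDF, the Python function below walks a binary expression tree. The dictionary keys mirror the expr:op1, expr:op2 and expr:operator properties used in the RDF example in this primer. The in-memory representation and the function name are assumptions, and leaves are taken to be already-resolved assertion results.

```python
def evaluate(expression):
    """Recursively evaluate a binary AND/OR expression tree.

    A leaf is a plain bool (the resolved result of one screening assertion);
    an inner node is a dict with the keys "operator", "op1" and "op2".
    """
    if isinstance(expression, bool):
        return expression
    left = evaluate(expression["op1"])
    right = evaluate(expression["op2"])
    if expression["operator"] == "AND":
        return left and right
    if expression["operator"] == "OR":
        return left or right
    raise ValueError(f"unknown operator: {expression['operator']!r}")


# (assertion1 && assertion2) || (assertion3 && assertion4)
tree = {
    "operator": "OR",
    "op1": {"operator": "AND", "op1": True, "op2": False},
    "op2": {"operator": "AND", "op1": True, "op2": True},
}
result = evaluate(tree)
```

Longer chains of operands can be expressed by nesting further binary nodes.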

Here is an example in pseudocode of a simple quality view that uses conditional expressions:

foreach pixel in Optical_Depth_Land_And_Ocean {
  if(((Cloud_Mask_QA→SurfaceTypeFlag == Ocean) 
      && (Quality_Assurance_Ocean→0.66micronAerosolOpticalThicknessConfidenceFlag < Marginal)) 
  || ((Cloud_Mask_QA→SurfaceTypeFlag == Land) 
      && (Quality_Assurance_Land→0.66micronAerosolOpticalThicknessConfidenceFlag < VeryGood))) 
      : screen(pixel)
}
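
For readers who prefer running code, the pseudocode above can be rendered in Python roughly as follows. This is a sketch under stated assumptions: the integer codes are taken from the surface type and confidence enumerations defined earlier, each input is a flat list with one entry per pixel, and the function names and the fill_value parameter are hypothetical.

```python
# Encoded values from the enumerations defined earlier in this primer.
OCEAN, COAST, DESERT, LAND = 0, 1, 2, 3
NO_CONFIDENCE, MARGINAL, GOOD, VERY_GOOD = 0, 1, 2, 3


def should_screen(surface_type, ocean_confidence, land_confidence):
    """Per-pixel test corresponding to the if-condition in the pseudocode."""
    ocean_fails = surface_type == OCEAN and ocean_confidence < MARGINAL
    land_fails = surface_type == LAND and land_confidence < VERY_GOOD
    return ocean_fails or land_fails


def apply_quality_view(aod, surface, ocean_conf, land_conf, fill_value=None):
    """Replace every screened pixel with fill_value; keep the others."""
    return [
        fill_value if should_screen(s, o, l) else value
        for value, s, o, l in zip(aod, surface, ocean_conf, land_conf)
    ]


screened = apply_quality_view(
    aod=[0.1, 0.2, 0.3, 0.4],
    surface=[OCEAN, OCEAN, LAND, COAST],
    ocean_conf=[NO_CONFIDENCE, MARGINAL, VERY_GOOD, NO_CONFIDENCE],
    land_conf=[VERY_GOOD, VERY_GOOD, GOOD, NO_CONFIDENCE],
)
```

Pixels over coast or desert are never screened by this particular view, because neither branch of the condition applies to them.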

Here is the same quality view, described above in pseudocode, encoded in RDF.


mod04_L2:optical_depth_land_and_ocean_quality_view a dqss:QualityView ;
	rdfs:label "Quality View MOD04 AOD Dark Target"^^xsd:string ;
	dqss:hasDataField [
		a dqss:DataField ;
		dqss:isRepresentedInVariable
			mod04_L2:optical_depth_land_and_ocean ;
	] ;
	dqss:hasScreeningAssertion [
		a expr:Expression, dqss:ScreeningAssertion ;
		expr:op1 [
			a expr:Expression ;
			expr:op1 [
				a expr:Expression, dqss:SurfaceTypeScreeningAssertion ;
				dqss:hasScreeningField
					mod04_L2:cloud_mask_qa_surface_type_field ;
				dqss:hasSurfaceType mod04_L2:ocean ;
			] ;
			expr:op2 [
				a expr:Expression, dqss:ScreeningAssertion ;
				dqss:hasScreeningField
					mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field ;
				dqss:hasMinimumQualityLevel mod04_L2:marginal ;
			] ;
			expr:operator expr:AND ;
		] ;
		expr:op2 [
			a expr:Expression ;
			expr:op1 [
				a expr:Expression, dqss:SurfaceTypeScreeningAssertion ;
				dqss:hasScreeningField
					mod04_L2:cloud_mask_qa_surface_type_field ;
				dqss:hasSurfaceType mod04_L2:land ;
			] ;
			expr:op2 [
				a expr:Expression, dqss:ScreeningAssertion ;
				dqss:hasScreeningField
					mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field ;
				dqss:hasMinimumQualityLevel mod04_L2:very_good ;
			] ;
			expr:operator expr:AND ;
		] ;
		expr:operator expr:OR ;
	] .


Frequently Asked Questions

...