This is the primer for the Data Quality Screening Service (DQSS) ontology.
We provide a quick overview of the model and its core classes and then dive into an example of how to use the ontology to encode a quality view in RDF.
The DQSS ontology can be broken down into 3 major components:
note: we should probably actually split the DQSS ontology into 3 modules based on this breakdown of functionality. -- Stephan
...
Fields describe how a parameter is represented in a variable. Fields have dimensions, just like variables, but fields should not have sampling dimensions because fields only contain one parameter, whereas variables may be used to represent several parameters simultaneously.
...
Variables describe the distinct objects within a data file that hold data.
An extent over which a parameter is represented that is independent of the parameter.
The ValueTypeEncoding describes how a field is encoded in a variable.
| dqss:BitwiseEncoding | Describes how a ValueType is encoded in a variable byte. The start bit and stop bit designate which bits of the byte should be extracted to determine the field value. Endianness of the variable can also be specified. |
| dqss:InArrayEncoding | Describes which layers of the sampling dimension are used to represent the field. |
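As a rough illustration of what these two encodings ask a reader of the data to do, the Python sketch below selects layers along a sampling dimension and then extracts a bit range from one byte. The function names `select_layers` and `extract_bits` are hypothetical, and the conventions are assumptions that the ontology text does not fix: indices are inclusive, bit 0 is the least significant bit by default, and `msb_first=True` is only one possible reading of the endianness setting.

```python
def select_layers(samples, start_layer, stop_layer):
    """Return the layers start_layer..stop_layer (inclusive) of one pixel's samples."""
    if not 0 <= start_layer <= stop_layer < len(samples):
        raise ValueError("layer indices out of range")
    return list(samples[start_layer:stop_layer + 1])


def extract_bits(byte_value, start_bit, stop_bit, msb_first=False):
    """Return the unsigned integer held in bits start_bit..stop_bit (inclusive)."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must fit in one byte")
    if not 0 <= start_bit <= stop_bit <= 7:
        raise ValueError("bit indices must satisfy 0 <= start <= stop <= 7")
    width = stop_bit - start_bit + 1
    # Assumption: with msb_first, bit 0 is the leftmost (most significant) bit.
    shift = (7 - stop_bit) if msb_first else start_bit
    return (byte_value >> shift) & ((1 << width) - 1)


# One pixel with five bytes along a sampling dimension (made-up values).
pixel_bytes = [0b01100000, 0, 0, 0, 0]
first_layer = select_layers(pixel_bytes, 0, 0)[0]
field_value = extract_bits(first_layer, 5, 6)
```

Both steps are kept separate because a field may use either encoding on its own or both together.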
note - should we change ValueType to FieldType? That would make this FieldTypeEncoding. We could also change this to simply FieldEncoding for simplicity.
note 2 - should we change dqss:InArrayEncoding to dqss:SamplingDimensionEncoding?
...
A ValueType describes the data type of the data field.
An EnumeratedValueType is a specialization of ValueType that describes an enumerated type. Valid values for the enumeration are described by the EnumeratedValue class and are referenced by the hasMember property.
For example, from the MODIS Atmosphere QA documentation, the Aerosol Parameters Confidence Flag describes a Data Field whose type is an enumeration with 4 defined elements, which correspond to the quality interpretations "Very Good", "Good", "Marginal", and "No Confidence".
A ValueTypeEnumerator is an element of an EnumeratedValueType. A common specialization of ValueTypeEnumerator is ValueTypeMapping, which describes a mapping between a ValueInterpretation (semantic concept) and the field encoding for that interpretation.
For example, from the MODIS QA documentation, the Aerosol Parameters Confidence Flag has 4 possible bit values (0-3), which correspond, in order, to the confidence interpretations "No Confidence", "Marginal", "Good", and "Very Good".
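The mapping idea can be sketched in a few lines of Python. This is only an illustration: the class name `AerosolConfidence`, the `LABELS` table, and the `interpret` helper are hypothetical, while the values 0-3 and their labels come from the example above.

```python
from enum import IntEnum


class AerosolConfidence(IntEnum):
    """Encoded values of the confidence flag, as listed in the example."""
    NO_CONFIDENCE = 0
    MARGINAL = 1
    GOOD = 2
    VERY_GOOD = 3


# Each entry plays the role of one mapping: encoded value -> interpretation label.
LABELS = {
    AerosolConfidence.NO_CONFIDENCE: "No Confidence",
    AerosolConfidence.MARGINAL: "Marginal",
    AerosolConfidence.GOOD: "Good",
    AerosolConfidence.VERY_GOOD: "Very Good",
}


def interpret(value):
    """Translate an encoded field value into its interpretation label."""
    return LABELS[AerosolConfidence(value)]  # raises ValueError for undefined values
```

Values outside the enumeration are rejected rather than silently mapped, which mirrors the idea that only the listed members are valid.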
Value interpretations are a semantic description of the information contained in the data fields.
Example value interpretations include:
...
A quality view references a screening assertion that should be used to determine whether a corresponding pixel in a data field should be screened.
note - should we have the quality view explicitly reference an expression (it currently explicitly references a screening assertion), so that screening assertions are only the leaves in the conditional expression?
A screening assertion describes a condition that represents one factor that is used to help determine whether a pixel in the quality view's data field should be screened. Screening assertions describe a constraint against a screening field that can be resolved to be true or false.
Examples of screening assertion constraints are:
note - should we change has[Minimum|Maximum]Threshold to meets[Minimum|Maximum]Threshold in v3?
Screening assertions may be combined in a compound conditional expression that allows screening decisions to be based on a combination of complex factors.
if((assertion1 && assertion2) || (assertion3 && assertion4) || ... ) : SCREEN
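For a single pixel, the expression above reduces to "screen if every assertion in at least one group holds". The Python sketch below shows that reading; the function name `should_screen` and the list-of-lists input are hypothetical choices, and treating an empty group as "does not screen" is an assumption.

```python
def should_screen(groups):
    """Return True if every assertion result in at least one group is True.

    groups: list of lists of booleans. Results inside a group are combined
    with AND; the groups themselves are combined with OR.
    """
    # Assumption: an empty group never triggers screening.
    return any(len(group) > 0 and all(group) for group in groups)


# (assertion1 && assertion2) || (assertion3 && assertion4)
decision = should_screen([[True, False], [True, True]])
```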
The following worked examples use the Turtle encoding for RDF to describe a quality view and its related assertions, fields, data variables, dimension bindings, value types, value type encodings, and value interpretations. The descriptions follow our data quality screening service ontology, which can be found at https://scm.escience.rpi.edu/svn/public/ontologies/DQSS/trunk/dqss.owl. We use dqss as the namespace for this ontology in the RDF examples.
For this example I will use mod04_L2 as the namespace of the MODIS Terra Collection 005 and 051 QA information.
First, we define dimensions that will be associated with the variable.
mod04_L2:cell_along_swath
    a dqss:CoordinateDimension ;
    dqss:identifier "cell_along_swath"^^xsd:string ;
    dqss:spatialResolution "10km"^^xsd:string ;
    dqss:length "204"^^xsd:nonNegativeInteger .

mod04_L2:cell_across_swath
    a dqss:CoordinateDimension ;
    dqss:identifier "cell_across_swath"^^xsd:string ;
    dqss:spatialResolution "10km"^^xsd:string ;
    dqss:length "135"^^xsd:nonNegativeInteger .

mod04_L2:qa_byte_land
    a dqss:SamplingDimension ;
    dqss:identifier "qa_byte_land"^^xsd:string ;
    dqss:length "5"^^xsd:nonNegativeInteger .
Next, we define the data variables that contain fields that will be referenced in our quality view.
This is the data variable that our quality view will describe.
mod04_L2:optical_depth_land_and_ocean
    a dqss:DataVariable ;
    dqss:identifier "Optical_Depth_Land_And_Ocean"^^xsd:string ;
    rdfs:label "Aerosol Optical Depth 550 nm"^^xsd:string ;
    dqss:spatialResolution "10x10km"^^xsd:string ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
    ] .
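To make the role of the dimension bindings concrete, the Python sketch below derives the array shape of a variable from its bindings. The classes `Dimension` and `DimensionBinding` and the function `variable_shape` are hypothetical stand-ins for the RDF resources, and requiring the indices to run 0..n-1 without gaps is an assumption rather than something the ontology states here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    identifier: str
    length: int


@dataclass(frozen=True)
class DimensionBinding:
    index: int
    dimension: Dimension


def variable_shape(bindings):
    """Return the shape implied by the bindings, ordered by binding index."""
    ordered = sorted(bindings, key=lambda binding: binding.index)
    # Assumption: indices must be 0..n-1 with no gaps or duplicates.
    if [binding.index for binding in ordered] != list(range(len(ordered))):
        raise ValueError("binding indices must be 0..n-1 with no gaps or duplicates")
    return tuple(binding.dimension.length for binding in ordered)


along = Dimension("cell_along_swath", 204)
across = Dimension("cell_across_swath", 135)
shape = variable_shape([DimensionBinding(1, across), DimensionBinding(0, along)])
```

With the dimension lengths defined earlier, the shape works out to 204 by 135 regardless of the order in which the bindings are listed.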
Then we define the data variables that contain our screening fields.
mod04_L2:quality_assurance_land
    a dqss:ScreeningVariable ;
    dqss:identifier "Quality_Assurance_Land"^^xsd:string ;
    dqss:spatialResolution "10x10km grid"^^xsd:string ;
    dqss:processingMode "Daytime only"^^xsd:string ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:qa_byte_land ;
        dqss:index "2"^^xsd:nonNegativeInteger ;
    ] .

mod04_L2:cloud_mask_qa
    a dqss:ScreeningVariable ;
    dqss:identifier "Cloud_Mask_QA"^^xsd:string ;
    dqss:spatialResolution "10x10km"^^xsd:string ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
    ] .
We define the dqss:DataField for Aerosol Optical Thickness:
note: The mod04_L2:standard_field_value_type_encoding means that no subsetting or masking of the data variable is required to extract the field content. All bits and dimensions of the data variable represent field content.
mod04_L2:optical_depth_land_and_ocean_aerosol_optical_depth
    a dqss:DataField ;
    rdfs:label "Aerosol Optical Depth 550 nm"^^xsd:string ;
    dqss:spatialResolution "10x10km"^^xsd:string ;
    dqss:isRepresentedInVariable mod04_L2:optical_depth_land_and_ocean ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
    ] ;
    dqss:hasFieldValueTypeEncoding mod04_L2:standard_field_value_type_encoding .
Now we define two screening fields. The dqss:FieldValueTypeEncoding describes how each field is encoded within its variable.
The first is the 0.66µm Aerosol Optical Thickness Confidence field from the Quality_Assurance_Land screening variable for MODIS Collection 005 and 051; the second is the Surface Type field from the Cloud_Mask_QA screening variable.
mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field
    a dqss:ScreeningField ;
    rdfs:label "Quality_Assurance_Land - 0.66µm Aerosol Optical Thickness Confidence"^^xsd:string ;
    dqss:spatialResolution "10x10km"^^xsd:string ;
    dqss:isRepresentedInVariable mod04_L2:quality_assurance_land ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:hasFieldValueTypeEncoding [
        a dqss:InArrayEncoding , dqss:BitwiseEncoding ;
        dqss:hasSamplingDimension mod04_L2:qa_byte_land ;
        dqss:startLayerIndex "0"^^xsd:nonNegativeInteger ;
        dqss:stopLayerIndex "0"^^xsd:nonNegativeInteger ;
        dqss:startBitIndex "5"^^xsd:nonNegativeInteger ;
        dqss:stopBitIndex "7"^^xsd:nonNegativeInteger ;
        dqss:isBigEndian "true"^^xsd:boolean ;
    ] .

mod04_L2:cloud_mask_qa_surface_type_field
    a dqss:ScreeningField ;
    rdfs:label "Cloud_Mask_QA - Surface Type"^^xsd:string ;
    dqss:isRepresentedInVariable mod04_L2:cloud_mask_qa ;
    dqss:spatialResolution "10x10km"^^xsd:string ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_along_swath ;
        dqss:index "0"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:boundDimensionBinding [
        a dqss:DimensionBinding ;
        dqss:boundDimension mod04_L2:cell_across_swath ;
        dqss:index "1"^^xsd:nonNegativeInteger ;
    ] ;
    dqss:hasFieldValueTypeEncoding [
        a dqss:BitwiseEncoding ;
        dqss:startBitIndex "5"^^xsd:nonNegativeInteger ;
        dqss:stopBitIndex "6"^^xsd:nonNegativeInteger ;
        dqss:isBigEndian "true"^^xsd:boolean ;
    ] .
Here we will define the field types for our data and screening fields, describing their elements and how element values map to semantic concepts such as surface type, quality interpretations (e.g. very good, good, marginal, no confidence), and retrieval conditions.
We define an enumeration for confidence and specify valid mappings between element values and their interpretation. (e.g. "0" → "No Confidence")
mod04_L2:quality_assurance_land_confidence_enumeration
    a dqss:QualityLevelEnumeration ;
    dqss:improvingQualityInPositiveDirection "true"^^xsd:boolean ;
    dqss:hasMember [
        a dqss:QualityLevelMapping ;
        dqss:hasInterpretation mod04_L2:marginal ;
        dqss:valueEncoding "1"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:QualityLevelMapping ;
        dqss:hasInterpretation mod04_L2:very_good ;
        dqss:valueEncoding "3"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:QualityLevelMapping ;
        dqss:hasInterpretation mod04_L2:good ;
        dqss:valueEncoding "2"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:QualityLevelMapping ;
        dqss:hasInterpretation mod04_L2:no_confidence ;
        dqss:valueEncoding "0"^^xsd:integer ;
    ] .
We define an enumeration for surface type and specify valid mappings between element values and their interpretation. (e.g. "0" → "Ocean")
mod04_L2:cloud_mask_qa_surface_type_enumeration
    a dqss:SurfaceTypeEnumeration ;
    dqss:hasMember [
        a dqss:SurfaceTypeMapping ;
        rdfs:comment ">= 90% ocean or deep lakes and rivers"^^xsd:string ;
        dqss:hasInterpretation mod04_L2:ocean ;
        dqss:valueEncoding "0"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:SurfaceTypeMapping ;
        rdfs:comment "Coast (other criteria not met)"^^xsd:string ;
        dqss:hasInterpretation mod04_L2:coast ;
        dqss:valueEncoding "1"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:SurfaceTypeMapping ;
        rdfs:comment "Desert (100% desert)"^^xsd:string ;
        dqss:hasInterpretation mod04_L2:desert ;
        dqss:valueEncoding "2"^^xsd:integer ;
    ] ;
    dqss:hasMember [
        a dqss:SurfaceTypeMapping ;
        rdfs:comment "Land (100% and < 100% desert)"^^xsd:string ;
        dqss:hasInterpretation mod04_L2:land ;
        dqss:valueEncoding "3"^^xsd:integer ;
    ] .
We define the 4 quality levels (specializations of value interpretation) used in the Quality_Assurance_Land confidence field.
mod04_L2:no_confidence
    a dqss:QualityLevel ;
    rdfs:label "No Confidence"^^xsd:string .

mod04_L2:marginal
    a dqss:QualityLevel ;
    rdfs:label "Marginal"^^xsd:string .

mod04_L2:good
    a dqss:QualityLevel ;
    rdfs:label "Good"^^xsd:string .

mod04_L2:very_good
    a dqss:QualityLevel ;
    rdfs:label "Very Good"^^xsd:string .
We define the 4 surface types (specializations of value interpretation) used in the Cloud_Mask_QA surface type field.
mod04_L2:ocean
    a dqss:SurfaceType ;
    rdfs:label "Ocean"^^xsd:string .

mod04_L2:desert
    a dqss:SurfaceType ;
    rdfs:label "Desert"^^xsd:string .

mod04_L2:coast
    a dqss:SurfaceType ;
    rdfs:label "Coast"^^xsd:string .

mod04_L2:land
    a dqss:SurfaceType ;
    rdfs:label "Land"^^xsd:string .
The field value type descriptions are associated with the appropriate screening fields.
mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field
    dqss:hasFieldTypeValue mod04_L2:quality_assurance_land_confidence_enumeration .

mod04_L2:cloud_mask_qa_surface_type_field
    dqss:hasFieldTypeValue mod04_L2:cloud_mask_qa_surface_type_enumeration .
Quality views describe a combination of screening tests (called screening assertions) that should be applied to determine whether a pixel in a data field should be screened. The tests described by the assertions in a quality view are applied on a per-pixel basis. As a result, all fields (data and screening) referenced in a quality view should have the same dimensional bindings.
Screening assertions describe a test on a screening field. A screening field may be referenced by multiple screening assertions in a quality view, but only one screening field should be referenced in any screening assertion.
The current DQSS use cases call for 4 specialized classes of screening assertion:
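The requirement that all fields share the same dimensional bindings can be checked mechanically. The Python sketch below is one way to do so, assuming each field has already been reduced to the ordered list of its dimension identifiers; the function name `same_bindings` and the dictionary layout are hypothetical.

```python
def same_bindings(fields):
    """Return True if every field is bound to the same dimensions in the same order.

    fields: mapping of field name -> sequence of dimension identifiers,
    ordered by binding index.
    """
    signatures = {tuple(dimensions) for dimensions in fields.values()}
    return len(signatures) <= 1


# Dimension identifiers taken from the bindings of the three example fields.
example_fields = {
    "aerosol_optical_depth": ["cell_along_swath", "cell_across_swath"],
    "aot_confidence": ["cell_along_swath", "cell_across_swath"],
    "surface_type": ["cell_along_swath", "cell_across_swath"],
}
compatible = same_bindings(example_fields)
```

Note that the confidence field passes this check even though its variable has a third, sampling dimension, because the field's encoding selects a single layer of that dimension.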
| dqss:MinimumQualityInterpretationScreeningAssertion | The assertion's screening field should have a dqss:QualityLevelEnumeration for the field value type. The assertion test is that the pixel values in the data field should correspond to a quality interpretation that matches or improves on a designated minimum quality interpretation. |
| dqss:InvalidThresholdScreeningAssertion | TODO |
| dqss:SurfaceTypeScreeningAssertion | The assertion's screening field should have a dqss:SurfaceTypeEnumeration for the field value type. The assertion test is that the pixel values in the data field should correspond to a surface type interpretation designated in the assertion. |
| dqss:RetrievalConditionScreeningAssertion | The assertion's screening field should have a dqss:RetrievalConditionEnumeration for the field value type. The assertion test is that the pixel values in the data field should correspond to a retrieval condition interpretation designated in the assertion. |
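The tests in the table can be pictured as small predicates over encoded field values. The Python sketch below covers the minimum-quality and surface-type cases only; the function names are hypothetical, and the sketch assumes that quality levels are compared through their integer encodings, with the direction flag deciding which way is "better".

```python
def meets_minimum_quality(value, minimum, improving_in_positive_direction=True):
    """Return True if value matches or improves on the minimum quality encoding."""
    if improving_in_positive_direction:
        return value >= minimum
    return value <= minimum


def matches_surface_type(value, designated_values):
    """Return True if the encoded surface type is one of the designated values."""
    return value in designated_values


# Encodings follow the example enumerations: Marginal = 1, Good = 2, Land = 3.
MARGINAL, GOOD, LAND = 1, 2, 3
passes_quality = meets_minimum_quality(GOOD, MARGINAL)
is_land = matches_surface_type(LAND, {LAND})
```

How a passing or failing test feeds into the final screening decision is determined by the conditional expression that combines the assertions.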
Screening assertions are combined into a single conditional expression using the AND (&&) and OR (||) conditional logic operators.
if((assertion1 && assertion2) || (assertion3 && assertion4) || ... ) : SCREEN
To facilitate encoding of complex conditional logic expressions, we have developed a simple conditional logic expression ontology, which is available at https://scm.escience.rpi.edu/svn/public/ontologies/DQSS/trunk/expr.owl and which we reference with the namespace expr.
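A binary expression tree of this kind can be evaluated with a short recursive function. The Python sketch below borrows the names op1, op2, operator, AND, and OR from the expr terms used in the RDF examples; the classes `Leaf` and `Node`, the `evaluate` function, and the dictionary used to represent a pixel are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """A single screening assertion: a test on one pixel's field values."""
    test: Callable[[dict], bool]


@dataclass
class Node:
    """Two sub-expressions joined by a logical operator ("AND" or "OR")."""
    operator: str
    op1: "Expr"
    op2: "Expr"


Expr = Union[Leaf, Node]


def evaluate(expr, pixel):
    """Recursively evaluate an expression tree for one pixel."""
    if isinstance(expr, Leaf):
        return expr.test(pixel)
    left = evaluate(expr.op1, pixel)
    right = evaluate(expr.op2, pixel)
    if expr.operator == "AND":
        return left and right
    if expr.operator == "OR":
        return left or right
    raise ValueError(f"unknown operator: {expr.operator}")


# (surface is ocean AND confidence < 1) OR (surface is land AND confidence < 3)
tree = Node(
    "OR",
    Node("AND", Leaf(lambda p: p["surface"] == "ocean"), Leaf(lambda p: p["confidence"] < 1)),
    Node("AND", Leaf(lambda p: p["surface"] == "land"), Leaf(lambda p: p["confidence"] < 3)),
)
screened = evaluate(tree, {"surface": "land", "confidence": 2})
```

Keeping assertions at the leaves and operators at the inner nodes allows expressions of any depth to reuse the same evaluator.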
Here is an example in pseudocode of a simple quality view that uses conditional expressions:
foreach pixel in Optical_Depth_Land_And_Ocean {
if(((Cloud_Mask_QA→SurfaceTypeFlag == Ocean)
&& (Quality_Assurance_Ocean→0.66micronAerosolOpticalThicknessConfidenceFlag < Marginal))
|| ((Cloud_Mask_QA→SurfaceTypeFlag == Land)
&& (Quality_Assurance_Land→0.66micronAerosolOpticalThicknessConfidenceFlag < VeryGood)))
: screen(pixel)
}
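For readers who prefer something executable, the pseudocode can be rendered in Python roughly as follows. This is a sketch under stated assumptions: pixels are flattened into parallel lists, the integer codes follow the example enumerations (Ocean = 0, Land = 3; No Confidence = 0 through Very Good = 3), and the function name `screen_mask` is hypothetical.

```python
# Encoded values, following the example enumerations.
OCEAN, COAST, DESERT, LAND = 0, 1, 2, 3
NO_CONFIDENCE, MARGINAL, GOOD, VERY_GOOD = 0, 1, 2, 3


def screen_mask(surface_type, ocean_confidence, land_confidence):
    """Return one boolean per pixel: True means the pixel should be screened."""
    mask = []
    for surface, ocean_conf, land_conf in zip(surface_type, ocean_confidence, land_confidence):
        screened = (
            (surface == OCEAN and ocean_conf < MARGINAL)
            or (surface == LAND and land_conf < VERY_GOOD)
        )
        mask.append(screened)
    return mask


# Five made-up pixels: two ocean, two land, one coast.
mask = screen_mask(
    surface_type=[OCEAN, OCEAN, LAND, LAND, COAST],
    ocean_confidence=[NO_CONFIDENCE, MARGINAL, VERY_GOOD, VERY_GOOD, NO_CONFIDENCE],
    land_confidence=[VERY_GOOD, VERY_GOOD, GOOD, VERY_GOOD, NO_CONFIDENCE],
)
```

Coast and desert pixels are never screened by this particular view, because neither branch of the condition applies to them.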
Here is the same quality view, described above in pseudocode, encoded in RDF.
mod04_L2:optical_depth_land_and_ocean_quality_view
    a dqss:QualityView ;
    rdfs:label "Quality View MOD04 AOD Dark Target"^^xsd:string ;
    dqss:hasDataField [
        a dqss:DataField ;
        dqss:isRepresentedInVariable mod04_L2:optical_depth_land_and_ocean ;
    ] ;
    dqss:hasScreeningAssertion [
        a expr:Expression, dqss:ScreeningAssertion ;
        expr:op1 [
            a expr:Expression ;
            expr:op1 [
                a expr:Expression, dqss:SurfaceTypeScreeningAssertion ;
                dqss:hasScreeningField mod04_L2:cloud_mask_qa_surface_type_field ;
                dqss:hasSurfaceType mod04_L2:ocean ;
            ] ;
            expr:op2 [
                a expr:Expression, dqss:ScreeningAssertion ;
                dqss:hasScreeningField mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field ;
                dqss:hasMinimumQualityLevel mod04_L2:marginal ;
            ] ;
            expr:operator expr:AND ;
        ] ;
        expr:op2 [
            a expr:Expression ;
            expr:op1 [
                a expr:Expression, dqss:SurfaceTypeScreeningAssertion ;
                dqss:hasScreeningField mod04_L2:cloud_mask_qa_surface_type_field ;
                dqss:hasSurfaceType mod04_L2:land ;
            ] ;
            expr:op2 [
                a expr:Expression, dqss:ScreeningAssertion ;
                dqss:hasScreeningField mod04_L2:quality_assurance_land_0_66_micron_aot_confidence_field ;
                dqss:hasMinimumQualityLevel mod04_L2:very_good ;
            ] ;
            expr:operator expr:AND ;
        ] ;
        expr:operator expr:OR ;
    ] .
...