Product Quality Model Primer

Introduction

The Data Quality Model describes how quality assertions can be made about a thing, such as a data product, a dataset, or a service.

Quality

Quality has many definitions. Our project started with the following definition of product quality.

def: a measure of how well we believe a dataset represents the physical quantity that it purports to. As such, it is closely related to (though not identical to) the level of validation of the dataset. It often varies within the dataset, with dependencies on such factors as viewing geometry, surface type (land, ocean, desert, etc.) and cloud fraction.

An extremely generic definition:

def: a measure of the fitness-for-use of a thing

Quality is extremely complex, and is often partitioned along aspects to make measures of quality easier to represent and use. Examples of quality aspects for data products include bias, completeness, consistency, accuracy, representativeness, etc. Each of these aspects of quality may affect the fitness-for-use of a data product and a measure of each is computed through different means and expressed differently.

We can extend our generic definition to include quality aspects:

def: a set of measures that describe different aspects of the fitness-for-use of a thing.

There are many definitions of quality not expressed here. Our goal is not to produce an exhaustive analysis of perceptions of what quality is, nor to develop a model that addresses every definition of quality in use, but to arrive at a generic definition and model for quality that meets our project requirements. That model is described in the next section, and an example encoding using RDF is described in the section following.


Intuitive overview of Data Quality Model

This section provides an intuitive overview of the concepts and relationships defined in our Data Quality model.

The model represents a generic view of quality, and we believe it is equally applicable to service quality, product quality, dataset or granule quality, pixel-level quality, and potentially information quality.

An encoding of this model as an OWL Ontology is available at http://escience.rpi.edu/ontology/sesf/dq/1/0/dq.owl. This is the encoding referenced in the Worked Example sections below.

Data Entity

A data entity is any thing about which quality evidence can be computed and quality assertions can be made.

Data entities can exist at different levels of granularity, representing collections, granules, or atomic elements.

Quality Aspect

A quality aspect is a simple representation of a characteristic of a data entity that affects fitness-for-use.

Examples include completeness, consistency, accuracy, representativeness, bias, etc.

Quality Indicator

A quality indicator is a property that can be computed or estimated to provide evidence for a quality assertion.

The slope of a trend line from a comparison of truth vs. estimated values can be a quality indicator for bias. The computed value of the trend-line slope (quality evidence) can be used to make an assertion about bias for the data entity.
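As a rough illustration of how such a slope might be obtained, the Python sketch below fits an ordinary least-squares line to paired truth and estimated values. The function name trend_line_slope and the sample numbers are hypothetical; the primer does not prescribe a particular fitting method.

```python
def trend_line_slope(truth, estimated):
    """Least-squares slope of estimated values regressed on truth values."""
    if len(truth) != len(estimated) or len(truth) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(truth)
    mean_t = sum(truth) / n
    mean_e = sum(estimated) / n
    # Sum of squared deviations of the truth values.
    sxx = sum((t - mean_t) ** 2 for t in truth)
    if sxx == 0:
        raise ValueError("truth values must not all be identical")
    # Sum of cross deviations between truth and estimated values.
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(truth, estimated))
    return sxy / sxx


# Hypothetical sample in which every estimate is 10% above truth.
truth = [0.1, 0.2, 0.3, 0.4]
estimated = [0.11, 0.22, 0.33, 0.44]
slope = trend_line_slope(truth, estimated)
print(round(slope, 3))
```

A slope above 1.0 in this sketch corresponds to estimates that grow faster than the truth values, which is the situation the bias aspect is meant to capture.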

Quality Evidence

Quality evidence is the quantification of a quality indicator for some data entity.

Quality Expression

A quality expression is an intuitive, ready-to-use expression of quality (e.g. bad, marginal, good, very good).

Quality Assertion

A quality assertion is an event whereby a quality expression for some quality aspect is asserted for a data entity based on quality evidence computed from that data entity.

A quality assertion may reference a function to describe how assertion expressions are decided based on quality evidence.

For example, based on a slope of linear-regression fit (indicator) of 1.3 (evidence), a data product (data entity) may be asserted to have a bias (quality aspect) overestimation of 30% (bias-specific quality expression).

Quality Functions

A quality function is used to describe a procedure referenced in the data quality model. There are two specializations of quality functions used in the model.

  • A quality assertion function is used to describe how a quality expression is determined from available quality evidence for a given quality aspect.
  • A quality evidence function is used to describe how quality evidence is computed from a data entity for a quality indicator.
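The concepts in this overview can be sketched as plain data structures. The Python below is illustrative only: the class and function names (QualityEvidence, QualityAssertion, assert_bias) are hypothetical and are not part of the dq ontology. It reproduces the slope example from the Quality Assertion definition, under the assumption that the bias percentage is simply the slope's deviation from 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityEvidence:
    data_entity: str  # the thing the evidence was computed from
    indicator: str    # the property that was computed
    value: float      # the quantification of the indicator


@dataclass(frozen=True)
class QualityAssertion:
    data_entity: str
    aspect: str
    expression: str
    evidence: QualityEvidence


def assert_bias(evidence: QualityEvidence) -> QualityAssertion:
    """Hypothetical quality assertion function: slope evidence -> bias expression."""
    percent = round((evidence.value - 1.0) * 100)
    if percent > 0:
        expression = f"overestimation of {percent}%"
    elif percent < 0:
        expression = f"underestimation of {-percent}%"
    else:
        expression = "no bias"
    return QualityAssertion(evidence.data_entity, "bias", expression, evidence)


evidence = QualityEvidence("data product", "slope of linear-regression fit", 1.3)
assertion = assert_bias(evidence)
print(assertion.expression)
```

Keeping the evidence attached to the assertion mirrors the model's requirement that an assertion be traceable to the evidence it is based on.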


Worked Examples

Data Entity

Suggested example: a data entity describing the data grouping defined by Aerosol Optical Depth data (variable Optical_Depth_Land_And_Ocean_Mean) from MODIS Terra with QA 'very good', low aerosol loading (AOD < 0.2), and a scattering angle < 170 degrees. This data entity is taken from Tables 1a and 1b in "An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals" (Hyer, 2011).

We define predicates to describe our data entity using domain-relevant constructs.


ex:instrument a owl:ObjectProperty .
ex:variable a owl:ObjectProperty .
ex:algorithm a owl:ObjectProperty .
ex:qualityAssessmentFilter a owl:ObjectProperty .
ex:aerosolLoading a owl:ObjectProperty .
ex:scatteringAngle a owl:DatatypeProperty .

A class and an instance to describe aerosol loading.


ex:AerosolLoading a owl:Class .
ex:lowAerosolLoading a ex:AerosolLoading ;
	rdfs:label "low"^^xsd:string ;
	rdfs:comment "AOD < 0.2"^^xsd:string .

A class and instance to describe quality levels, defined by quality assessment processing. This quality level class is a subclass of dq:QualityExpression, introduced later.


ex:QualityLevel a owl:Class ;
	rdfs:subClassOf dq:QualityExpression .
	
ex:qa_very_good a ex:QualityLevel ;
	rdfs:label "very good"^^xsd:string .

Here is one example of a data entity. It is based on the information in Table 1 of Hyer (2011); season and region information is not in that table.


ex:exampleDataEntity a dq:DataEntity ;
	ex:instrument mdsa:MODIS.Terra ;
	ex:variable mdsa:Optical_Depth_Land_And_Ocean_Mean ;
	ex:algorithm mdsa:darkTargetLand ;
	ex:qualityAssessmentFilter ex:qa_very_good ;
	ex:aerosolLoading ex:lowAerosolLoading ;
	ex:scatteringAngle "< 170 degrees"^^xsd:string .	

Note: scatteringAngle is left as a DatatypeProperty for now because I see no benefit yet in making a ScatteringAngle class.

Quality Aspect

Suggested example: bias and compliance are aspects of accuracy, which is a specialization of quality aspect.
#TODO - may need some re-wording here.


ex:Accuracy a owl:Class ;
	rdfs:subClassOf dq:QualityAspect .

ex:bias a ex:Accuracy ;
	rdfs:label "bias"^^xsd:string .
	
ex:compliance a ex:Accuracy ;
	rdfs:label "compliance"^^xsd:string .

Quality Indicator

Suggested example: slopeOfMeasuredVsTruth and percentageWithinEE are, respectively, bias and compliance indicators.


ex:slopeOfMeasuredVsTruth a dq:QualityIndicator ;
	rdfs:label "slope of measured vs truth"^^xsd:string ;
	dq:indicatorOfQualityAspect ex:bias .
	
ex:percentageWithinEE a dq:QualityIndicator ;
	rdfs:label "percentage within expected error (EE)"^^xsd:string ;
	dq:indicatorOfQualityAspect ex:compliance .

Quality Evidence

Suggested example: percentage within expected error of "68%" and slope "0.93" computed globally for Terra with QA 'very good' and low aerosol loading (AOD < 0.2).

Quality evidence is used to describe a compliance measurement that has been computed for data that matches our DataEntity.

We define a PercentageWithinEEMeasurement class and associated properties to represent evidence for compliance.


ex:PercentageWithinEEMeasurement a owl:Class .
# TODO - add cardinality restrictions for rdf:value


We encode the compliance quality evidence for the example data entity.


ex:exampleComplianceEvidence a dq:QualityEvidence ;
	rdfs:label "68% within expected error"^^xsd:string ;
	dq:evidenceOfIndicator ex:percentageWithinEE ;
	dq:evidenceForQualityAspect ex:compliance ;
	dq:evidenceForDataEntity ex:exampleDataEntity ;
	dq:hasMeasure [
		a ex:PercentageWithinEEMeasurement ;
		rdf:value "68"^^xsd:float ;
	] .

We can infer the following relationship because dq:describedByQualityEvidence is the inverse property of dq:evidenceForDataEntity.


ex:exampleDataEntity dq:describedByQualityEvidence ex:exampleComplianceEvidence .

Quality evidence is also used to describe a slope measurement from the comparison of AERONET AOD (ground truth) vs. MODIS AOD (data product estimation) in our data entity.

We define a SlopeMeasurement class that uses rdf:value to specify the slope value.


ex:SlopeMeasurement a owl:Class .
# TODO restriction of exactly 1 rdf:value with range xsd:float


We encode the slope measurement quality evidence for the example data entity.


ex:exampleBiasEvidence a dq:QualityEvidence ;
	rdfs:label "slope = 0.93"^^xsd:string ;
	dq:evidenceOfIndicator ex:slopeOfMeasuredVsTruth ;
	dq:evidenceForQualityAspect ex:bias ;
	dq:evidenceForDataEntity ex:exampleDataEntity ;
	dq:hasMeasure [
		a ex:SlopeMeasurement ;
		rdf:value "0.93"^^xsd:float ;
	] .

We can infer the following relationship because dq:describedByQualityEvidence is the inverse property of dq:evidenceForDataEntity.


ex:exampleDataEntity dq:describedByQualityEvidence ex:exampleBiasEvidence .

Our data entity has now been associated with two different pieces of quality evidence, and more types of quality evidence are described in our source paper (Hyer, 2011).

Quality Expression

Quality expressions are used to define controlled vocabularies that provide an easy-to-use, subjective expression of quality. The subjective expression of quality is usually determined by the available quality evidence, which is by definition objective.

For example, in MODIS, pixel-level confidence is expressed using the following controlled vocabulary:

  • no confidence
  • marginal
  • good
  • very good

These values indicate the processing algorithm's 'happiness' with its ability to converge on a computed value for the pixel. Some objective measurement is used to determine how the algorithm performed for a given pixel; this value is mapped into a controlled, subjective vocabulary to provide an easy-to-use expression of quality.

A bias expression consists of a direction and a magnitude. The direction specifies whether the estimated value is higher (overestimated) or lower (underestimated) than the actual value. The slope value of the comparison against 'truth' is used to determine whether the bias is an overestimate, an underestimate, or absent. If there is no observable bias based on the evidence, correctEstimate is used as the direction. The magnitude expresses the extent to which the data entity is biased.


ex:BiasExpression a owl:Class ;
	rdfs:subClassOf dq:QualityExpression .
# TODO define cardinality constraints for direction and magnitude

ex:BiasDirection a owl:Class .
ex:BiasMagnitude a owl:Class .

ex:underestimate a ex:BiasDirection .
ex:overestimate a ex:BiasDirection .
ex:correctEstimate a ex:BiasDirection .

ex:veryHighBiasMagnitude a ex:BiasMagnitude .
ex:highBiasMagnitude a ex:BiasMagnitude .
ex:moderateBiasMagnitude a ex:BiasMagnitude .
ex:lowBiasMagnitude a ex:BiasMagnitude .
ex:veryLowBiasMagnitude a ex:BiasMagnitude .

Here is an example bias expression instance based on the slope evidence of 0.93: a very low underestimate.


ex:exampleBiasExpression a ex:BiasExpression ;
	rdfs:label "very low bias underestimate" ;
	ex:direction ex:underestimate ;
	ex:magnitude ex:veryLowBiasMagnitude .


Here are a class and instances for compliance expressions.


ex:ComplianceExpression a owl:Class ;
	rdfs:subClassOf dq:QualityExpression .

ex:veryBadCompliance a ex:ComplianceExpression ;
	rdfs:label "very bad compliance"^^xsd:string .
ex:badCompliance a ex:ComplianceExpression ;
	rdfs:label "bad compliance"^^xsd:string .
ex:marginalCompliance a ex:ComplianceExpression ;
	rdfs:label "marginal compliance"^^xsd:string .
ex:goodCompliance a ex:ComplianceExpression ;
	rdfs:label "good compliance"^^xsd:string .
ex:veryGoodCompliance a ex:ComplianceExpression ;
	rdfs:label "very good compliance"^^xsd:string .

Quality Assertion

Now we use a BiasAssertion to associate the DataEntity and QualityEvidence with the BiasExpression.

Here we define a BiasAssertion class.


ex:BiasAssertion a owl:Class ;
	rdfs:subClassOf dq:QualityAssertion .
# TODO restriction on type of dq:assertionOfQualityExpression to type ex:BiasExpression
# TODO restriction owl:hasValue on dq:assertionAboutQualityAspect to ex:bias

Here is the bias assertion for our example data entity.


ex:exampleBiasAssertion a ex:BiasAssertion ;
	dq:assertionAboutQualityAspect ex:bias ;
	dq:assertionForDataEntity ex:exampleDataEntity ;
	dq:assertionBasedOnEvidence ex:exampleBiasEvidence ;
	dq:assertionOfQualityExpression ex:exampleBiasExpression .

Here we define a ComplianceAssertion class.


ex:ComplianceAssertion a owl:Class ;
	rdfs:subClassOf dq:QualityAssertion .
# TODO restriction on type of dq:assertionOfQualityExpression to type ex:ComplianceExpression
# TODO restriction owl:hasValue on dq:assertionAboutQualityAspect to ex:compliance

Here is a compliance assertion for our example data entity.


ex:exampleComplianceAssertion a ex:ComplianceAssertion ;
	dq:assertionAboutQualityAspect ex:compliance ;
	dq:assertionForDataEntity ex:exampleDataEntity ;
	dq:assertionBasedOnEvidence ex:exampleComplianceEvidence ;
	dq:assertionOfQualityExpression ex:goodCompliance .

Inverse properties give us the following inferences.


ex:exampleBiasEvidence dq:evidenceForAssertion ex:exampleBiasAssertion .
ex:exampleComplianceEvidence dq:evidenceForAssertion ex:exampleComplianceAssertion .
ex:exampleDataEntity dq:describedByQualityAssertion ex:exampleBiasAssertion .
ex:exampleDataEntity dq:describedByQualityAssertion ex:exampleComplianceAssertion .
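The inference step can be sketched in a few lines of Python. This is a minimal illustration, not a reasoner: triples are plain string tuples, and the inverse pairs in INVERSE_OF are an assumption read off the inferences listed above; the authoritative inverse declarations are those in the ontology itself.

```python
# Assumed inverse pairs, implied by the inferences listed above.
INVERSE_OF = {
    "dq:assertionBasedOnEvidence": "dq:evidenceForAssertion",
    "dq:assertionForDataEntity": "dq:describedByQualityAssertion",
}

# Statements asserted explicitly in the two example assertions.
asserted = [
    ("ex:exampleBiasAssertion", "dq:assertionBasedOnEvidence", "ex:exampleBiasEvidence"),
    ("ex:exampleBiasAssertion", "dq:assertionForDataEntity", "ex:exampleDataEntity"),
    ("ex:exampleComplianceAssertion", "dq:assertionBasedOnEvidence", "ex:exampleComplianceEvidence"),
    ("ex:exampleComplianceAssertion", "dq:assertionForDataEntity", "ex:exampleDataEntity"),
]


def infer_inverses(triples, inverse_of):
    """Return the triples implied by swapping subject and object of inverse properties."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in inverse_of:
            inferred.add((obj, inverse_of[predicate], subject))
    return inferred


inferred = infer_inverses(asserted, INVERSE_OF)
for triple in sorted(inferred):
    print(" ".join(triple), ".")
```

In practice an OWL reasoner would derive these statements from the inverse declarations in the ontology; the sketch only shows why each asserted triple yields exactly one inferred triple.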

Quality Functions

Suggested example: quality evidence functions to describe how slopeOfMeasuredVsTruth and percentageWithinEE are computed, and quality assertion functions to describe how assertions of bias and compliance are made based on the available evidence.

Quality Evidence Functions

Quality evidence functions describe how the value for a quality indicator is computed or otherwise assigned.


ex:slopeMeasuredVsTruthFunction a dq:QualityEvidenceFunction ;
	rdfs:label ""^^xsd:string ;
	rdfs:comment ""^^xsd:string .
	
ex:percentageWithinEEFunction a dq:QualityEvidenceFunction ;
	rdfs:label ""^^xsd:string ;
	rdfs:comment ""^^xsd:string .

Quality Evidence references the function used to assign it.


ex:exampleBiasEvidence dq:computedByFunction ex:slopeMeasuredVsTruthFunction .
ex:exampleComplianceEvidence dq:computedByFunction ex:percentageWithinEEFunction .
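As a rough illustration of what a quality evidence function such as ex:percentageWithinEEFunction might compute, the Python sketch below counts how many estimates fall within an expected-error envelope around the truth value. The function names, the sample values, and the envelope formula in example_envelope are all assumptions made for illustration; the primer does not define the envelope.

```python
def percentage_within_ee(truth, estimated, expected_error):
    """Percentage of estimates whose absolute error is within the expected error.

    expected_error is a callable that returns the allowed absolute error
    for a given truth value.
    """
    if len(truth) != len(estimated) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    within = sum(
        1 for t, e in zip(truth, estimated) if abs(e - t) <= expected_error(t)
    )
    return 100.0 * within / len(truth)


def example_envelope(aod):
    # ASSUMPTION: hypothetical envelope, chosen only to make the example concrete.
    return 0.05 + 0.15 * aod


# Hypothetical paired observations (truth vs. data product estimate).
truth = [0.1, 0.2, 0.3, 0.4]
estimated = [0.12, 0.35, 0.31, 0.2]
result = percentage_within_ee(truth, estimated, example_envelope)
print(result)
```

Passing the envelope in as a callable keeps the counting logic independent of any particular definition of expected error.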

Quality Assertion Functions

Quality assertion functions describe how quality assertions are made based on available quality evidence. The function should contain a description of how a quality expression is mapped to ranges and combinations of quality evidence.

Slope evidence → bias direction mapping

slope | Bias Direction
> 1.0 | overestimate
= 1.0 | correct estimate
< 1.0 | underestimate
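The direction mapping above can be sketched as a small Python function. The function name and the tolerance parameter are hypothetical additions: the table compares against exactly 1.0, and the optional tolerance only shows how a near-1.0 slope could be treated as a correct estimate if that were desired.

```python
import math


def bias_direction(slope, tolerance=0.0):
    """Map slope evidence to a bias direction name.

    tolerance is a hypothetical absolute tolerance around 1.0; the default
    of 0.0 reproduces the exact comparison in the table.
    """
    if math.isclose(slope, 1.0, rel_tol=0.0, abs_tol=tolerance):
        return "correctEstimate"
    return "overestimate" if slope > 1.0 else "underestimate"


print(bias_direction(1.3))
print(bias_direction(0.93))
```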

Slope evidence → bias magnitude mapping (currently placeholder values)

abs(1.0 - slope) | Bias Magnitude
> 0.5 | very high
0.25 - 0.5 | high
0.1 - 0.25 | moderate
0.05 - 0.1 | low
> 0 and < 0.05 | very low
0 | no magnitude
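The magnitude mapping can be sketched the same way. The thresholds below are the placeholder values from the table, and the treatment of shared boundaries (each band includes its lower bound, so 0.25 counts as high and 0.05 as low) is an assumption, since the table does not say which band a boundary value belongs to. The function name is hypothetical.

```python
def bias_magnitude(slope):
    """Map slope evidence to a bias magnitude name using the placeholder thresholds.

    Returns None when the slope is exactly 1.0 (no magnitude).
    """
    deviation = abs(1.0 - slope)
    if deviation > 0.5:
        return "veryHighBiasMagnitude"
    if deviation >= 0.25:  # ASSUMPTION: lower bound of each band is inclusive
        return "highBiasMagnitude"
    if deviation >= 0.1:
        return "moderateBiasMagnitude"
    if deviation >= 0.05:
        return "lowBiasMagnitude"
    if deviation > 0:
        return "veryLowBiasMagnitude"
    return None


print(bias_magnitude(0.93))
print(bias_magnitude(1.3))
```

Because the thresholds are placeholders, the band assigned to any particular slope (0.93 falls in the low band here) should be read as provisional.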

A simple encoding of the bias function with the evidence → expression mappings described in the RDF description.


ex:biasFunction a dq:QualityAssertionFunction ;
	rdfs:label ""^^xsd:string ;
	rdfs:comment ""^^xsd:string .

% within EE evidence → compliance assertion mapping

% within expected error | Compliance Expression
< 33% | very bad
33% - 50% | bad
51% - 60% | marginal
61% - 73% | good
≥ 74% | very good
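The compliance mapping can be sketched as follows. The function name is hypothetical, and two assumptions are made because the table leaves them open: values that fall between the listed bands (for example 50.5) are assigned to the lower band, and a value of exactly 74 is treated as very good.

```python
def compliance_expression(percent_within_ee):
    """Map a percentage within expected error to a compliance expression name."""
    if not 0.0 <= percent_within_ee <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percent_within_ee < 33:
        return "veryBadCompliance"
    if percent_within_ee < 51:  # ASSUMPTION: gaps between bands go to the lower band
        return "badCompliance"
    if percent_within_ee < 61:
        return "marginalCompliance"
    if percent_within_ee < 74:
        return "goodCompliance"
    return "veryGoodCompliance"  # ASSUMPTION: exactly 74 counts as very good


print(compliance_expression(68))
```

With the example evidence of 68% within expected error, this sketch returns goodCompliance.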

A simple encoding of the compliance function with the evidence → expression mapping described in the RDF description.


ex:complianceFunction a dq:QualityAssertionFunction ;
	rdfs:label ""^^xsd:string ;
	rdfs:comment ""^^xsd:string .

Quality Assertions reference the function used to assign a quality expression to a data entity based on the available evidence.


ex:exampleBiasAssertion dq:computedByFunction ex:biasFunction .
ex:exampleComplianceAssertion dq:computedByFunction ex:complianceFunction .


Science Explanation

Suggested example: define an explanation class that is referenced from the QualityAssertion to provide explanation information.


Frequently Asked Questions

...