SPCDIS Working Group - CHIP GBU Processing Provenance Capture


Description of CHIP GBU Rating Process

Five spots on the solar image are used to create a time series for the day, with one spot at the center of the disk. Each of the other spots is correlated with the center spot as a measure of cloud cover, under the assumption that all five areas will not simultaneously be affected by variable solar activity. If the center spot is determined to have activity in it, another spot is substituted as the reference time series. This is done to find a spot time series not affected by solar activity.

The five spots on the image are fixed for all images. They are five square subsets of the image data; the pixels in each subset are summed to give five numbers per image. Each subset is a box, say 50 pixels on a side, for which all the pixels in the box are summed. The five numbers from each image are then organized as time series, one plot of each value as a function of time through the day. These five time series are cross-correlated with each other.
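The box-sum and cross-correlation steps above can be sketched as follows. This is a minimal illustration assuming NumPy arrays; the box positions, box size, and function names are assumptions, not the actual pipeline code.

```python
import numpy as np

def sample_sums(image, centers, half=25):
    """Sum the pixels in a square box (50 px on a side by default)
    around each sampling center; returns one number per box."""
    return [image[r - half:r + half, c - half:c + half].sum()
            for r, c in centers]

def correlate_series(series_a, series_b):
    """Pearson cross-correlation of two daily time series."""
    return np.corrcoef(series_a, series_b)[0, 1]

# Illustrative use: five fixed boxes applied to each image in a day.
# `images` stands in for a day's worth of CHIP frames; `centers`
# holds five (row, col) pairs, one of them the disk center.
images = [np.random.rand(512, 512) for _ in range(10)]
centers = [(256, 256), (128, 128), (128, 384), (384, 128), (384, 384)]
series = np.array([sample_sums(img, centers) for img in images])  # (n_images, 5)
r = correlate_series(series[:, 0], series[:, 1])  # center spot vs. another spot
```

Each column of `series` is one spot's daily time series; correlating each column against the center column gives the cloud measure described above.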

The reference time series can then be used at each time step in the day for a simple level comparison to determine whether clouds contaminate that image. The comparison levels are as follows:

GBU Good: above 600

GBU Bad: between 400 and 600

GBU Ugly: below 400
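The level comparison above amounts to a simple threshold test. A sketch, with the boundary handling (whether 400 and 600 exactly count as Bad) being an assumption since the description does not specify it:

```python
def gbu_rating(sample_level):
    """Classify an image by its sample-level comparison value,
    using the thresholds above (600 and 400). Treatment of the
    exact boundary values is assumed, not documented."""
    if sample_level > 600:
        return "Good"
    if sample_level >= 400:
        return "Bad"
    return "Ugly"

# Values taken from the example GBU log below.
print(gbu_rating(789.80))  # Good
print(gbu_rating(4.33))    # Ugly
```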

Example GBU log

The space-delimited text file has been translated to a wikitable (and given column headers) for better presentation.

Image File                    Science Image Quality Annotation (GBU Rating)    Quality Evidence (Sample Level Comparison)    Image Number
20100208.233907.chp.hsc.fts Good 789.80 127
20100208.234206.chp.hsc.fts Good 784.60 128
20100208.234506.chp.hsc.fts Good 786.66 129
20100208.234807.chp.hsc.fts Good 790.37 130
20100208.235106.chp.hsc.fts ARCHIVE 791.01 131
20100208.235406.chp.hsc.fts Good 790.87 132
20100208.235707.chp.hsc.fts Ugly 4.33 133
20100209.000007.chp.hsc.fts Good 786.78 134
20100209.000307.chp.hsc.fts Good 783.02 135
20100209.000607.chp.hsc.fts Good 786.47 136
20100209.000907.chp.hsc.fts Good 780.87 137
20100209.001206.chp.hsc.fts Good 786.55 138
20100209.001506.chp.hsc.fts Good 789.46 139
20100209.001806.chp.hsc.fts Ugly 3.32 140
20100209.002106.chp.hsc.fts Ugly 5.60 141
20100209.002412.chp.hsc.fts Good 788.04 142
20100209.002715.chp.hsc.fts Good 780.61 143
20100209.003006.chp.hsc.fts Good 783.58 144
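A log in this format can be parsed record by record; the sketch below assumes the four-column layout shown above (filename, rating, sample level, image number), with the last column's meaning as an image sequence number being an inference from the data.

```python
def parse_gbu_log(lines):
    """Parse space-delimited GBU log records into
    (filename, rating, sample_level, image_number) tuples."""
    records = []
    for line in lines:
        name, rating, level, number = line.split()
        records.append((name, rating, float(level), int(number)))
    return records

# Two records taken from the example log above.
log = ["20100208.233907.chp.hsc.fts Good 789.80 127",
       "20100209.001806.chp.hsc.fts Ugly 3.32 140"]
records = parse_gbu_log(log)
```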

GBU Rating Process Modeled as WDO-It SAW


Issues in Provenance Modeling

Granularity: In modeling CHIP GBU processing, there are two granularities at which provenance can be tracked: at the level of GBU logs (coarse-grained) and at the level of individual cells in those logs (fine-grained). While a fine-grained model will more directly capture the provenance of log information, it will require more provenance to be created and stored than a coarse-grained model. A third option is storing provenance only on GBU status change. We may explore capturing provenance at all three levels, then determine how significant the space usage is and evaluate whether the status-change granularity is adequate or useful. One open question is whether other parameter values change within a single Good, Bad, or Ugly status run.
Also, notice the ARCHIVE value: is that captured as a fourth value?

Current Model Drafts

Coarse Grained: Version 1

Comment 1: In the current PML model, there are 6 distinct places where Inference Engines and Inference Rules are used. I’m not sure whether 6 different Inference Engines will be required (one may be able to handle everything). However, I’m guessing that using 6 different Inference Rules will be a good idea, assuming we think of them as being like program functions. In this model, Inference Engine and Inference Rule nodes with the label '5' appear twice, in the 'Compute Sample Sums' Inference Step for the 1st and nth image, signifying that the same Inference Engine and Rule should be used for all n images.

Comment 2: It is not yet clear which nodes will be timestamped in this model (aside from pmlp:SourceUsage instances). This will be resolved later.


Question 1 (from James): Based on Leonard’s description, I’m still not quite sure what the process is for nailing down the 5 areas to compute sample sums over. Is it something like this?

for (x = 0; x < imageCount; x++) {
    if (x == 0)
        selectSamplingPoints(5);       // initial choice of the 5 areas
    if (centerPointHasActivity())
        reselectSamplingPoints(5);     // re-select the 5 points for all images
}

If this is the case, are images being processed *after* they are gathered from the CHIP data ingest? If they’re instead being processed in real time, how would this process work? The coarse-grained model (V1) assumes the former.

Comment from Stephan: A good question for Leonard. The earlier CHIP data ingest was not a real-time, push-based processing system; most processing was done in batches, with all data for a day (or longer) run through a stage of processing at one time. I suspect the GBU processing is a two-pass batch system: pass one generates the sample sum for each image and aggregates them into the daily time series, and pass two generates the sample level comparison for each image.
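If Stephan's two-pass reading is right, the batch flow might look like the sketch below. This is an illustration of the suspected structure only; the helper names, box size, and thresholds are assumptions carried over from the process description, not the actual pipeline.

```python
import numpy as np

def gbu_two_pass(images, centers, reference_index=0):
    """Pass 1: build the daily time series of sample sums.
    Pass 2: rate each image via the sample-level comparison
    against the reference spot. Sketch only."""
    def box_sum(image, center, half=25):
        r, c = center
        return image[r - half:r + half, c - half:c + half].sum()

    # Pass 1: one sample sum per spot per image, aggregated for the day.
    series = [[box_sum(img, ctr) for ctr in centers] for img in images]

    # Pass 2: threshold the reference-spot value for each image
    # (600/400 levels from the description above).
    ratings = []
    for row in series:
        level = row[reference_index]
        ratings.append("Good" if level > 600
                       else "Bad" if level >= 400
                       else "Ugly")
    return ratings

# Illustrative run on synthetic frames: a 50x50 box of ones sums to 2500.
images = [np.full((512, 512), 1.0) for _ in range(3)]
centers = [(256, 256), (128, 128)]
ratings = gbu_two_pass(images, centers)
```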

Comment from Deborah: We need to consider whether we want a max cardinality on the GBU value, and then whether we will use a oneOf for GBU or actually allow 4 values. We need to check whether ARCHIVE as a value means that the image is good (specifically, if all images are Bad or Ugly for a day, is there an ARCHIVE marking on any of the images?).

Selective Explanations of Metadata/Provenance in CHIP

Presentation by James (8/4): SPCDIS.ppt


  • MLSO: Mauna Loa Solar Observatory
    • The Mauna Loa Solar Observatory (MLSO) occupies part of the NOAA Mauna Loa research site located on the flank of Mauna Loa at an elevation of 3440 meters on the island of Hawaii. It is operated by the High Altitude Observatory, a division of the National Center for Atmospheric Research, which is located in Boulder, Colorado.
    • see also http://mlso.hao.ucar.edu/mlso_about.html
  • ACOS: Advanced Coronal Observing System
    • A suite of instruments designed to observe the solar atmosphere, including the Chromospheric Helium Imaging Photometer (CHIP, 1083.0nm), H-alpha prominence and solar disk monitor (PICS, 656.3nm), and the Mk4 K-coronameter, which observes the white light K-corona from 1.12-2.79 solar radii (700-950 nm).
  • CHIP: Chromospheric Helium-I Imaging Photometer
    • The Chromospheric Helium-I Imaging Photometer (CHIP) was installed at the Observatory in April 1996. CHIP is a differential device using properties of the Helium-I line at 1083 nm as an indicator of both chromospheric and coronal structures. CHIP records images of the sun at 1083 nm, as well as at a number of other nearby wavelengths (for calibration purposes). It is basically composed of a liquid crystal variable retarder (tuneable) Lyot filter connected to an IR CCD.
    • CHIP is unique compared with other Helium-I imagers, in that it obtains images every 3 minutes, the high cadence crucial to study the rapid evolution of CMEs. In addition, observations from CHIP should provide better understanding of coronal holes, coronal arcades, and the interaction between open and closed magnetic field structures.
    • Seven line and continuum exposures are recorded within 2 seconds. The difference of line and one continuum exposures is computed every 3 minutes to produce one 1083 nm image.
    • The CCD pixel size is 2.3 arcseconds and the measured spatial resolution is ~8 arcseconds. For additional information on the tuneable filter see Kopp et al. (1996).
    • see also http://mlso.hao.ucar.edu/mlso_about.html
  • GBU: Good/Bad/Ugly Data Quality Rating
    • Good images are fit for science use and are published for use.
    • Bad images can be used for science analysis, but the user should be aware that image quality may be slightly degraded. Bad images are published for use.
    • Ugly images are not fit for science analysis and are not published for use.