Differences of Calculated Results
From BAWiki
Contents
- 1 Introduction
- 2 Definitions
- 3 Requirements for the computation of differences
- 4 Computational results
- 4.1 Ordinary differences
- 4.2 Data for a Taylor diagram
- 4.2.1 Standard deviation for reference data
- 4.2.2 Standard deviation for variant data
- 4.2.3 Correlation
- 4.2.4 Pattern RMS
- 4.2.5 Deviation of means (Bias)
- 4.2.6 Root mean square error (RMSE) according to Taylor
- 4.2.7 Taylor Skill 4
- 4.2.8 Taylor Skill 5
- 4.2.9 Number of valid reference data
- 4.2.10 Number of valid variant data
- 4.2.11 Number of valid Taylor data
- 4.3 Median and quantiles
- 4.4 Other skill definitions
Introduction
For data generated by
- mathematical models (model results), or
- analysis of calculated results (characteristic numbers), or
- measured data (observational data)
various differences can be computed. Input data can typically be categorized as follows:
- Category K0: time-independent quantities;
- Category K1: time-dependent quantities, one time step;
- Category KC: time-dependent quantities, several discrete time steps, constant time step;
- Category KN: time-dependent quantities, several discrete time steps, varying time step.
For geophysical data, categories K1, KC and KN are of significance. Examples:
- Category K1: topography/bathymetry for a specific instant in time;
- Category KC: water level at discrete times with constant time step, e.g. computed by a mathematical model;
- Category KN: tidal high water at non-equidistant time intervals, e.g. derived from a water level time series.
Definitions
- reference data: the data with respect to which deviations of the variant data are evaluated. Typical reference data are observational data, or computational or analysis results, for a specific (reference) state (situation);
- variant data: likewise observational data, or computational or analysis results, for which deviations are computed with respect to the reference state. Typically, variant data are given for a different period in time (natural variation) or a different state of the system under study;
- valid operator 1: returns .T. or .F., depending on whether a reference value is valid or invalid. It can also be applied to a variant value;
- valid operator 2: returns .T. or .F., depending on whether the reference value and the variant value for the same location and time are both valid or not;
- integer operator 1: returns 1 if valid operator 1 returns .T. for a reference value, else 0. Similarly for a variant value;
- integer operator 2: returns 1 if valid operator 2 returns .T., else 0.
Requirements for the computation of differences
The following requirements must be fulfilled by the reference data and the variant data:
- reference data and variant data must belong to the same category (see above);
- the number of times must be identical for reference data and variant data;
- for data belonging to category KC, the constant time steps of reference data and variant data must coincide;
- (physical) dimension as well as meaning must be equivalent for reference data and variant data;
- the reference value and the variant value must both be valid for the same instant in time; otherwise the derived results become invalid.
Computational results
Program NCDELTA can be used to compute all subsequent results. Locations of the reference data are not required to coincide with those of the variant data. Values are interpolated between the two sets of locations, as long as the geographical distance between the different locations does not exceed a given limit. If the distance exceeds that limit, no result is computed and an invalid result value is generated instead. The following results can be computed using NCDELTA.
Ordinary differences
Difference
A result is computed for all times (one value for time-independent data) at all locations:
- The difference between the variant value and the reference value is calculated if valid operator 2 returns .T.;
- The result is invalid if valid operator 2 returns .F.
Results are computed for data belonging to categories K0, K1, KC and KN, that is, for all types of data.
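The rule above can be sketched in a few lines of Python. This is only an illustration, not NCDELTA code: the function name `differences`, the use of `None` as the invalid marker, and the sign convention (variant minus reference) are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence

Value = Optional[float]  # None stands in for an invalid value (assumption)


def differences(reference: Sequence[Value], variant: Sequence[Value]) -> List[Value]:
    """Return variant - reference per time step; None where either value is invalid."""
    if len(reference) != len(variant):
        raise ValueError("reference and variant must have the same number of times")
    return [
        v - r if (r is not None and v is not None) else None
        for r, v in zip(reference, variant)
    ]


reference = [1.0, 2.0, None, 4.0]
variant = [1.5, None, 3.0, 3.0]
diffs = differences(reference, variant)
print(diffs)  # [0.5, None, None, -1.0]
```

A difference is produced only where both inputs are valid, which mirrors the role of valid operator 2.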
Maximum difference
Maximum difference is determined using the absolute value (modulus) in combination with sign preservation:
- First, all differences are computed as indicated above;
- Among all valid differences, the index is determined for which the absolute value of the difference is maximal;
- The signed difference at that index is the maximum difference according to this definition; this value can be negative, positive or zero;
- If all differences are invalid, the maximum difference is set to invalid.
Computation is performed for categories KC and KN. A valid maximum difference is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Minimum difference
Minimum difference is determined using the absolute value (modulus) in combination with sign preservation:
- First, all differences are computed as indicated above;
- Among all valid differences, the index is determined for which the absolute value of the difference is minimal;
- The signed difference at that index is the minimum difference according to this definition; this value can be negative, positive or zero;
- If all differences are invalid, the minimum difference is set to invalid.
Computation is performed for categories KC and KN. A valid minimum difference is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
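The sign-preserving maximum and minimum differences can be illustrated as follows. This is a minimal sketch under assumptions: `signed_extreme` is a hypothetical helper, invalid values are represented by `None`, and for ties in absolute value the first occurrence is returned, which the text above does not specify.

```python
from typing import Callable, Optional, Sequence


def signed_extreme(diffs: Sequence[Optional[float]], pick: Callable) -> Optional[float]:
    """Pick the valid difference with the largest (pick=max) or smallest (pick=min)
    absolute value and return it with its original sign."""
    valid = [d for d in diffs if d is not None]
    if not valid:
        return None  # all differences invalid -> result invalid
    return pick(valid, key=abs)


diffs = [0.5, None, -2.0, 1.5]
max_diff = signed_extreme(diffs, max)
min_diff = signed_extreme(diffs, min)
print(max_diff, min_diff)  # -2.0 0.5
```

Note that the maximum difference is negative here: the selection uses the modulus, but the returned value keeps its sign.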
Mean difference
Mean value is computed for all valid differences:
- First, all differences are computed as indicated above;
- From all valid differences, the mean value is computed as the sum of the valid differences divided by the number of valid differences;
- If all differences are invalid, the mean value is set to invalid.
Computation is performed for categories KC and KN. A valid mean value is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Mean deviation
Mean deviation is computed for all valid differences:
- First, all differences are computed as indicated above;
- From all valid differences, the mean deviation is computed;
- If all differences are invalid, the mean deviation is set to invalid.
Computation is performed for categories KC and KN. A valid mean deviation is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Root mean square error (RMSE)
RMSE is computed for all valid differences:
- First, all differences are computed as indicated above;
- From all valid differences, the RMSE is computed from the well-known definition, i.e. as the square root of the mean of the squared valid differences;
- If all differences are invalid, RMSE = invalid is set.
Computation is performed for categories KC and KN. A valid RMSE is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Number of valid differences
The number of valid differences may vary between different locations. As a consequence, the above-mentioned quantities may be computed from data sets of different size:
- First, all differences are computed as indicated above;
- The number of valid differences is computed as the sum of integer operator 2 over all times.
Computation is performed for categories KC and KN. In program NCPLOT, quantities like maximum difference, minimum difference, mean value as well as mean deviation can be visualized using this variable as a filter. In that way, visualizations can be created in which results are shown only for points where, e.g., either the maximum number of events or a specific number of events occurred.
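As a rough illustration of how the mean difference, the RMSE and the number of valid differences relate to each other, consider the following sketch. The function name `summarize` and the use of `None` for invalid values are assumptions of this example; the mean deviation is left out because its exact definition is not reproduced here.

```python
import math
from typing import Dict, Optional, Sequence


def summarize(diffs: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Mean, RMSE and count computed over the valid differences only."""
    valid = [d for d in diffs if d is not None]
    n = len(valid)
    if n == 0:
        # no valid difference at this location -> statistics are invalid
        return {"n_valid": 0, "mean": None, "rmse": None}
    mean = sum(valid) / n
    rmse = math.sqrt(sum(d * d for d in valid) / n)
    return {"n_valid": n, "mean": mean, "rmse": rmse}


stats = summarize([1.0, None, -1.0, 3.0, None])
print(stats["n_valid"], stats["mean"])  # 3 1.0
```

The count can then serve as a filter, for example to keep only locations where every time step contributed a valid difference.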
Data for a Taylor diagram
Taylor diagrams provide "a concise statistical summary of how well patterns match each other in terms of their correlation, their root-mean-square difference and the ratio of their variances." Additional information such as bias can be added to the conventional Taylor diagram. The Taylor diagram provides a graphical framework that allows a suite of variables from a variety of (say) one or more models or reanalyses to be compared to reference data. The reference data can be observationally based (e.g., a reanalysis), another model, or a control run.
Literature and further information:
- Taylor, K. E. (2001), Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research, 106 (D7), 7183–7192, doi: http://dx.doi.org/10.1029/2000JD900719;
- http://www-pcmdi.llnl.gov/about/staff/Taylor/CV/Taylor_diagram_primer.htm with a short introduction by K. E. Taylor as well as links to example applications.
Standard deviation for reference data
Standard deviation is computed for all valid reference data:
- The mean value of the reference data is computed as the sum of the valid reference values divided by their number;
- If a valid mean value was computed, the standard deviation is obtained from the squared deviations of the valid reference values from that mean;
- If no valid data exist, the standard deviation is set to invalid.
Computation is performed for categories KC and KN. A valid standard deviation is obtained as long as there exists at least one valid difference. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
Remark: Valid operator 2 is applied instead of valid operator 1. This guarantees that all Taylor data are computed for the same data set size at a specific location.
Standard deviation for variant data
Standard deviation is computed for all valid variant data:
- The mean value of the variant data is computed as the sum of the valid variant values divided by their number;
- If a valid mean value was computed, the standard deviation is obtained from the squared deviations of the valid variant values from that mean;
- If no valid data exist, the standard deviation is set to invalid.
Computation is performed for categories KC and KN. A valid standard deviation is obtained as long as there exists at least one valid difference. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
Remark: Valid operator 2 is applied instead of valid operator 1. This guarantees that all Taylor data are computed for the same data set size at a specific location.
Correlation
Correlation is computed for reference data and variant data:
- Mean value for reference data is computed as indicated above;
- Mean value for variant data is computed as indicated above;
- Standard deviation for reference data is computed as indicated above;
- Standard deviation for variant data is computed as indicated above;
- Correlation is given as the mean product of the deviations of reference and variant values from their respective means, divided by the product of the two standard deviations;
- If all data are invalid, the correlation is set to invalid.
Computation is performed for categories KC and KN. A valid correlation is obtained as long as there exists at least one valid difference. Correlations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
Pattern RMS
Pattern RMS is computed for reference data and variant data:
- Mean value for reference data is computed as indicated above;
- Mean value for variant data is computed as indicated above;
- Pattern RMS is given as the root mean square of the differences between variant and reference values after the respective mean value has been removed from each;
- If all data are invalid, the pattern RMS is set to invalid.
Computation is performed for categories KC and KN. A valid pattern RMS is obtained as long as there exists at least one valid difference. Pattern RMS values for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
Deviation of means (Bias)
Deviation of means is computed for reference data and variant data; this quantity is also called bias:
- Mean value for reference data is computed as indicated above;
- Mean value for variant data is computed as indicated above;
- Deviation of means is given as the difference between the mean value of the variant data and the mean value of the reference data;
- If all data are invalid, the bias is set to invalid;
- Overall RMS can be computed from pattern RMS and bias: the squared overall RMS is the sum of the squared bias and the squared pattern RMS.
Computation is performed for categories KC and KN. A valid bias is obtained as long as there exists at least one valid difference. Biases for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
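The Taylor quantities described so far can be put together in one short sketch. Several points are assumptions of this example and not taken from NCDELTA: the function name `taylor_statistics`, `None` as the invalid marker, normalization of variances by the number of valid pairs (rather than that number minus one), and the sign of the bias (variant mean minus reference mean). Only time steps where both values are valid are used, in the spirit of valid operator 2.

```python
import math
from typing import Dict, Optional, Sequence


def taylor_statistics(
    reference: Sequence[Optional[float]], variant: Sequence[Optional[float]]
) -> Optional[Dict[str, Optional[float]]]:
    """Standard deviations, correlation, pattern RMS, bias and overall RMS
    over the time steps where both reference and variant are valid."""
    pairs = [(r, v) for r, v in zip(reference, variant) if r is not None and v is not None]
    n = len(pairs)
    if n == 0:
        return None  # nothing valid at this location
    mean_r = sum(r for r, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r, _ in pairs) / n)
    std_v = math.sqrt(sum((v - mean_v) ** 2 for _, v in pairs) / n)
    cov = sum((r - mean_r) * (v - mean_v) for r, v in pairs) / n
    # correlation is undefined if either series is constant
    corr = cov / (std_r * std_v) if std_r > 0.0 and std_v > 0.0 else None
    pattern_rms = math.sqrt(
        sum(((v - mean_v) - (r - mean_r)) ** 2 for r, v in pairs) / n
    )
    bias = mean_v - mean_r
    rmse = math.sqrt(bias ** 2 + pattern_rms ** 2)
    return {
        "n": n, "std_ref": std_r, "std_var": std_v, "corr": corr,
        "pattern_rms": pattern_rms, "bias": bias, "rmse": rmse,
    }


result = taylor_statistics([1.0, 2.0, 3.0, None, 5.0], [2.0, 3.0, 5.0, 4.0, None])
print(result["n"])  # 3
```

In this example the valid differences are 1, 1 and 2, so the RMSE computed directly from them is the square root of 2, which agrees with the value obtained from bias and pattern RMS.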
Root mean square error (RMSE) according to Taylor
RMSE is computed according to Taylor (2001, equation 3) from Pattern RMS and Bias. Computation is performed for categories KC and KN. A valid RMSE is obtained as long as there exists at least one valid difference. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Definition of RMSE according to Taylor (2001) is mathematically identical to the standard equation for RMSE but computational results may differ slightly due to numerical roundoff.
Taylor Skill 4
Skill S4 is computed according to Taylor (2001, equation 4) from correlation and normalized variance. Computation is performed for categories KC and KN. A valid S4 is obtained as long as there exists at least one valid difference. S4 can be characterised as follows:
- S4 = 1.0 indicates perfect fit;
- S4 = 0.0, if correlation R = -1.0 or variance of the variant tends towards 0.0 or infinity;
- S4 is linear with respect to R (at constant variance);
- For zero variance S4 is not defined;
- Bias between variant and reference has no influence on S4.
S4 punishes deviations in Pattern RMS, and is more tolerant with respect to deviations in R. See Taylor (2001, figure 10).
Taylor Skill 5
Skill S5 according to Taylor (2001, equation 5) is computed from correlation and normalized variance. Computation is performed for categories KC and KN.
In contrast to S4 deviations with respect to Pattern RMS and correlation are treated (punished) similarly. See Taylor (2001, figure 11).
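For orientation, the two skill scores can be sketched as below, following the forms published in Taylor (2001, equations 4 and 5). The function name `taylor_skill`, the argument `std_ratio` (standard deviation of the variant divided by that of the reference) and the default `r0 = 1.0` for the maximum attainable correlation are assumptions of this sketch; how NCDELTA chooses these inputs is not stated here.

```python
from typing import Optional


def taylor_skill(corr: float, std_ratio: float, r0: float = 1.0, power: int = 1) -> Optional[float]:
    """Taylor (2001) skill score: power=1 gives the equation-4 form (S4),
    power=4 gives the equation-5 form (S5)."""
    if std_ratio <= 0.0:
        return None  # not defined for zero variance
    spread = (std_ratio + 1.0 / std_ratio) ** 2
    return 4.0 * (1.0 + corr) ** power / (spread * (1.0 + r0) ** power)


s4_perfect = taylor_skill(1.0, 1.0)
s4_half = taylor_skill(0.0, 1.0)
s5_half = taylor_skill(0.0, 1.0, power=4)
print(s4_perfect, s4_half, s5_half)  # 1.0 0.5 0.0625
```

With equal standard deviations and zero correlation, S4 drops to 0.5 while S5 drops to 0.0625, which illustrates that S5 penalizes a low correlation much more strongly than S4.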
Number of valid reference data
Number of valid reference data may differ between locations:
- The number of valid reference data is given by the sum of integer operator 1, applied to the reference data, over all times.
Computation is performed for categories KC and KN. This quantity is purely informative and is not strictly required for any of the Taylor diagram data.
Number of valid variant data
Number of valid variant data may differ between locations:
- The number of valid variant data is given by the sum of integer operator 1, applied to the variant data, over all times.
Computation is performed for categories KC and KN. This quantity is purely informative and is not strictly required for any of the Taylor diagram data.
Number of valid Taylor data
The number of valid Taylor data, i.e. of times at which both reference data and variant data are valid, may differ from location to location:
- The number of valid Taylor data is given by the sum of integer operator 2 over all times.
Computation is performed for categories KC and KN. In programs like NCPLOT, the above-mentioned Taylor data can be filtered using this quantity. This makes it possible to present results, e.g., for locations where all events have occurred, or for locations where only a specific number of events has occurred.
Median and quantiles
A prerequisite for the computation of all subsequent quantities is that all valid differences are sorted in ascending order.
Computation is carried out only if at least 32 valid differences are available.
Median
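Median is computed for all valid differences, sorted in ascending order:
- if the number of valid differences is odd: the median is the middle value of the sorted differences;
- if the number of valid differences is even: the median is the mean of the two middle values of the sorted differences;
- if there are fewer than 32 valid data, the median is set to invalid.
Computation is performed for categories KC and KN. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.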
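The median rule, including the minimum sample size of 32, can be sketched as follows. The name `median_of_valid`, the constant `MIN_VALID` and the use of `None` for invalid values are assumptions of this example; the textbook convention of averaging the two middle values for an even count is used.

```python
from typing import Optional, Sequence

MIN_VALID = 32  # minimum number of valid differences, as stated in the text


def median_of_valid(diffs: Sequence[Optional[float]], min_valid: int = MIN_VALID) -> Optional[float]:
    """Median of the valid differences; None if fewer than min_valid are available."""
    valid = sorted(d for d in diffs if d is not None)
    n = len(valid)
    if n < min_valid:
        return None
    mid = n // 2
    if n % 2 == 1:
        return valid[mid]  # odd count: middle value
    return 0.5 * (valid[mid - 1] + valid[mid])  # even count: mean of the two middle values


odd_case = median_of_valid([float(i) for i in range(33)] + [None])
even_case = median_of_valid([float(i) for i in range(32)])
too_few = median_of_valid([1.0, 2.0, 3.0])
print(odd_case, even_case, too_few)  # 16.0 15.5 None
```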
Quantile Q01
Quantile Q01 is computed for all valid differences. In other words, we obtain a specific value of the difference that just 1 % of all events fall below and 99 % of all events exceed:
- if integer: ;
- if real: ;
- if there are fewer than 32 valid data, Q01 is set to invalid.
Computation is performed for categories KC and KN. Q01 may be computed for different data set sizes at different locations. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Quantile Q05
Quantile Q05 is computed for all valid differences. In other words, we obtain a specific value of the difference that 5 % of all events fall below and 95 % of all events exceed:
- if integer: ;
- if real: ;
- if there are fewer than 32 valid data, Q05 is set to invalid.
Computation is performed for categories KC and KN. Q05 may be computed for different data set sizes at different locations. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Quantile Q95
Quantile Q95 is computed for all valid differences. In other words, we obtain a specific value of the difference that 95 % of all events fall below and 5 % of all events exceed:
- if integer: ;
- if real: ;
- if there are fewer than 32 valid data, Q95 is set to invalid.
Computation is performed for categories KC and KN. Q95 may be computed for different data set sizes at different locations. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Quantile Q99
Quantile Q99 is computed for all valid differences. In other words, we obtain a specific value of the difference that 99 % of all events fall below and just 1 % of all events exceed:
- if integer: ;
- if real: ;
- if there are fewer than 32 valid data, Q99 is set to invalid.
Computation is performed for categories KC and KN. Q99 may be computed for different data set sizes at different locations. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Other skill definitions
Murphy Skill 4
Literature:
- Murphy, Allan H. (1988), "Skill Scores Based on the Mean Square Error and Their Relationship to the Correlation Coefficient", Monthly Weather Review, Dec. 1988, pp. 2417–2424.
Computation is performed for categories KC and KN. Skill S4 according to Murphy (1988, equation 4) can be characterised as follows:
- 1.0 indicates perfect fit;
- 0.0 indicates that the mean value of the reference data models the (reference) data as well as the variant data do, because both "models" show the same mean square error (MSE);
- negative skill indicates that the mean value of the reference data models the (reference) data better than the variant data do;
- bias is taken into account.
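The characterisation above matches a mean-square-error skill score of the form 1 - MSE(variant) / MSE(reference mean). The following sketch implements that form; it is an assumption that this agrees term by term with the NCDELTA implementation. The function name `murphy_skill` and the use of `None` for invalid values are choices of this example.

```python
from typing import Optional, Sequence


def murphy_skill(
    reference: Sequence[Optional[float]], variant: Sequence[Optional[float]]
) -> Optional[float]:
    """Skill = 1 - MSE(variant vs. reference) / MSE(reference mean vs. reference)."""
    pairs = [(r, v) for r, v in zip(reference, variant) if r is not None and v is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_r = sum(r for r, _ in pairs) / n
    mse_variant = sum((v - r) ** 2 for r, v in pairs) / n
    mse_mean = sum((mean_r - r) ** 2 for r, _ in pairs) / n
    if mse_mean == 0.0:
        return None  # constant reference: score is not defined
    return 1.0 - mse_variant / mse_mean


reference = [1.0, 2.0, 3.0]
perfect = murphy_skill(reference, [1.0, 2.0, 3.0])
same_as_mean = murphy_skill(reference, [2.0, 2.0, 2.0])
worse = murphy_skill(reference, [3.0, 2.0, 1.0])
print(perfect, same_as_mean, worse)  # 1.0 0.0 -3.0
```

The three cases reproduce the listed properties: a perfect fit gives 1.0, a variant equal to the reference mean gives 0.0, and a variant that fits worse than the reference mean gives a negative skill.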