Differences of Calculated Results

Introduction

For data generated by

Various differences can be computed. Input data can typically be categorized as follows:

  • Category K0: f(x,y,z), time-independent quantities;
  • Category K1: f(x,y,z,t_1), time-dependent quantities, one time step;
  • Category KC: f(x,y,z,t_i), time-dependent quantities, several discrete time steps, constant time step \Delta t;
  • Category KN: f(x,y,z,t_i), time-dependent quantities, several discrete time steps, varying time step \Delta t(i).

For geophysical data, categories K1, KC and KN are of significance. Examples:

  • Category K1: topography/bathymetry h(x,y,z,t_1) for a specific instant in time;
  • Category KC: water level \eta(x,y,z,t_i) at discrete times t_i with constant time step, e. g. computed by a mathematical model;
  • Category KN: tidal high water \eta^{\rm{HW}}(x,y,z,t_i) for times t_i at non-equidistant time intervals, e.g. derived from a water level time series.

Definitions

  • reference data r: the data with respect to which deviations of f are evaluated. Typically these are observational data, or computational or analysis results, for a specific (reference) state (situation);
  • variant data f: likewise either observational data or computational or analysis results, for which deviations are computed with respect to the reference state. Typically, variant data are given for a different period in time (natural variation) or for a different state of the system under study;
  • valid operator 1: V(r_i) returns .T. if r_i is valid and .F. otherwise. It can also be applied to f_i;
  • valid operator 2: V(r_i,f_i) returns .T. if V(r_i)\land V(f_i) holds and .F. otherwise;
  • integer operator 1: P(r_i) returns 1 if V(r_i) is .T., else 0. Similarly for f_i;
  • integer operator 2: P(r_i,f_i) returns 1 if V(r_i)\land V(f_i) is .T., else 0.
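
The two operator pairs can be sketched in a few lines of Python. This is only an illustration: representing an invalid value by None and the function names v and p are assumptions made here, not something prescribed by the source.

```python
from typing import Optional

# Assumption for this sketch: an invalid value is represented by None.
Value = Optional[float]


def v(*values: Value) -> bool:
    """Valid operator: True only if every argument is valid.

    v(r_i) corresponds to V(r_i), v(r_i, f_i) to V(r_i, f_i).
    """
    return all(x is not None for x in values)


def p(*values: Value) -> int:
    """Integer operator: 1 if v(...) is True, else 0."""
    return 1 if v(*values) else 0
```

Summing p(r_i, f_i) over all times then counts the valid pairs at a location.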

Requirements for the computation of differences

The following requirements must be fulfilled by r and f:

  1. r and f must belong to the same category (see above);
  2. the number of times t_i must be identical for r and f;
  3. for data belonging to category KC the constant time step \Delta t must coincide for r and f;
  4. (physical) dimension as well as meaning must be equivalent for r and f;
  5. r_i (short for r(x,y,z,t_i)) as well as f_i (short for f(x,y,z,t_i)) must be valid data for the same instant i in time; otherwise the derived results will become invalid.

Computational results

Program NCDELTA can be used to compute all subsequent results. Locations of r are not required to coincide with those of f. Values of r are interpolated to the locations of f, as long as the geographical distance between the locations does not exceed R^\max. If the distance exceeds that limit, no result is computed and an invalid result value is generated instead. The following results can be computed using NCDELTA.

Ordinary differences

Difference

A result is computed for all times (one value for time-independent data) at all locations (x,y,z):

  1. The difference between f_i and r_i is calculated in case V(r_i,f_i) returns .T.:
    d_i = f_i - r_i, if V(r_i,f_i);
  2. Result will be invalid, if V(r_i,f_i) returns .F.:
    d_i = \rm{invalid} if \lnot V(r_i,f_i).

Results are computed for data belonging to categories K0, K1, KC and KN, which means for all types of data.
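
A minimal sketch of this rule for one location, assuming that invalid values are represented by None and that r has already been brought to the locations and times of f (the function name is hypothetical):

```python
from typing import List, Optional


def differences(r: List[Optional[float]],
                f: List[Optional[float]]) -> List[Optional[float]]:
    """Return d_i = f_i - r_i where both are valid, else None (invalid)."""
    if len(r) != len(f):
        # Requirement 2: the number of times must be identical for r and f.
        raise ValueError("r and f must have the same number of time steps")
    return [fi - ri if ri is not None and fi is not None else None
            for ri, fi in zip(r, f)]
```

For example, differences([1.0, None, 3.0], [1.5, 2.0, None]) yields one valid difference followed by two invalid entries.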

Maximum difference

Maximum difference is determined using absolute value (modulus) in combination with sign preservation:

  1. At first all differences d_i will be computed as indicated above;
  2. Out of all valid data index i^\max is determined in such a way that \left|d_i\right| is maximal
    d^\max = d_{i^\max}
    is equal to the maximum difference according to this definition; this value can be negative, positive or zero;
  3. In case all values d_i are invalid d^\max = \rm{invalid} will be set.

Computation is performed for categories KC and KN. A valid d^\max is obtained as long as there exists at least one valid difference d_i. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid differences.

Minimum difference

Minimum difference is determined using absolute value (modulus) in combination with sign preservation:

  1. At first all differences d_i will be computed as indicated above;
  2. Out of all valid data index i^\min is determined in such a way that \left|d_i\right| is minimal
    d^\min = d_{i^\min}
    is equal to the minimum difference according to this definition; this value can be negative, positive or zero;
  3. In case all values d_i are invalid d^\min = \rm{invalid} will be set.

Computation is performed for categories KC and KN. A valid d^\min is obtained as long as there exists at least one valid difference d_i. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid differences.
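
The sign-preserving selection used for the maximum and minimum difference can be sketched as follows. Invalid differences are assumed to be None, the function names are hypothetical, and the handling of ties (the first occurrence wins) is a choice of this sketch, since the text does not specify it.

```python
from typing import List, Optional


def max_difference(d: List[Optional[float]]) -> Optional[float]:
    """Signed difference with the largest absolute value, or None."""
    valid = [x for x in d if x is not None]
    return max(valid, key=abs) if valid else None


def min_difference(d: List[Optional[float]]) -> Optional[float]:
    """Signed difference with the smallest absolute value, or None."""
    valid = [x for x in d if x is not None]
    return min(valid, key=abs) if valid else None
```

With d = [0.5, None, -2.0, 1.0] the maximum difference is -2.0 (largest modulus, sign kept) and the minimum difference is 0.5.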

Mean difference

Mean value is computed for all valid differences:

  1. At first all differences d_i will be computed as indicated above;
  2. From all valid differences the mean value is computed as
    d^{\rm{mit}}=\frac{\sum_{i\in I}P(d_i)d_i}{\sum_{i\in I}P(d_i)};
  3. In case all d_i are invalid d^{\rm{mit}} = \rm{invalid} will be set.

Computation is performed for categories KC and KN. A valid d^{\rm{mit}} is obtained as long as there exists at least one valid difference d_i. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid differences.

Mean deviation

Mean deviation is computed for all valid differences:

  1. At first all differences d_i will be computed as indicated above;
  2. From all valid differences the mean deviation is computed as
    d^{\rm{abw}}=\frac{\sum_{i\in I}P(d_i)\left|d_i\right|}{\sum_{i\in I}P(d_i)};
  3. In case all d_i are invalid d^{\rm{abw}} = \rm{invalid} will be set.

Computation is performed for categories KC and KN. A valid d^{\rm{abw}} is obtained as long as there exists at least one valid difference d_i. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid differences.
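
Mean difference and mean deviation differ only in whether the sign of d_i is kept. A short sketch, again assuming None for invalid differences and using hypothetical function names:

```python
from typing import List, Optional


def mean_difference(d: List[Optional[float]]) -> Optional[float]:
    """Mean of all valid differences (d^mit), or None if there are none."""
    valid = [x for x in d if x is not None]
    return sum(valid) / len(valid) if valid else None


def mean_deviation(d: List[Optional[float]]) -> Optional[float]:
    """Mean of the absolute values of all valid differences (d^abw)."""
    valid = [x for x in d if x is not None]
    return sum(abs(x) for x in valid) / len(valid) if valid else None
```

For d = [1.0, None, -3.0] the mean difference is -1.0 while the mean deviation is 2.0, which shows how differences of opposite sign cancel in the former but not in the latter.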

Root mean square error (RMSE)

RMSE is computed for all valid differences:

  1. At first all differences d_i will be computed as indicated above;
  2. From all valid differences the RMSE is computed using the well-known definition of RMSE;
  3. In case all d_i are invalid RMSE = invalid will be set.

Computation is performed for categories KC and KN. A valid RMSE is obtained as long as there exists at least one valid difference d_i. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid differences.
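
The formula itself is not written out here. In the notation of the other results, the standard definition would presumably read as follows (whether NCDELTA evaluates it in exactly this form is an assumption):

```latex
\mathrm{RMSE}=\sqrt{\frac{\sum_{i\in I}P(d_i)\,d_i^2}{\sum_{i\in I}P(d_i)}}
```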

Number of valid differences

The number of valid differences may vary between different locations. As a consequence the above mentioned quantities may be computed from data sets of different size:

  1. At first all differences d_i will be computed as indicated above;
  2. The number of valid differences is computed as
    N_{\rm{ord}}=\sum_{i\in I}P(d_i);

Computation is performed for categories KC and KN. In program NCPLOT quantities like maximum difference, minimum difference, mean value as well as mean deviation can be visualised using this variable as a filter. In that way visualizations can be created where the results are shown for points only where e. g. either the maximum number of events or a specific number of events occurred.

Data for a Taylor diagram

Taylor diagrams provide "a concise statistical summary of how well patterns match each other in terms of their correlation, their root-mean-square difference and the ratio of their variances." Additional information such as bias can be added to the conventional Taylor diagram. The Taylor diagram provides a graphical framework that allows a suite of variables from one or more models or reanalyses to be compared to reference data. The reference data can be observationally based (e.g. a reanalysis), or can be taken from another model or a control run.

Literature and further information:

  1. Taylor, K. E. (2001), Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research, 106 (D7), 7183–7192, doi: http://dx.doi.org/10.1029/2000JD900719;
  2. http://www-pcmdi.llnl.gov/about/staff/Taylor/CV/Taylor_diagram_primer.htm with a short introduction by K. E. Taylor as well as links to example applications.

Standard deviation for reference data

Standard deviation is computed for all valid reference data:

  1. Mean value for r_i is computed according to
    \bar{r}=\frac{\sum_{i\in I}P(r_i,f_i)r_i}{\sum_{i\in I}P(r_i,f_i)};
  2. If a valid \bar{r} was computed standard deviation is obtained from
    \sigma_r = \sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left(r_i-\bar{r}\right)^2}{\sum_{i\in I}P(r_i,f_i)}};
  3. In case all P(r_i,f_i) are 0 \sigma_r = \rm{invalid} is set.

Computation is performed for categories KC and KN. A valid \sigma_r is obtained as long as there exists at least one valid pair, i.e. V(r_i,f_i) returns .T. for at least one i. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid Taylor data.

Remark: We apply P(r_i,f_i) instead of P(r_i). This guarantees that all Taylor data are computed for the same data set size at a specific location.

Standard deviation for variant data

Standard deviation is computed for all valid variant data:

  1. Mean value for f_i is computed according to
    \bar{f}=\frac{\sum_{i\in I}P(r_i,f_i)f_i}{\sum_{i\in I}P(r_i,f_i)};
  2. If a valid \bar{f} was computed standard deviation is obtained from
    \sigma_f = \sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left(f_i-\bar{f}\right)^2}{\sum_{i\in I}P(r_i,f_i)}};
  3. In case all P(r_i,f_i) are 0 \sigma_f = \rm{invalid} is set.

Computation is performed for categories KC and KN. A valid \sigma_f is obtained as long as there exists at least one valid pair, i.e. V(r_i,f_i) returns .T. for at least one i. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid Taylor data.

Remark: We apply P(r_i,f_i) instead of P(f_i). This guarantees that all Taylor data are computed for the same data set size at a specific location.
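
The effect of the pair mask P(r_i,f_i) can be illustrated with a small sketch. It assumes None for invalid values; the helper names are hypothetical. Both standard deviations use only those times at which r_i and f_i are valid simultaneously, and they divide by the number of valid pairs, as in the formulas above.

```python
import math
from typing import List, Optional, Tuple

Stats = Tuple[Optional[float], Optional[float]]  # (mean, standard deviation)


def _mean_and_std(values: List[float]) -> Stats:
    """Population mean and standard deviation (divisor n), or (None, None)."""
    n = len(values)
    if n == 0:
        return None, None
    mean = sum(values) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def taylor_std(r: List[Optional[float]],
               f: List[Optional[float]]) -> Tuple[Stats, Stats]:
    """Mean and sigma of r and of f, restricted to jointly valid pairs."""
    pairs = [(ri, fi) for ri, fi in zip(r, f)
             if ri is not None and fi is not None]
    return (_mean_and_std([ri for ri, _ in pairs]),
            _mean_and_std([fi for _, fi in pairs]))
```

With r = [1.0, 2.0, 3.0, None] and f = [2.0, None, 6.0, 5.0] only the first and third time step enter both statistics, even though r and f each have three valid values of their own.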

Correlation

Correlation is computed for r and f:

  1. Mean value \bar{r} for reference data is computed as indicated above;
  2. Mean value \bar{f} for variant data is computed as indicated above;
  3. Standard deviation \sigma_r for reference data is computed as indicated above;
  4. Standard deviation \sigma_f for variant data is computed as indicated above;
  5. Correlation R is given as follows
    R=\frac{\sum_{i\in I}P(r_i,f_i)\left(r_i-\bar{r}\right)\left(f_i-\bar{f}\right)}{\sigma_r\sigma_f\sum_{i\in I}P(r_i,f_i)};
  6. In case V(r_i,f_i) returns .F. for all i, R = \rm{invalid} is set.

Computation is performed for categories KC and KN. A valid R is obtained as long as there exists at least one valid pair, i.e. V(r_i,f_i) returns .T. for at least one i. Correlations for different locations may be computed from data sets of different size. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid Taylor data.
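
The steps above can be sketched as follows, assuming None for invalid values. Returning None when one of the standard deviations is zero is a choice made in this sketch to avoid a division by zero; the text does not say how that case is treated.

```python
import math
from typing import List, Optional


def correlation(r: List[Optional[float]],
                f: List[Optional[float]]) -> Optional[float]:
    """Correlation R over all jointly valid pairs (r_i, f_i), or None."""
    pairs = [(ri, fi) for ri, fi in zip(r, f)
             if ri is not None and fi is not None]
    n = len(pairs)
    if n == 0:
        return None
    r_mean = sum(ri for ri, _ in pairs) / n
    f_mean = sum(fi for _, fi in pairs) / n
    sigma_r = math.sqrt(sum((ri - r_mean) ** 2 for ri, _ in pairs) / n)
    sigma_f = math.sqrt(sum((fi - f_mean) ** 2 for _, fi in pairs) / n)
    if sigma_r == 0.0 or sigma_f == 0.0:
        return None  # assumption of this sketch: R is undefined here
    covariance = sum((ri - r_mean) * (fi - f_mean) for ri, fi in pairs) / n
    return covariance / (sigma_r * sigma_f)
```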

Pattern RMS

Pattern RMS is computed for r and f:

  1. Mean value \bar{r} for reference data is computed as indicated above;
  2. Mean value \bar{f} for variant data is computed as indicated above;
  3. Pattern RMS E' is given as follows
    E'=\sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left[\left(f_i-\bar{f}\right)-\left(r_i-\bar{r}\right)\right]^2}{\sum_{i\in I}P(r_i,f_i)}};
  4. In case V(r_i,f_i) returns .F. for all i, E' = \rm{invalid} is set.

Computation is performed for categories KC and KN. A valid E' is obtained as long as there exists at least one valid pair, i.e. V(r_i,f_i) returns .T. for at least one i. Pattern RMS values for different locations may be computed from data sets of different size. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid Taylor data.

Deviation of means (Bias)

Deviation of means is computed for \bar{r} and \bar{f}; this quantity is also called bias:

  1. Mean value \bar{r} for reference data is computed as indicated above;
  2. Mean value \bar{f} for variant data is computed as indicated above;
  3. Deviation of means \bar{E} is given as follows
    \bar{E}=\bar{f}-\bar{r};
  4. In case all V(r_i,f_i) are invalid \bar{E} = \rm{invalid} is set;
  5. Overall RMS E can be computed from \bar{E} and E' according to
    E = \sqrt{\bar{E}^2+E'^2};

Computation is performed for categories KC and KN. A valid \bar{E} is obtained as long as there exists at least one valid pair, i.e. V(r_i,f_i) returns .T. for at least one i. Biases for different locations may be computed from data sets of different size. During visualization, NCPLOT allows filtering by the ancillary variable Number of valid Taylor data.
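
The decomposition E^2 = \bar{E}^2 + E'^2 can be checked numerically. The sketch below assumes None for invalid values and takes Pattern RMS in the centred form of Taylor (2001), i.e. the root mean square of (f_i-\bar{f})-(r_i-\bar{r}). The function name and the returned tuple are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def taylor_errors(
    r: List[Optional[float]], f: List[Optional[float]]
) -> Optional[Tuple[float, float, float, float]]:
    """Return (bias, pattern RMS, overall RMS from both, direct RMSE)."""
    pairs = [(ri, fi) for ri, fi in zip(r, f)
             if ri is not None and fi is not None]
    n = len(pairs)
    if n == 0:
        return None
    r_mean = sum(ri for ri, _ in pairs) / n
    f_mean = sum(fi for _, fi in pairs) / n
    bias = f_mean - r_mean
    pattern = math.sqrt(
        sum(((fi - f_mean) - (ri - r_mean)) ** 2 for ri, fi in pairs) / n)
    overall = math.sqrt(bias ** 2 + pattern ** 2)
    # Direct RMSE of the differences, for comparison with `overall`.
    direct = math.sqrt(sum((fi - ri) ** 2 for ri, fi in pairs) / n)
    return bias, pattern, overall, direct
```

For r = [1.0, 2.0, 3.0] and f = [2.0, 4.0, 6.0] the bias is 2.0, and the overall RMS obtained from bias and Pattern RMS agrees with the directly computed RMSE up to roundoff.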

Root mean square error (RMSE) according to Taylor

RMSE is computed according to Taylor (2001, equation 3) from Pattern RMS and Bias. Computation is performed for categories KC and KN. A valid RMSE is obtained as long as there exists at least one valid difference d_i. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Definition of RMSE according to Taylor (2001) is mathematically identical to the standard equation for RMSE but computational results may differ slightly due to numerical roundoff.

Taylor Skill 4

Skill S4 is computed according to Taylor (2001, equation 4) from correlation and normalized variance. Computation is performed for categories KC and KN. A valid S4 is obtained as long as there exists at least one valid difference d_i. S4 can be characterised as follows:

  • S4 = 1.0 indicates perfect fit;
  • S4 = 0.0, if correlation R = -1.0 or variance of the variant tends towards 0.0 or infinity;
  • S4 is linear with respect to R (at constant variance);
  • For zero variance S4 is not defined;
  • Bias between variant and reference has no influence on S4.

S4 penalizes deviations in Pattern RMS and is more tolerant with respect to deviations in R. See Taylor (2001, figure 10).

Taylor Skill 5

Skill S5 according to Taylor (2001, equation 5) is computed from correlation and normalized variance. Computation is performed for categories KC and KN.

In contrast to S4, deviations in Pattern RMS and in correlation are treated (penalized) similarly. See Taylor (2001, figure 11).
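
Neither score is written out here. The sketch below assumes the forms in which equations (4) and (5) of Taylor (2001) are commonly quoted, S = 4(1+R)^k / [(\hat\sigma + 1/\hat\sigma)^2 (1+R_0)^k] with k = 1 for S4 and k = 4 for S5, where \hat\sigma = \sigma_f/\sigma_r and R_0 is the maximum attainable correlation. Both the formulas and the default R_0 = 1 should be checked against the paper; which R_0 NCDELTA uses is not stated here. The function name and parameters are hypothetical.

```python
from typing import Optional


def taylor_skill(corr: float, sigma_r: float, sigma_f: float,
                 exponent: int = 1, r0: float = 1.0) -> Optional[float]:
    """Taylor skill score: exponent=1 gives S4, exponent=4 gives S5.

    Returns None if one of the standard deviations is not positive,
    because the normalized standard deviation is then undefined.
    """
    if sigma_r <= 0.0 or sigma_f <= 0.0:
        return None
    s = sigma_f / sigma_r  # normalized standard deviation
    return (4.0 * (1.0 + corr) ** exponent
            / ((s + 1.0 / s) ** 2 * (1.0 + r0) ** exponent))
```

Under these assumptions the properties listed for S4 can be reproduced: R = 1 with equal standard deviations gives 1.0, R = -1 gives 0.0, and at equal standard deviations R = 0 gives 0.5 for S4 but only 0.0625 for S5.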

Number of valid reference data

Number of valid reference data r_i may differ between locations:

  1. The number of valid reference data is given by
    N_r=\sum_{i\in I}P(r_i);

Computation is performed for categories KC and KN. This quantity is purely informative and is not strictly required for any of the Taylor diagram data.

Number of valid variant data

Number of valid variant data f_i may differ between locations:

  1. The number of valid variant data is given by
    N_f=\sum_{i\in I}P(f_i);

Computation is performed for categories KC and KN. This quantity is purely informative and is not strictly required for any of the Taylor diagram data.

Number of valid Taylor data

The number of valid Taylor data \bar{r},\bar{f},\sigma_r,\sigma_f,R,E' and \bar{E} may be different from location to location:

  1. The number of valid Taylor data is given by
    N_T=\sum_{i\in I}P(r_i,f_i);

Computation is performed for categories KC and KN. In programs like NCPLOT the above mentioned Taylor data can be filtered using this quantity. This offers the opportunity to present results for e. g. locations where all events have occurred, or for locations where only a specific number of events has occurred.

Median and quantiles

A prerequisite for the computation of all subsequent quantities is that the valid differences are sorted: all N_{\rm{ord}} valid differences d_i are sorted in ascending order into s_j, with j \in [1:N_{\rm{ord}}]. Subsequently n:=N_{\rm{ord}} will be used for the sake of simplicity.

Computation is carried out only if n \ge 32 holds.

Median

Median is computed for all valid differences d_i:

  1. if n is odd: d_{\rm{Med}} = s_{\frac{n+1}{2}};
  2. if n is even: d_{\rm{Med}} = 0.5\left( s_{\frac{n}{2}}+s_{\frac{n}{2}+1}\right);
  3. in case there are fewer than 32 valid data, d_{\rm{Med}} = \rm{invalid} is set.

Computation is performed for categories KC and KN. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.
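
A sketch of the median rule, including the lower limit of 32 valid values. Invalid differences are assumed to be None; the function name and the min_count parameter are hypothetical. Note the shift from the 1-based indices s_j of the text to 0-based list indices.

```python
from typing import List, Optional


def median_difference(d: List[Optional[float]],
                      min_count: int = 32) -> Optional[float]:
    """Median of the valid differences, or None if fewer than min_count."""
    s = sorted(x for x in d if x is not None)
    n = len(s)
    if n < min_count:
        return None
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # s_{(n+1)/2}
    return 0.5 * (s[n // 2 - 1] + s[n // 2])  # 0.5 (s_{n/2} + s_{n/2+1})
```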

Quantile Q01

Quantile p=0.01 is computed for all valid differences d_i. In other words: we obtain the value of d_i that just 1 % of all events fall below and 99 % of all events exceed:

  1. if n \cdot p is an integer: d_{\rm{Q01}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right);
  2. if n \cdot p is not an integer: d_{\rm{Q01}} = s_{\lceil n \cdot p \rceil};
  3. in case there are fewer than 32 valid data, d_{\rm{Q01}} = \rm{invalid} is set.

Computation is performed for categories KC and KN. d_{\rm{Q01}} may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q05

Quantile p=0.05 is computed for all valid differences d_i. In other words: we obtain the value of d_i that 5 % of all events fall below and 95 % of all events exceed:

  1. if n \cdot p is an integer: d_{\rm{Q05}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right);
  2. if n \cdot p is not an integer: d_{\rm{Q05}} = s_{\lceil n \cdot p \rceil};
  3. in case there are fewer than 32 valid data, d_{\rm{Q05}} = \rm{invalid} is set.

Computation is performed for categories KC and KN. d_{\rm{Q05}} may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q95

Quantile p=0.95 is computed for all valid differences d_i. In other words: we obtain the value of d_i that 95 % of all events fall below and 5 % of all events exceed:

  1. if n \cdot p is an integer: d_{\rm{Q95}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right);
  2. if n \cdot p is not an integer: d_{\rm{Q95}} = s_{\lceil n \cdot p \rceil};
  3. in case there are fewer than 32 valid data, d_{\rm{Q95}} = \rm{invalid} is set.

Computation is performed for categories KC and KN. d_{\rm{Q95}} may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q99

Quantile p=0.99 is computed for all valid differences d_i. In other words: we obtain the value of d_i that 99 % of all events fall below and just 1 % of all events exceed:

  1. if n \cdot p is an integer: d_{\rm{Q99}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right);
  2. if n \cdot p is not an integer: d_{\rm{Q99}} = s_{\lceil n \cdot p \rceil};
  3. in case there are fewer than 32 valid data, d_{\rm{Q99}} = \rm{invalid} is set.

Computation is performed for categories KC and KN. d_{\rm{Q99}} may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.
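
All four quantiles follow the same rule, so one function covers them. In this sketch invalid differences are assumed to be None, and p is passed as an integer percentage (1, 5, 95, 99) so that the test "n \cdot p is an integer" can be done in exact integer arithmetic. That parameterization and the function name are choices of this sketch, not of the source.

```python
from typing import List, Optional


def quantile_difference(d: List[Optional[float]], percent: int,
                        min_count: int = 32) -> Optional[float]:
    """Quantile p = percent/100 of the valid differences, or None."""
    if not 0 < percent < 100:
        raise ValueError("percent must lie strictly between 0 and 100")
    s = sorted(x for x in d if x is not None)
    n = len(s)
    if n < min_count:
        return None
    if (n * percent) % 100 == 0:
        k = n * percent // 100          # 1-based index n*p
        return 0.5 * (s[k - 1] + s[k])  # 0.5 (s_{n p} + s_{n p + 1})
    k = -((-n * percent) // 100)        # ceil(n*p), still 1-based
    return s[k - 1]
```

With the 100 sorted values 1.0, 2.0, ..., 100.0 the product n \cdot p is an integer for all four quantiles, giving 1.5, 5.5, 95.5 and 99.5. With only the first 50 values, Q01 falls into the second branch and returns s_1 = 1.0.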

Other skill definitions

Murphy Skill 4

Literature:

  1. Murphy, Allan H. (1988) "Skill Scores Based on the Mean Square Error and Their Relationship to the Correlation Coefficient". Monthly Weather Review, Dec. 1988, pp. 2417-2424.

Computation is performed for categories KC and KN. Skill S4 according to Murphy (1988, equation 4) can be characterised as follows:

  • 1.0 indicates perfect fit;
  • 0.0 indicates that the mean value of the reference data models the (reference) data as well as the variant data do, because both "models" show the same mean square error (MSE);
  • negative skill indicates that the mean value of the reference data models the (reference) data better than the variant data do;
  • bias is taken into account.
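
The properties listed above are consistent with the usual MSE-based skill score, 1 - MSE(f,r) / MSE(\bar{r},r), in which the mean of the reference data serves as the competing "model". The sketch below assumes that form, uses None for invalid values and restricts the computation to jointly valid pairs. The exact expression should be checked against Murphy (1988, equation 4). The function name is hypothetical.

```python
from typing import List, Optional


def murphy_skill(r: List[Optional[float]],
                 f: List[Optional[float]]) -> Optional[float]:
    """MSE-based skill of f relative to the mean of r, or None."""
    pairs = [(ri, fi) for ri, fi in zip(r, f)
             if ri is not None and fi is not None]
    n = len(pairs)
    if n == 0:
        return None
    r_mean = sum(ri for ri, _ in pairs) / n
    mse_variant = sum((fi - ri) ** 2 for ri, fi in pairs) / n
    mse_mean = sum((ri - r_mean) ** 2 for ri, _ in pairs) / n
    if mse_mean == 0.0:
        return None  # assumption of this sketch: undefined for constant r
    return 1.0 - mse_variant / mse_mean
```

For r = [1.0, 2.0, 3.0], a variant identical to r scores 1.0, the constant variant [2.0, 2.0, 2.0] scores 0.0, and the reversed variant [3.0, 2.0, 1.0] scores about -3.0.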
