
Differences of Calculated Results

From BAWiki


Revision as of 07:11, 27 August 2018



Introduction

For data generated by

Various differences can be computed. Input data can typically be categorized as follows:

  • Category K0: [math]\displaystyle{ f(x,y,z) }[/math], time-independent quantities;
  • Category K1: [math]\displaystyle{ f(x,y,z,t_1) }[/math], time-dependent quantities, one time step;
  • Category KC: [math]\displaystyle{ f(x,y,z,t_i) }[/math], time-dependent quantities, several discrete time steps, constant time step [math]\displaystyle{ \Delta_t }[/math];
  • Category KN: [math]\displaystyle{ f(x,y,z,t_i) }[/math], time-dependent quantities, several discrete time steps, varying time step [math]\displaystyle{ \Delta_t(i) }[/math].

For geophysical data, categories K1, KC and KN are of significance. Examples:

  • Category K1: topography/bathymetry [math]\displaystyle{ h(x,y,z,t_1) }[/math] for a specific instant in time;
  • Category KC: water level [math]\displaystyle{ \eta(x,y,z,t_i) }[/math] at discrete times [math]\displaystyle{ t_i }[/math] with constant time step, e. g. computed by a mathematical model;
  • Category KN: tidal high water [math]\displaystyle{ \eta^{\rm{HW}}(x,y,z,t_i) }[/math] for times [math]\displaystyle{ t_i }[/math] at non-equidistant time intervals, e.g. derived from a water level time series.

Definitions

  • reference data [math]\displaystyle{ r }[/math]: the data with respect to which deviations of [math]\displaystyle{ f }[/math] are evaluated. Typically these are observational data, or computational or analysis results, for a specific reference state (situation);
  • variant data [math]\displaystyle{ f }[/math]: observational data, or computational or analysis results, for which deviations from the reference state are computed. Typically, variant data are given for a different period in time (natural variation) or a different state of the system under study;
  • valid operator 1: [math]\displaystyle{ V(r_i) }[/math] returns .T. if [math]\displaystyle{ r_i }[/math] is valid and .F. if it is invalid. It can also be applied to [math]\displaystyle{ f_i }[/math];
  • valid operator 2: [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. if [math]\displaystyle{ V(r_i)\land V(f_i) }[/math] holds, i.e. if both values are valid, and .F. otherwise;
  • integer operator 1: [math]\displaystyle{ P(r_i) }[/math] returns 1 if [math]\displaystyle{ V(r_i) }[/math] is .T., else 0. Likewise for [math]\displaystyle{ f_i }[/math];
  • integer operator 2: [math]\displaystyle{ P(r_i,f_i) }[/math] returns 1 if [math]\displaystyle{ V(r_i)\land V(f_i) }[/math] is .T., else 0.
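
The validity and integer operators can be illustrated with a minimal Python sketch. It assumes that invalid values are encoded as NaN; this encoding is an assumption made for illustration only, since the text does not state how NCDELTA marks invalid data.

```python
import math


def V(*values):
    """Validity operator: True only if every argument is valid (not NaN).

    Assumption for this sketch: invalid data are encoded as NaN.
    """
    return all(not math.isnan(v) for v in values)


def P(*values):
    """Integer operator: 1 if V(...) is true, else 0."""
    return 1 if V(*values) else 0


r_i, f_i = 1.25, float("nan")
print(V(r_i), V(f_i), V(r_i, f_i))  # True False False
print(P(r_i), P(r_i, f_i))          # 1 0
```

Accepting one or two arguments in the same function mirrors the notation above, where both operators are used in a single-value and a pairwise form.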

Requirements for the computation of differences

The following requirements must be fulfilled by [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math]:

  1. [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math] must belong to the same category (see above);
  2. the number of times [math]\displaystyle{ t_i }[/math] must be identical for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
  3. for data belonging to category KC, the constant time step [math]\displaystyle{ \Delta_t }[/math] must be identical for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
  4. (physical) dimension as well as meaning must be equivalent for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
  5. [math]\displaystyle{ r_i }[/math] (short for [math]\displaystyle{ r(x,y,z,t_i) }[/math]) as well as [math]\displaystyle{ f_i }[/math] (short for [math]\displaystyle{ f(x,y,z,t_i) }[/math]) must be valid data for the same instant [math]\displaystyle{ i }[/math] in time; otherwise the derived results will become invalid.
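
Requirements 1 to 4 can be checked before any difference is computed. The following sketch uses a hypothetical `Dataset` container whose field names are illustrative only and do not correspond to any actual NCDELTA data structure; requirement 5 is checked value by value and is therefore not part of it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    # Hypothetical metadata container, for illustration only.
    category: str               # "K0", "K1", "KC" or "KN"
    n_times: int                # number of times t_i
    time_step: Optional[float]  # constant time step in seconds (KC only)
    unit: str                   # physical unit, e.g. "m"
    quantity: str               # meaning, e.g. "water level"


def check_requirements(r: Dataset, f: Dataset) -> List[str]:
    """Return the violated requirements (empty list if r and f are compatible)."""
    problems = []
    if r.category != f.category:
        problems.append("category differs")
    if r.n_times != f.n_times:
        problems.append("number of times differs")
    if r.category == "KC" and f.category == "KC" and r.time_step != f.time_step:
        problems.append("constant time step differs")
    if r.unit != f.unit or r.quantity != f.quantity:
        problems.append("dimension or meaning differs")
    return problems


ref = Dataset("KC", 144, 600.0, "m", "water level")
var = Dataset("KC", 144, 300.0, "m", "water level")
print(check_requirements(ref, var))  # ['constant time step differs']
```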

Computational results

Program NCDELTA can be used to compute all of the following results. Locations of [math]\displaystyle{ r }[/math] are not required to coincide with those of [math]\displaystyle{ f }[/math]. Values of [math]\displaystyle{ r }[/math] are interpolated to the locations of [math]\displaystyle{ f }[/math], as long as the geographical distance between the different locations does not exceed [math]\displaystyle{ R^\max }[/math]. If the distance exceeds that limit, no result is computed and an invalid result value is generated instead.

Ordinary differences

Difference

A result is computed for all times (one value for time-independent data) at all locations [math]\displaystyle{ (x,y,z) }[/math]:

  1. The difference between [math]\displaystyle{ f_i }[/math] and [math]\displaystyle{ r_i }[/math] is calculated in case [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T.:
    [math]\displaystyle{ d_i = f_i - r_i }[/math], if [math]\displaystyle{ V(r_i,f_i) }[/math];
  2. Result will be invalid, if [math]\displaystyle{ V(r_i,f_i) }[/math] returns .F.:
    [math]\displaystyle{ d_i = \rm{invalid} }[/math] if [math]\displaystyle{ \lnot V(r_i,f_i) }[/math].

Results are computed for data belonging to categories K0, K1, KC and KN, i.e. for all types of data.
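
As a minimal sketch, the two rules above can be written in Python for the values at one location. NaN is used here as the invalid marker, which is an assumption for illustration and not necessarily the encoding used by NCDELTA.

```python
import math

INVALID = float("nan")  # assumed marker for invalid values


def difference(r, f):
    """d_i = f_i - r_i where both values are valid, otherwise invalid."""
    return [
        fi - ri if not (math.isnan(ri) or math.isnan(fi)) else INVALID
        for ri, fi in zip(r, f)
    ]


r = [1.0, 2.0, INVALID, 4.0]
f = [1.5, INVALID, 3.0, 3.0]
d = difference(r, f)
print(d)  # [0.5, nan, nan, -1.0]
```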

Maximum difference

Maximum difference is determined using absolute value (modulus) in combination with sign preservation:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. Out of all valid data index [math]\displaystyle{ i^\max }[/math] is determined in such a way that [math]\displaystyle{ \left|d_i\right| }[/math] is maximal
    [math]\displaystyle{ d^\max = d_{i^\max} }[/math]
    is equal to the maximum difference according to this definition; this value can be negative, positive or zero;
  3. In case all values [math]\displaystyle{ d_i }[/math] are invalid [math]\displaystyle{ d^\max = \rm{invalid} }[/math] will be set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^\max }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid differences.

Minimum difference

Minimum difference is determined using absolute value (modulus) in combination with sign preservation:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. Out of all valid data index [math]\displaystyle{ i^\min }[/math] is determined in such a way that [math]\displaystyle{ \left|d_i\right| }[/math] is minimal
    [math]\displaystyle{ d^\min = d_{i^\min} }[/math]
    is equal to the minimum difference according to this definition; this value can be negative, positive or zero;
  3. In case all values [math]\displaystyle{ d_i }[/math] are invalid [math]\displaystyle{ d^\min = \rm{invalid} }[/math] will be set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^\min }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid differences.
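
The sign-preserving maximum and minimum differences can be sketched as follows. The differences at one location are assumed to be given as a list in which NaN marks invalid values (an assumption of this sketch). The text does not say which value is chosen when several differences share the same modulus; here the first one encountered is returned.

```python
import math


def _valid(d):
    """Keep only valid (non-NaN) differences."""
    return [x for x in d if not math.isnan(x)]


def max_difference(d):
    """Difference with the largest modulus, sign preserved; NaN if none is valid."""
    valid = _valid(d)
    return max(valid, key=abs) if valid else float("nan")


def min_difference(d):
    """Difference with the smallest modulus, sign preserved; NaN if none is valid."""
    valid = _valid(d)
    return min(valid, key=abs) if valid else float("nan")


d = [0.5, float("nan"), -1.0, 0.75, -0.25]
print(max_difference(d))  # -1.0
print(min_difference(d))  # -0.25
```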

Mean difference

Mean value is computed for all valid differences:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. From all valid differences the mean value is computed as
    [math]\displaystyle{ d^{\rm{mit}}=\frac{\sum_{i\in I}P(d_i)d_i}{\sum_{i\in I}P(d_i)} }[/math];
  3. In case all [math]\displaystyle{ d_i }[/math] are invalid [math]\displaystyle{ d^{\rm{mit}} = \rm{invalid} }[/math] will be set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^{\rm{mit}} }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid differences.

Mean deviation

Mean deviation is computed for all valid differences:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. From all valid differences the mean deviation is computed as
    [math]\displaystyle{ d^{\rm{abw}}=\frac{\sum_{i\in I}P(d_i)\left|d_i\right|}{\sum_{i\in I}P(d_i)} }[/math];
  3. In case all [math]\displaystyle{ d_i }[/math] are invalid [math]\displaystyle{ d^{\rm{abw}} = \rm{invalid} }[/math] will be set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^{\rm{abw}} }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid differences.

Root mean square error (RMSE)

RMSE is computed for all valid differences:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. From all valid differences the RMSE is computed according to its well-known definition, i.e. as the square root of the mean of the squared differences;
  3. In case all [math]\displaystyle{ d_i }[/math] are invalid RMSE = invalid will be set.

Computation is performed for categories KC and KN. A valid RMSE is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid differences.
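
Mean difference, mean deviation and RMSE share the same structure: a sum over the valid differences divided by their number. The sketch below computes all three for one location. It assumes NaN as the invalid marker and takes the RMSE as the square root of the mean squared valid difference; the function name and the dictionary keys are illustrative only.

```python
import math


def summary(d):
    """Mean, mean deviation and RMSE over the valid (non-NaN) differences."""
    valid = [x for x in d if not math.isnan(x)]
    n = len(valid)  # corresponds to the sum of P(d_i)
    if n == 0:
        nan = float("nan")
        return {"n": 0, "mean": nan, "mean_dev": nan, "rmse": nan}
    return {
        "n": n,
        "mean": sum(valid) / n,
        "mean_dev": sum(abs(x) for x in valid) / n,
        "rmse": math.sqrt(sum(x * x for x in valid) / n),
    }


s = summary([3.0, float("nan"), -4.0])
print(s["n"], s["mean"], s["mean_dev"])  # 2 -0.5 3.5
print(s["rmse"])                         # square root of 12.5
```

Positive and negative differences cancel in the mean but not in the mean deviation or the RMSE, which is why the example yields a mean of -0.5 and a mean deviation of 3.5.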

Number of valid differences

The number of valid differences may vary between different locations. As a consequence the above mentioned quantities may be computed from data sets of different size:

  1. At first all differences [math]\displaystyle{ d_i }[/math] will be computed as indicated above;
  2. The number of valid differences is computed as
    [math]\displaystyle{ N_{\rm{ord}}=\sum_{i\in I}P(d_i) }[/math];

Computation is performed for categories KC and KN. In program NCPLOT, quantities like maximum difference, minimum difference, mean value as well as mean deviation can be visualized using this variable as a filter. In this way, visualizations can be created that show results only at points where, e.g., either the maximum number of events or a specific number of events occurred.

Data for a Taylor diagram

Taylor diagrams provide "a concise statistical summary of how well patterns match each other in terms of their correlation, their root-mean-square difference and the ratio of their variances." Additional information such as bias can be added to the conventional Taylor diagram. The Taylor diagram provides a graphical framework that allows a suite of variables from one or more models or reanalyses to be compared to reference data. The reference data can be observationally based (e.g. a reanalysis), another model or a control run.

Literature and further information:

  1. Taylor, K. E. (2001), Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research, 106 (D7), 7183–7192, doi: http://dx.doi.org/10.1029/2000JD900719;
  2. http://www-pcmdi.llnl.gov/about/staff/Taylor/CV/Taylor_diagram_primer.htm with a short introduction by K. E. Taylor as well as links to example applications.

Standard deviation for reference data

Standard deviation is computed for all valid reference data:

  1. Mean value for [math]\displaystyle{ r_i }[/math] is computed according to
    [math]\displaystyle{ \bar{r}=\frac{\sum_{i\in I}P(r_i,f_i)r_i}{\sum_{i\in I}P(r_i,f_i)} }[/math];
  2. If a valid [math]\displaystyle{ \bar{r} }[/math] was computed standard deviation is obtained from
    [math]\displaystyle{ \sigma_r = \sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left(r_i-\bar{r}\right)^2}{\sum_{i\in I}P(r_i,f_i)}} }[/math];
  3. In case all [math]\displaystyle{ P(r_i,f_i) }[/math] are 0 [math]\displaystyle{ \sigma_r = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ \sigma_r }[/math] is obtained as long as [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. for at least one [math]\displaystyle{ i }[/math]. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid Taylor data.

Remark: We apply [math]\displaystyle{ P(r_i,f_i) }[/math] instead of [math]\displaystyle{ P(r_i) }[/math]. This guarantees that all Taylor data are computed for the same data set size at a specific location.

Standard deviation for variant data

Standard deviation is computed for all valid variant data:

  1. Mean value for [math]\displaystyle{ f_i }[/math] is computed according to
    [math]\displaystyle{ \bar{f}=\frac{\sum_{i\in I}P(r_i,f_i)f_i}{\sum_{i\in I}P(r_i,f_i)} }[/math];
  2. If a valid [math]\displaystyle{ \bar{f} }[/math] was computed standard deviation is obtained from
    [math]\displaystyle{ \sigma_f = \sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left(f_i-\bar{f}\right)^2}{\sum_{i\in I}P(r_i,f_i)}} }[/math];
  3. In case all [math]\displaystyle{ P(r_i,f_i) }[/math] are 0 [math]\displaystyle{ \sigma_f = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ \sigma_f }[/math] is obtained as long as [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. for at least one [math]\displaystyle{ i }[/math]. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid Taylor data.

Remark: We apply [math]\displaystyle{ P(r_i,f_i) }[/math] instead of [math]\displaystyle{ P(f_i) }[/math]. This guarantees that all Taylor data are computed for the same data set size at a specific location.

Correlation

Correlation is computed for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math]:

  1. Mean value [math]\displaystyle{ \bar{r} }[/math] for reference data is computed as indicated above;
  2. Mean value [math]\displaystyle{ \bar{f} }[/math] for variant data is computed as indicated above;
  3. Standard deviation [math]\displaystyle{ \sigma_r }[/math] for reference data is computed as indicated above;
  4. Standard deviation [math]\displaystyle{ \sigma_f }[/math] for variant data is computed as indicated above;
  5. Correlation [math]\displaystyle{ R }[/math] is given as follows
    [math]\displaystyle{ R=\frac{\sum_{i\in I}P(r_i,f_i)\left(r_i-\bar{r}\right)\left(f_i-\bar{f}\right)}{\sigma_r\sigma_f\sum_{i\in I}P(r_i,f_i)} }[/math];
  6. In case [math]\displaystyle{ V(r_i,f_i) }[/math] returns .F. for all [math]\displaystyle{ i }[/math], [math]\displaystyle{ R = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ R }[/math] is obtained as long as [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. for at least one [math]\displaystyle{ i }[/math]. Correlations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid Taylor data.

Pattern RMS

Pattern RMS is computed for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math]:

  1. Mean value [math]\displaystyle{ \bar{r} }[/math] for reference data is computed as indicated above;
  2. Mean value [math]\displaystyle{ \bar{f} }[/math] for variant data is computed as indicated above;
  3. Pattern RMS [math]\displaystyle{ E' }[/math], i.e. the RMS difference of the two data sets after removal of their respective mean values, is given as follows
    [math]\displaystyle{ E'=\sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left[\left(f_i-\bar{f}\right)-\left(r_i-\bar{r}\right)\right]^2}{\sum_{i\in I}P(r_i,f_i)}} }[/math];
  4. In case [math]\displaystyle{ V(r_i,f_i) }[/math] returns .F. for all [math]\displaystyle{ i }[/math], [math]\displaystyle{ E' = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ E' }[/math] is obtained as long as [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. for at least one [math]\displaystyle{ i }[/math]. Pattern RMS values for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid Taylor data.

Deviation of means (Bias)

Deviation of means is computed for [math]\displaystyle{ \bar{r} }[/math] and [math]\displaystyle{ \bar{f} }[/math]; this quantity is also called bias:

  1. Mean value [math]\displaystyle{ \bar{r} }[/math] for reference data is computed as indicated above;
  2. Mean value [math]\displaystyle{ \bar{f} }[/math] for variant data is computed as indicated above;
  3. Deviation of means [math]\displaystyle{ \bar{E} }[/math] is given as follows
    [math]\displaystyle{ \bar{E}=\bar{f}-\bar{r} }[/math];
  4. In case all [math]\displaystyle{ V(r_i,f_i) }[/math] are invalid [math]\displaystyle{ \bar{E} = \rm{invalid} }[/math] is set;
  5. Overall RMS [math]\displaystyle{ E }[/math] can be computed from [math]\displaystyle{ \bar{E} }[/math] and [math]\displaystyle{ E' }[/math] according to
    [math]\displaystyle{ E = \sqrt{\bar{E}^2+E'^2} }[/math];

Computation is performed for categories KC and KN. A valid [math]\displaystyle{ \bar{E} }[/math] is obtained as long as [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. for at least one [math]\displaystyle{ i }[/math]. Biases for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering by the ancillary variable Number of valid Taylor data.
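
The Taylor quantities for one location can be sketched together in Python, since they all use the same set of valid pairs. The sketch makes three assumptions: invalid values are encoded as NaN; the pattern RMS is taken as the centred RMS difference of Taylor (2001), i.e. the root mean square of (f_i − f̄) − (r_i − r̄); and an invalid result is represented by `None`. The function name and the dictionary keys are illustrative only.

```python
import math


def taylor_statistics(r, f):
    """Taylor diagram data from all pairs (r_i, f_i) in which both values are valid."""
    pairs = [(ri, fi) for ri, fi in zip(r, f)
             if not (math.isnan(ri) or math.isnan(fi))]
    n = len(pairs)  # number of valid Taylor data N_T
    if n == 0:
        return None  # stands for "invalid" in this sketch
    r_mean = sum(ri for ri, _ in pairs) / n
    f_mean = sum(fi for _, fi in pairs) / n
    sigma_r = math.sqrt(sum((ri - r_mean) ** 2 for ri, _ in pairs) / n)
    sigma_f = math.sqrt(sum((fi - f_mean) ** 2 for _, fi in pairs) / n)
    cov = sum((ri - r_mean) * (fi - f_mean) for ri, fi in pairs) / n
    # The correlation is undefined if one of the series is constant.
    corr = cov / (sigma_r * sigma_f) if sigma_r > 0 and sigma_f > 0 else float("nan")
    pattern_rms = math.sqrt(
        sum(((fi - f_mean) - (ri - r_mean)) ** 2 for ri, fi in pairs) / n)
    bias = f_mean - r_mean
    overall_rms = math.sqrt(bias ** 2 + pattern_rms ** 2)
    return {"n": n, "sigma_r": sigma_r, "sigma_f": sigma_f, "corr": corr,
            "pattern_rms": pattern_rms, "bias": bias, "overall_rms": overall_rms}


stats = taylor_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, float("nan")])
print(stats["n"], stats["bias"], stats["overall_rms"])
```

In this example the three valid differences f_i − r_i are 1, 1 and 2, so the overall RMS equals the square root of 2, the same value obtained from the bias and the pattern RMS. The quantities also satisfy the relation E'² = σ_f² + σ_r² − 2 σ_f σ_r R on which the Taylor diagram is based.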

Number of valid reference data

Number of valid reference data [math]\displaystyle{ r_i }[/math] may differ between locations:

  1. The number of valid reference data is given by
    [math]\displaystyle{ N_r=\sum_{i\in I}P(r_i) }[/math];

Computation is performed for categories KC and KN. This quantity is purely informative and is not actually required for any of the Taylor diagram data.

Number of valid variant data

Number of valid variant data [math]\displaystyle{ f_i }[/math] may differ between locations:

  1. The number of valid variant data is given by
    [math]\displaystyle{ N_f=\sum_{i\in I}P(f_i) }[/math];

Computation is performed for categories KC and KN. This quantity is purely informative and is not actually required for any of the Taylor diagram data.

Number of valid Taylor data

The number of valid Taylor data [math]\displaystyle{ \bar{r},\bar{f},\sigma_r,\sigma_f,R,E' }[/math] and [math]\displaystyle{ \bar{E} }[/math] may be different from location to location:

  1. The number of valid Taylor data is given by
    [math]\displaystyle{ N_T=\sum_{i\in I}P(r_i,f_i) }[/math];

Computation is performed for categories KC and KN. In programs like NCPLOT the above-mentioned Taylor data can be filtered using this quantity. This makes it possible to present results, e.g., only for locations where all events have occurred, or for locations where only a specific number of events has occurred.

Median and quantiles

A prerequisite for the computation of all subsequent quantities is that the valid differences are sorted: all [math]\displaystyle{ N_{\rm{ord}} }[/math] valid differences [math]\displaystyle{ d_i }[/math] are sorted in ascending order into [math]\displaystyle{ s_j }[/math], with [math]\displaystyle{ j \in [1:N_{\rm{ord}}] }[/math]. Subsequently [math]\displaystyle{ n:=N_{\rm{ord}} }[/math] will be used for the sake of simplicity.

Computation is carried out only if [math]\displaystyle{ n \ge 32 }[/math] holds.

Median

Median is computed for all valid differences [math]\displaystyle{ d_i }[/math]:

  1. if [math]\displaystyle{ n }[/math] odd: [math]\displaystyle{ d_{\rm{Med}} = s_\frac{n+1}{2} }[/math];
  2. if [math]\displaystyle{ n }[/math] even: [math]\displaystyle{ d_{\rm{Med}} = 0.5\left( s_{\frac{n}{2}}+s_{\frac{n}{2}+1}\right) }[/math];
  3. if there are fewer than 32 valid data, [math]\displaystyle{ d_{\rm{Med}} = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.
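
The median rules can be sketched in Python as follows. NaN marks invalid values (an assumption of this sketch), and the 1-based indices of the sorted values are shifted by one because Python lists are 0-based.

```python
import math

MIN_VALID = 32  # minimum number of valid differences required


def median_difference(d):
    """Median of the valid differences; NaN if fewer than MIN_VALID are valid."""
    s = sorted(x for x in d if not math.isnan(x))
    n = len(s)
    if n < MIN_VALID:
        return float("nan")
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # s_{(n+1)/2}
    return 0.5 * (s[n // 2 - 1] + s[n // 2])  # mean of s_{n/2} and s_{n/2+1}


odd = [float(i) for i in range(1, 34)]   # 33 valid values
even = [float(i) for i in range(1, 33)]  # 32 valid values
print(median_difference(odd))   # 17.0
print(median_difference(even))  # 16.5
```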

Quantile Q01

Quantile [math]\displaystyle{ p=0.01 }[/math] is computed for all valid differences [math]\displaystyle{ d_i }[/math]. In other words: we obtain the value of [math]\displaystyle{ d_i }[/math] below which just 1 % of all events fall and above which 99 % of all events fall:

  1. if [math]\displaystyle{ n \cdot p }[/math] is an integer: [math]\displaystyle{ d_{\rm{Q01}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right) }[/math];
  2. if [math]\displaystyle{ n \cdot p }[/math] is not an integer: [math]\displaystyle{ d_{\rm{Q01}} = s_{\lceil n \cdot p \rceil} }[/math];
  3. if there are fewer than 32 valid data, [math]\displaystyle{ d_{\rm{Q01}} = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. [math]\displaystyle{ d_{\rm{Q01}} }[/math] may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q05

Quantile [math]\displaystyle{ p=0.05 }[/math] is computed for all valid differences [math]\displaystyle{ d_i }[/math]. In other words: we obtain the value of [math]\displaystyle{ d_i }[/math] below which 5 % of all events fall and above which 95 % of all events fall:

  1. if [math]\displaystyle{ n \cdot p }[/math] is an integer: [math]\displaystyle{ d_{\rm{Q05}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right) }[/math];
  2. if [math]\displaystyle{ n \cdot p }[/math] is not an integer: [math]\displaystyle{ d_{\rm{Q05}} = s_{\lceil n \cdot p \rceil} }[/math];
  3. if there are fewer than 32 valid data, [math]\displaystyle{ d_{\rm{Q05}} = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. [math]\displaystyle{ d_{\rm{Q05}} }[/math] may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q95

Quantile [math]\displaystyle{ p=0.95 }[/math] is computed for all valid differences [math]\displaystyle{ d_i }[/math]. In other words: we obtain the value of [math]\displaystyle{ d_i }[/math] below which 95 % of all events fall and above which 5 % of all events fall:

  1. if [math]\displaystyle{ n \cdot p }[/math] is an integer: [math]\displaystyle{ d_{\rm{Q95}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right) }[/math];
  2. if [math]\displaystyle{ n \cdot p }[/math] is not an integer: [math]\displaystyle{ d_{\rm{Q95}} = s_{\lceil n \cdot p \rceil} }[/math];
  3. if there are fewer than 32 valid data, [math]\displaystyle{ d_{\rm{Q95}} = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. [math]\displaystyle{ d_{\rm{Q95}} }[/math] may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.

Quantile Q99

Quantile [math]\displaystyle{ p=0.99 }[/math] is computed for all valid differences [math]\displaystyle{ d_i }[/math]. In other words: we obtain the value of [math]\displaystyle{ d_i }[/math] below which 99 % of all events fall and above which just 1 % of all events fall:

  1. if [math]\displaystyle{ n \cdot p }[/math] is an integer: [math]\displaystyle{ d_{\rm{Q99}} = 0.5\left( s_{n \cdot p}+s_{n \cdot p+1}\right) }[/math];
  2. if [math]\displaystyle{ n \cdot p }[/math] is not an integer: [math]\displaystyle{ d_{\rm{Q99}} = s_{\lceil n \cdot p \rceil} }[/math];
  3. if there are fewer than 32 valid data, [math]\displaystyle{ d_{\rm{Q99}} = \rm{invalid} }[/math] is set.

Computation is performed for categories KC and KN. [math]\displaystyle{ d_{\rm{Q99}} }[/math] may be computed for different data set sizes at different locations. During visualization NCPLOT enables filtering using ancillary variable Number of valid differences.
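
All four quantiles follow the same rule, so a single function is enough to sketch them. In the sketch below the probability is passed as an integer percentage (the hypothetical parameter `percent`, with p = percent / 100), so that the test whether n·p is an integer can be done in exact integer arithmetic. NaN marks invalid values; both choices are assumptions made for illustration.

```python
import math


def quantile_difference(d, percent, min_valid=32):
    """Quantile p = percent / 100 of the valid differences (0 < percent < 100).

    Returns NaN if fewer than min_valid differences are valid.
    """
    s = sorted(x for x in d if not math.isnan(x))
    n = len(s)
    if n < min_valid:
        return float("nan")
    k, rest = divmod(n * percent, 100)  # n*p = k + rest/100
    if rest == 0:
        # n*p is an integer: mean of s_{n*p} and s_{n*p+1} (1-based indices)
        return 0.5 * (s[k - 1] + s[k])
    # n*p is not an integer: s_{ceil(n*p)} = s_{k+1}, i.e. list index k
    return s[k]


d = [float(i) for i in range(1, 101)]  # 100 valid differences 1, 2, ..., 100
print(quantile_difference(d, 1))   # 1.5
print(quantile_difference(d, 5))   # 5.5
print(quantile_difference(d, 95))  # 95.5
print(quantile_difference(d, 99))  # 99.5
```

Testing n·p for integrality with floating-point numbers such as 0.01 can fail because of rounding, which is why the sketch works with integer percentages.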

