Differences of Calculated Results
From BAWiki
Introduction
For data generated by
- mathematical models (model results), or
- analysis of calculated results (characteristic numbers), or
- measured data (observational data)
various differences can be computed. Input data can typically be categorized as follows:
- Category K0: [math]\displaystyle{ f(x,y,z) }[/math], time-independent quantities;
- Category K1: [math]\displaystyle{ f(x,y,z,t_1) }[/math], time-dependent quantities, one time step;
- Category KC: [math]\displaystyle{ f(x,y,z,t_i) }[/math], time-dependent quantities, several discrete time steps, constant time step [math]\displaystyle{ \Delta_t }[/math];
- Category KN: [math]\displaystyle{ f(x,y,z,t_i) }[/math], time-dependent quantities, several discrete time steps, varying time step [math]\displaystyle{ \Delta_t(i) }[/math].
For geophysical data, categories K1, KC and KN are the significant ones. Examples:
- Category K1: topography/bathymetry [math]\displaystyle{ h(x,y,z,t_1) }[/math] for a specific instant in time;
- Category KC: water level [math]\displaystyle{ \eta(x,y,z,t_i) }[/math] at discrete times [math]\displaystyle{ t_i }[/math] with constant time step, e.g. computed by a mathematical model;
- Category KN: tidal high water [math]\displaystyle{ \eta^{\rm{HW}}(x,y,z,t_i) }[/math] for times [math]\displaystyle{ t_i }[/math] at non-equidistant time intervals, e.g. derived from a water level time series.
Definitions
- reference data [math]\displaystyle{ r }[/math]: deviations of [math]\displaystyle{ f }[/math] are evaluated with respect to [math]\displaystyle{ r }[/math]. Typical reference data are observational data, computational results or analysis results for a specific (reference) state (situation);
- variant data [math]\displaystyle{ f }[/math]: can likewise be observational data, computational results or analysis results, for which deviations are computed with respect to the reference state. Typically, variant data are given for a different period in time (natural variation) or a different state of the system under study.
- valid operator 1: [math]\displaystyle{ V(r_i) }[/math] returns .T. if [math]\displaystyle{ r_i }[/math] is valid and .F. if it is invalid. Can also be applied to [math]\displaystyle{ f_i }[/math].
- valid operator 2: [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T. if [math]\displaystyle{ V(r_i)\land V(f_i) }[/math] holds, i.e. if both [math]\displaystyle{ r_i }[/math] and [math]\displaystyle{ f_i }[/math] are valid, else .F.
- integer operator 1: [math]\displaystyle{ P(r_i) }[/math] returns 1 if [math]\displaystyle{ V(r_i) }[/math] is .T., else 0. Similarly for [math]\displaystyle{ f_i }[/math].
- integer operator 2: [math]\displaystyle{ P(r_i,f_i) }[/math] returns 1 if [math]\displaystyle{ V(r_i)\land V(f_i) }[/math] is .T., else 0.
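The valid and integer operators can be sketched in a few lines of Python. This is only an illustration: representing an invalid value by `None`, and mapping .T./.F. to `True`/`False`, are assumptions made here, not something the text specifies.

```python
from typing import Optional

Value = Optional[float]  # assumption: None marks an invalid value


def V(*values: Value) -> bool:
    """Valid operator: True only if every argument is valid."""
    return all(v is not None for v in values)


def P(*values: Value) -> int:
    """Integer operator: 1 if V(...) is True, else 0."""
    return 1 if V(*values) else 0


# Hypothetical reference (r) and variant (f) values at three times
r = [1.0, None, 3.0]
f = [1.5, 2.5, None]
pair_flags = [P(ri, fi) for ri, fi in zip(r, f)]
print(pair_flags)
```

Called with one argument the functions correspond to operator 1, called with two arguments to operator 2.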
Requirements for the computation of differences
The following requirements must be fulfilled by [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math]:
- [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math] must belong to the same category (see above);
- the number of times [math]\displaystyle{ t_i }[/math] must be identical for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
- for data belonging to category KC, the constant time step [math]\displaystyle{ \Delta_t }[/math] must be identical for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
- (physical) dimension as well as meaning must be equivalent for [math]\displaystyle{ r }[/math] and [math]\displaystyle{ f }[/math];
- [math]\displaystyle{ r_i }[/math] (short for [math]\displaystyle{ r(x,y,z,t_i) }[/math]) as well as [math]\displaystyle{ f_i }[/math] (short for [math]\displaystyle{ f(x,y,z,t_i) }[/math]) must be valid data for the same instant [math]\displaystyle{ i }[/math] in time; otherwise the derived results will become invalid.
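The first four requirements concern whole data sets and could be checked before any difference is computed. The following sketch assumes a hypothetical `DataSet` container and uses a plain unit string as a stand-in for "dimension and meaning"; neither is taken from NCDELTA. The last requirement (validity at the same instant) applies per value and is left out here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSet:
    """Hypothetical container for the metadata of r or f."""
    category: str        # "K0", "K1", "KC" or "KN"
    times: List[float]   # t_i in seconds; empty for K0
    unit: str            # stand-in for physical dimension and meaning


def requirement_violations(r: DataSet, f: DataSet, tol: float = 1e-9) -> List[str]:
    """Return the violated requirements (empty list if r and f are compatible)."""
    problems = []
    if r.category != f.category:
        problems.append("category differs")
    if len(r.times) != len(f.times):
        problems.append("number of times differs")
    if r.category == "KC" and f.category == "KC":
        if len(r.times) > 1 and len(f.times) > 1:
            dt_r = r.times[1] - r.times[0]
            dt_f = f.times[1] - f.times[0]
            if abs(dt_r - dt_f) > tol:
                problems.append("time step differs")
    if r.unit != f.unit:
        problems.append("unit differs")
    return problems


r = DataSet("KC", [0.0, 600.0, 1200.0], "m")
f = DataSet("KC", [0.0, 900.0, 1800.0], "m")
print(requirement_violations(r, f))
```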
Computational results
Program NCDELTA can be used to compute all subsequent results. Locations of [math]\displaystyle{ r }[/math] are not required to coincide with those of [math]\displaystyle{ f }[/math]. Values of [math]\displaystyle{ r }[/math] are interpolated to the locations of [math]\displaystyle{ f }[/math], as long as the geographical distance between the different locations does not exceed [math]\displaystyle{ R^\max }[/math]. If the distance exceeds that limit, no result is computed and an invalid result value is generated instead. The following results can be computed using NCDELTA.
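The role of the distance limit can be illustrated with a simplified lookup. The sketch below uses a nearest-neighbour search in planar coordinates as a stand-in; the interpolation scheme actually used by NCDELTA is not described here, and the function name, the data layout and the use of `None` for an invalid value are assumptions.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def nearest_within(target: Point,
                   r_points: List[Tuple[Point, Optional[float]]],
                   r_max: float) -> Optional[float]:
    """Value of the nearest reference point not farther away than r_max, else None."""
    best_value, best_dist = None, None
    for (x, y), value in r_points:
        dist = math.hypot(x - target[0], y - target[1])
        if dist <= r_max and (best_dist is None or dist < best_dist):
            best_value, best_dist = value, dist
    return best_value


# Hypothetical reference locations with values
r_points = [((0.0, 0.0), 1.0), ((10.0, 0.0), 2.0)]
print(nearest_within((1.0, 0.0), r_points, r_max=5.0))    # close enough
print(nearest_within((100.0, 0.0), r_points, r_max=5.0))  # too far away
```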
Ordinary differences
Difference
A result is computed for all times (one value for time-independent data) at all locations [math]\displaystyle{ (x,y,z) }[/math]:
- The difference between [math]\displaystyle{ f_i }[/math] and [math]\displaystyle{ r_i }[/math] is calculated in case [math]\displaystyle{ V(r_i,f_i) }[/math] returns .T.:
- [math]\displaystyle{ d_i = f_i - r_i }[/math], if [math]\displaystyle{ V(r_i,f_i) }[/math];
- Result will be invalid, if [math]\displaystyle{ V(r_i,f_i) }[/math] returns .F.:
- [math]\displaystyle{ d_i = \rm{invalid} }[/math] if [math]\displaystyle{ \lnot V(r_i,f_i) }[/math].
Results are computed for data belonging to categories K0, K1, KC and KN, i.e. for all types of data.
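A minimal sketch of the difference rule for one location, assuming that invalid values are represented by `None` (an assumption of this example):

```python
from typing import List, Optional


def differences(r: List[Optional[float]],
                f: List[Optional[float]]) -> List[Optional[float]]:
    """d_i = f_i - r_i where both values are valid, otherwise None (invalid)."""
    return [fi - ri if ri is not None and fi is not None else None
            for ri, fi in zip(r, f)]


# Hypothetical reference and variant values at four times
r = [1.0, None, 3.0, 2.0]
f = [1.5, 2.5, None, 1.0]
d = differences(r, f)
print(d)
```

Note the sign convention: a positive difference means that the variant value exceeds the reference value.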
Maximum difference
The maximum difference is the difference with the largest absolute value (modulus), reported with its sign preserved:
- First, all differences [math]\displaystyle{ d_i }[/math] are computed as indicated above;
- Out of all valid differences, the index [math]\displaystyle{ i^\max }[/math] is determined such that [math]\displaystyle{ \left|d_i\right| }[/math] is maximal;
- The maximum difference according to this definition is
- [math]\displaystyle{ d^\max = d_{i^\max} }[/math];
- this value can be negative, positive or zero;
- If all values [math]\displaystyle{ d_i }[/math] are invalid, [math]\displaystyle{ d^\max = \rm{invalid} }[/math] is set.
Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^\max }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Minimum difference
The minimum difference is the difference with the smallest absolute value (modulus), reported with its sign preserved:
- First, all differences [math]\displaystyle{ d_i }[/math] are computed as indicated above;
- Out of all valid differences, the index [math]\displaystyle{ i^\min }[/math] is determined such that [math]\displaystyle{ \left|d_i\right| }[/math] is minimal;
- The minimum difference according to this definition is
- [math]\displaystyle{ d^\min = d_{i^\min} }[/math];
- this value can be negative, positive or zero;
- If all values [math]\displaystyle{ d_i }[/math] are invalid, [math]\displaystyle{ d^\min = \rm{invalid} }[/math] is set.
Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^\min }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
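Both sign-preserving extremes (maximum and minimum difference) follow the same pattern and can be sketched together. Invalid differences are assumed to be `None`; if several differences share the same modulus, this sketch returns the first one, which is a choice made here and not specified in the text.

```python
from typing import Callable, List, Optional


def extreme_difference(d: List[Optional[float]],
                       pick: Callable) -> Optional[float]:
    """Select the valid d_i with the largest (pick=max) or smallest (pick=min)
    absolute value and return it with its original sign."""
    valid = [di for di in d if di is not None]
    if not valid:
        return None  # all differences invalid
    return pick(valid, key=abs)


# Hypothetical differences at one location
d = [0.25, None, -0.75, 0.5]
d_max = extreme_difference(d, max)
d_min = extreme_difference(d, min)
print(d_max, d_min)
```

In this example the maximum difference is negative (-0.75), although the largest signed value in the series is 0.5. This is the effect of ranking by modulus while keeping the sign.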
Mean difference
The mean value is computed from all valid differences:
- First, all differences [math]\displaystyle{ d_i }[/math] are computed as indicated above;
- From all valid differences, the mean value over the set [math]\displaystyle{ I }[/math] of all time indices is computed as
- [math]\displaystyle{ d^{\rm{mit}}=\frac{\sum_{i\in I}P(d_i)d_i}{\sum_{i\in I}P(d_i)} }[/math];
- If all [math]\displaystyle{ d_i }[/math] are invalid, [math]\displaystyle{ d^{\rm{mit}} = \rm{invalid} }[/math] is set.
Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^{\rm{mit}} }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
Mean deviation
The mean deviation is computed from all valid differences:
- First, all differences [math]\displaystyle{ d_i }[/math] are computed as indicated above;
- From all valid differences, the mean deviation is computed as
- [math]\displaystyle{ d^{\rm{abw}}=\frac{\sum_{i\in I}P(d_i)\left|d_i\right|}{\sum_{i\in I}P(d_i)} }[/math];
- If all [math]\displaystyle{ d_i }[/math] are invalid, [math]\displaystyle{ d^{\rm{abw}} = \rm{invalid} }[/math] is set.
Computation is performed for categories KC and KN. A valid [math]\displaystyle{ d^{\rm{abw}} }[/math] is obtained as long as there exists at least one valid difference [math]\displaystyle{ d_i }[/math]. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid differences.
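Mean difference and mean deviation differ only in whether the modulus is taken before averaging. The following sketch puts them side by side, again assuming `None` for invalid differences; the function names are illustrative.

```python
from typing import List, Optional


def mean_difference(d: List[Optional[float]]) -> Optional[float]:
    """Mean of the valid differences; None if no difference is valid."""
    valid = [di for di in d if di is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


def mean_deviation(d: List[Optional[float]]) -> Optional[float]:
    """Mean of the absolute values of the valid differences; None if none is valid."""
    valid = [abs(di) for di in d if di is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical differences: positive and negative values cancel in the mean
d = [0.5, None, -0.5, 0.25, -0.25]
print(mean_difference(d), mean_deviation(d))
```

Here the mean difference is zero because positive and negative differences cancel, while the mean deviation (0.375) still shows that the variant departs from the reference.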
Number of valid differences
The number of valid differences may vary between different locations. As a consequence, the above-mentioned quantities may be computed from data sets of different size:
- First, all differences [math]\displaystyle{ d_i }[/math] are computed as indicated above;
- The number of valid differences is computed as
- [math]\displaystyle{ N_{\rm{ord}}=\sum_{i\in I}P(d_i) }[/math].
Computation is performed for categories KC and KN. In program NCPLOT, quantities such as maximum difference, minimum difference, mean difference and mean deviation can be visualised using this variable as a filter. In that way, visualizations can be created that show results only at points where, e.g., either the maximum number of events or a specific number of events occurred.
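The idea of filtering by the number of valid differences can be shown with plain Python. This is not NCPLOT functionality, only a stand-in for the principle; the location names and values are hypothetical, and `None` marks an invalid difference.

```python
from typing import Dict, List, Optional


def count_valid(d: List[Optional[float]]) -> int:
    """N_ord: number of valid differences at one location."""
    return sum(1 for di in d if di is not None)


# Hypothetical differences at three locations
d_by_location: Dict[str, List[Optional[float]]] = {
    "A": [0.1, 0.2, None],
    "B": [None, None, None],
    "C": [0.3, -0.1, 0.2],
}

n_ord = {loc: count_valid(d) for loc, d in d_by_location.items()}

# Keep only locations where the maximum number of events occurred
n_max = max(n_ord.values())
complete = [loc for loc, n in n_ord.items() if n == n_max]
print(n_ord, complete)
```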
Data for a Taylor diagram
Taylor diagrams provide "a concise statistical summary of how well patterns match each other in terms of their correlation, their root-mean-square difference and the ratio of their variances." Additional information such as bias can be added to the conventional Taylor diagram. The Taylor diagram provides a graphical framework that allows a suite of variables from one or more models or reanalyses to be compared to reference data. The reference data can be observationally based (e.g. a reanalysis) or taken from another model or a control run.
Literature and further information:
- Taylor, K. E. (2001), Summarizing multiple aspects of model performance in a single diagram, Journal of Geophysical Research, 106 (D7), 7183–7192, doi: http://dx.doi.org/10.1029/2000JD900719;
- http://www-pcmdi.llnl.gov/about/staff/Taylor/CV/Taylor_diagram_primer.htm with a short introduction by K. E. Taylor as well as links to example applications.
Standard deviation for reference data
The standard deviation is computed from all valid reference data:
- Mean value for [math]\displaystyle{ r_i }[/math] is computed according to
- [math]\displaystyle{ \bar{r}=\frac{\sum_{i\in I}P(r_i,f_i)r_i}{\sum_{i\in I}P(r_i,f_i)} }[/math];
- If a valid [math]\displaystyle{ \bar{r} }[/math] was computed, the standard deviation is obtained from
- [math]\displaystyle{ \sigma_r = \sqrt{\frac{\sum_{i\in I}P(r_i,f_i)\left(r_i-\bar{r}\right)^2}{\sum_{i\in I}P(r_i,f_i)}} }[/math];
- If all [math]\displaystyle{ P(r_i,f_i) }[/math] are 0, [math]\displaystyle{ \sigma_r = \rm{invalid} }[/math] is set.
Computation is performed for categories KC and KN. A valid [math]\displaystyle{ \sigma_r }[/math] is obtained as long as there exists at least one valid data pair, i.e. at least one index [math]\displaystyle{ i }[/math] for which [math]\displaystyle{ P(r_i,f_i) }[/math] is 1. Standard deviations for different locations may be computed from data sets of different size. During visualization, NCPLOT enables filtering using the ancillary variable Number of valid Taylor data.
Remark: We apply [math]\displaystyle{ P(r_i,f_i) }[/math] instead of [math]\displaystyle{ P(r_i) }[/math]. This guarantees that all Taylor data are computed for the same data set size at a specific location.
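The effect of using the paired operator can be seen in a short sketch. It follows the formula above (division by the number of valid pairs, not by that number minus one) and assumes `None` for invalid values; the function name is illustrative.

```python
import math
from typing import List, Optional


def sigma_r(r: List[Optional[float]],
            f: List[Optional[float]]) -> Optional[float]:
    """Standard deviation of r over those times where both r_i and f_i are valid."""
    paired = [ri for ri, fi in zip(r, f) if ri is not None and fi is not None]
    if not paired:
        return None  # all P(r_i, f_i) are 0
    mean = sum(paired) / len(paired)
    return math.sqrt(sum((ri - mean) ** 2 for ri in paired) / len(paired))


# Hypothetical data: the last reference value has no valid partner in f
r = [1.0, 3.0, 5.0, 100.0]
f = [1.2, 2.9, 5.1, None]
sigma = sigma_r(r, f)
print(sigma)
```

Because the last variant value is invalid, the reference value 100.0 is excluded and the standard deviation is computed from three values only.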
Median
Percentiles