ILLUSORY STATISTICAL POWER IN TIME SERIES ANALYSIS
Posted April 30, 2019
FIGURE 1: HURRICANE DATA
FIGURE 2: APRIL TEMPERATURES IN HONG KONG 1955-2016
- Moving average and autoregressive models in time series analysis such as MA, AR, ARMA, and ARIMA, as well as data smoothing methods, make use of a moving window with a fixed length of time (Box, 1994) (Chatfield, 1989) (Draper & Smith, 1998) (Mudelsee, 2014) (Granger, 2008). The window moves forward in increments of one unit of time, and the parameters of interest are computed from the data in the window at each increment. In most cases the moving window procedure creates a preprocessed series that is subjected to further statistical analysis. A popular procedure of this kind is the moving average, in which the simple or weighted average of the data within the window is computed at each increment of time as the window travels from the beginning to the end of the time series. These averages form the filtered series, which then serves as the time series for further statistical analysis, perhaps for trends, correlations, regression coefficients, or other parameters. This post is an examination of a common error in this procedure.
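As a rough illustration of the moving window procedure described above, the following Python sketch computes a simple (unweighted) moving average. The function name `moving_average` and the sample values are made up for illustration and are not taken from the post's data.

```python
def moving_average(series, window):
    """Simple (unweighted) moving average with a fixed window length."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    # One output value per window position: len(series) - window + 1 in total.
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Made-up values, for illustration only.
data = [3, 5, 4, 8, 6, 7, 9]
filtered = moving_average(data, 3)

print(len(data), len(filtered))
print(filtered)
```

The filtered series is shorter than the source series, and every filtered value is built from data that neighbouring filtered values also use.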
- The well-known study of trends in North Atlantic hurricane intensity in the context of climate change by high-profile climate scientist Kerry Emanuel of MIT is an example of the use of moving averages in the study of trends (Emanuel, 2005), as shown in a related post [LINK]. Such procedures are used when the researcher feels that the random scatter in the source time series is an impediment to discovering its underlying structure and behavior. The motivation for preprocessing the data prior to trend analysis is to reduce the residual variance of the data around the trend line. As an example of such analysis of time series data, consider the hurricane data in Figure 1, where the random variation of the data at an annual time scale may be an impediment to understanding the underlying pattern of hurricane counts. In this case five-year moving averages are used to discover trend patterns at a five-year time scale.
- Some objections have been raised by Professor Watkins and others to the use of preprocessed series for trend analysis of this kind, because the filtered time series does not contain much of the uncertainty in the source data time series and no explanation can be given for what appears to be a magical gain in statistical power (Blumel, 2015) (Briggs, 2008) (Watkins, 2006). In this post we examine this issue from the perspective of degrees of freedom lost when the same data item in the source time series is used multiple times in the preprocessing algorithm. A procedure is proposed for adjusting degrees of freedom to account for multiplicity in the use of the data. The visual indication in Figure 1 is that the filtered series indicated by the red line contains less uncertainty and more information than the source data indicated by the black line; but where did this new information come from? The apparent reduction in uncertainty and the implied gain in information and statistical power are illusory. Our source of information is unchanged and no new information was gathered. It is proposed that the illusion of increased statistical power is created by multiplicity in data usage. When moving windows are used, the first and last data points are used only once but the other data values in the time series are used more than once. Therefore, an adjustment of the effective sample size and degrees of freedom in the filtered time series is necessary to account for multiplicity.
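The multiplicity in data usage can be seen by counting directly how many windows touch each data point. The sketch below is a minimal illustration; the helper name `usage_counts` is an assumption introduced here, and the series length of 70 with a window of 5 is chosen to match the five-year moving average case discussed in this post.

```python
def usage_counts(n, window):
    """Count how many moving windows of the given length contain each index."""
    counts = [0] * n
    for start in range(n - window + 1):
        for i in range(start, start + window):
            counts[i] += 1
    return counts


counts = usage_counts(70, 5)

# End points are used once; interior points are used up to `window` times.
print(counts[:6], counts[-6:])
# Total number of data uses and the average number of uses per data point.
print(sum(counts), sum(counts) / len(counts))
```

Only the values near the two ends of the series are used fewer than five times; the average number of uses per data point is what the post calls multiplicity.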
- A moving window of length λ advancing by an increment of one time unit through a time series of length N will generate a total of N-λ+1 windows. Since each window contains λ numbers, a total of λ*(N-λ+1) numbers are used by the moving window. Yet there are only N numbers in the time series. Therefore, the average multiplicity is M = (λ/N)*(N-λ+1); that is, each number in the series is used M times on average. The effective value of N (EFFN) is then computed as ξ = N/M. For some procedures a second pass of a moving window is used. If the length of the second window is ϒ, then the sample size for the second pass is N-λ and the additional number of times that the data are used may be written as ϒ*(N-λ-ϒ+1). The grand total for both passes is Σ = λ*(N-λ+1) + ϒ*(N-λ-ϒ+1). The multiplicity may then be written as M = Σ/N and the effective value of N as ξ = N/M. The degrees of freedom for any given statistic can then be computed as DF = ξ - K, where K is the number of constraints contained in the statistic. Although the number of values generated by the moving window is N-λ for the first pass and N-λ-ϒ for the second pass, the computation of multiplicity requires the full length N of the source data series from which the moving window series was derived.
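The formulas above can be sketched in Python as follows. The function names `effective_sample_size` and `degrees_of_freedom` are introduced here for illustration, and K = 2 (a regression with an intercept and a slope) is an assumed value for the number of constraints, not something the formulas themselves fix.

```python
def effective_sample_size(n, windows):
    """Return (M, xi): average multiplicity and effective sample size.

    n       -- length N of the source series
    windows -- window lengths of successive passes, e.g. [5] or [5, 5]
    """
    total_uses = 0   # the grand total Sigma over all passes
    length = n       # series length seen by the current pass
    for w in windows:
        total_uses += w * (length - w + 1)
        # Following the post, the next pass is taken to see a series of
        # length (current length - w).
        length -= w
    m = total_uses / n
    return m, n / m


def degrees_of_freedom(xi, k):
    """DF = xi - K, where K is the number of constraints in the statistic."""
    return xi - k


m1, xi1 = effective_sample_size(70, [5])       # single pass, lambda = 5
m2, xi2 = effective_sample_size(70, [5, 5])    # second pass with length 5

print(round(m1, 6), round(xi1, 5))
print(round(m2, 2), round(xi2, 2))
# K = 2 is an assumption (intercept and slope of a trend line).
print(round(degrees_of_freedom(xi1, 2), 3), round(degrees_of_freedom(xi2, 2), 3))
```

Writing the passes as a list keeps the one-pass and two-pass cases in a single function; extending it beyond two passes is an extrapolation that the post does not discuss.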
- For example, in a time series of 70 years, a moving average series with λ=5 as in Figure 1 gives N=70 and N-λ=65. The average multiplicity is M = (5/70)*(70-5+1) = 4.714286. The effective value of the sample size is computed as ξ = 70/4.714286 ≈ 14.84848. Note that the effective sample size may be approximated by ξ ≈ N/λ = 70/5 = 14. If a second pass is made with ϒ=5, the multiplicity increases. The number of values used by both moving windows is Σ = 5*(70-5+1) + 5*(70-5-5+1) = 635. Multiplicity is therefore M = 635/70 ≈ 9.07. The effective sample size is ξ = 70/9.07 ≈ 7.72. Note that in this case the effective sample size may be approximated by ξ ≈ N/(λ+ϒ) = 70/10 = 7.
- Figure 2 shows how the effective value of N and the reduction in degrees of freedom can change the conclusions of statistical analysis of preprocessed time series data. Here we find that April temperatures in Hong Kong 1955-2016 show a warming trend and that the rate of warming appears to be higher in the preprocessed series than in the source data. At the same time the preprocessed series show less random scatter and therefore increasingly greater statistical power (R-squared = 0.041 in the source data, 0.233 in the five-year moving averages MA(5), and 0.370 in the five-year moving averages of the five-year moving averages MA(5,5)). In the hypothesis test for H0: β=0 without correcting for multiplicity, we find that the probability of observing these sample results (or more extreme) in the H0 distribution is P-VALUE = 0.1145970 in the source data series, and P-VALUE = 0.0000713 and 0.0000002 in the MA(5) and MA(5,5) series respectively. At α=0.001 we fail to reject H0 in the source data but we are able to reject H0 in both of the filtered series. This result appears to show greater statistical power in the filtered series than in the source data series.
- To determine whether this gain in statistical power is illusory or real, we correct for multiplicity in the preprocessed series and compute ADJUSTED DEGREES OF FREEDOM = 12.848 for MA(5) and 5.717 for MA(5,5). The corresponding ADJUSTED P-VALUEs are 0.0010927 for MA(5) and 0.0019396 for MA(5,5). At α=0.001 we fail to reject H0 in either filtered series. This result implies that the apparent statistical power observed in the filtered series without adjustment for multiplicity is illusory and an artifact of multiplicity. In this case, the filtered series appears to contain more information than the source series, but not to the extent implied without the correction for effective sample size (EFFN).
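The post does not spell out how the adjusted p-values were obtained. One plausible reading, assumed here and not confirmed by the text, is that the t-statistic of the slope is kept and only the degrees of freedom of the reference t-distribution are changed. The sketch below shows the effect of that change using only the Python standard library; the value `t_stat = 4.0` and the unadjusted degrees of freedom of 60 are hypothetical and are not taken from the Hong Kong data.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution; df may be non-integer."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=20000):
    """P(|T| > |t_stat|), integrating the density with Simpson's rule.

    `steps` must be even. The result is computed as one minus the central
    probability mass, so very small p-values carry some rounding error.
    """
    t = abs(t_stat)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3   # probability mass in [-t, t]
    return max(0.0, 1.0 - central)


t_stat = 4.0                      # hypothetical t-statistic
p_unadjusted = two_sided_p_value(t_stat, 60.0)    # hypothetical unadjusted DF
p_adjusted = two_sided_p_value(t_stat, 12.848)    # adjusted DF for MA(5)

print(p_unadjusted, p_adjusted)
```

With the same t-statistic, fewer degrees of freedom put more probability in the tails, so the adjusted p-value is larger than the unadjusted one.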
- A well-known example of climate science research that failed to take these considerations into account is the Emanuel 2005 paper, in which high-profile MIT climate scientist Kerry Emanuel concluded erroneously that his data proved that climate change was causing North Atlantic hurricanes to become more destructive. This faux finding has encouraged decades of activism against fossil fuels, fueled by fear of destructive hurricanes. The Emanuel 2005 paper is discussed in depth in a related post [LINK].
- CONCLUSION: All moving window processes in time series analysis involve repeated use of the same data value. If the same data value is used multiple times, it creates a false sense of information because this piece of data brings with it new information only in the first use. It is therefore proposed that the information content of a filtered series and therefore its degrees of freedom must be adjusted for multiplicity. A procedure is presented for estimating the average multiplicity in the use of the source data series in generating the filtered series. The average multiplicity is used to estimate an effective sample size and the effective degrees of freedom. Hypothesis tests must be checked to ensure that rejection of H0 survives when the degrees of freedom are adjusted for multiplicity.
A SIGNIFICANT IMPLICATION OF THESE RELATIONSHIPS FOR CLIMATE SCIENCE IS THAT A TIME SERIES OF THE CUMULATIVE VALUES OF ANOTHER TIME SERIES, AS IN THE TCRE, HAS NEITHER DEGREES OF FREEDOM NOR TIME SCALE. THEREFORE THE TCRE DOES NOT CONTAIN USEFUL INFORMATION.
- Blumel, K. (2015). Does climate change affect period available field time and required capacities for grain harvesting in Brandenburg Germany. Retrieved 2016, from Researchgate: https://www.researchgate.net/publication/270793652_Does_clima
- Bowley, A. (1928). The standard deviation of the correlation coefficient. Journal of the American Statistical Association, 31-34.
- Box, G. (1994). Time series analysis: forecasting and control. Englewood Cliffs, NJ: Prentice Hall.
- Briggs, W. (2008). Do not smooth time series. Retrieved 2016, from wmbriggs.com: http://wmbriggs.com/post/195/
- Chatfield, C. (1989). The Analysis of Time Series: An Introduction. NY: Chapman and Hall/CRC.
- Emanuel, K. (2005). Increasing destructiveness of tropical cyclones over the past 30 years. Nature, 436(7051), 686-688.
- Granger, C. (2008). Acronyms in time series analysis (ATSA). Journal of Time Series Analysis, 3(2), 103-107.
- Hong Kong Observatory. (2016). Climatology. Retrieved 2016, from Hong Kong Observatory: http://www.hko.gov.hk/cis/climat_e.htm
- Johnson, V. (2013). Revised standards for statistical evidence. Retrieved 2015, from Proceedings of the National Academy of Sciences: http://www.pnas.org/content/110/48/19313.full
- Mudelsee, M. (2014). Climate Time Series Analysis: Classical Statistical and Bootstrap Methods. Springer.
- Munshi, J. (2015). Decadal Fossil Fuel Emissions and Decadal Warming. Retrieved 2016, from ssrn.com/author=2220942: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2662870
- Munshi, J. (2016). Illusory-power data archive. Retrieved 2016, from Google Drive: https://drive.google.com/open?id=0ByzA6UNa41ZfbXpuOHozTm5sems
- Siegfried, T. (2010). Odds Are, It’s Wrong. Retrieved 2016, from Science News: https://www.sciencenews.org/article/odds-are-its-wrong
- Watkins, T. (2006). How the Use of Moving Averages Can Create the Appearance of Confirmation of Theories. Retrieved 2016, from Thayer Watkins SJSU: http://www.sjsu.edu/faculty/watkins/movingaveraging.htm