Spurious Correlations in Time Series Data
Posted November 2, 2018
FIGURE 1: SEASONAL CYCLE OF THE CLOUD DATA
FIGURE 2: CLOUD AND TEMPERATURE ANOMALIES AND THEIR DETRENDED SERIES
FIGURE 3: MONTHLY TIME SCALE: CORR=-0.5682, DETCORR=0.09554
FIGURE 4: ANNUAL TIME SCALE: SOURCE DATA FOR EACH CALENDAR MONTH
FIGURE 5: ANNUAL TIME SCALE: DATA FOR EACH CALENDAR MONTH DETRENDED
FIGURE 6: ANNUAL TIME SCALE: CORRELATION AND DETRENDED CORRELATION
- It has been proposed in various climate change blogs that a negative correlation between cloud cover and temperature (HadCRUT4 global mean temperature anomalies) explains the observed warming before the year 2000 and the hiatus in warming since the year 2000 [LINK]. These posts show that HadCRUT4 monthly mean temperature anomalies are negatively correlated with monthly mean cloud cover, and they develop this relationship into a climate model that explains changes in surface temperature in terms of incident solar radiation, the heat-trapping effect of carbon dioxide, and the effect of cloud cover. The model is then validated by its ability to faithfully reproduce the HadCRUT4 global mean temperature anomaly series.
- Monthly time scale: The analysis in those posts is carried out at a monthly time scale, which is considered valid because the temperature data are anomalies with the seasonal cycle removed. However, no attention is paid to the possibility that the cloud data may contain a seasonal cycle. Figure 1 shows that mean cloud cover does have a strong seasonal cycle, with low cloud cover in the Northern summer and higher cloud cover in the Northern fall, winter, and spring. At a monthly time scale, therefore, the source cloud data may not be used directly; they must first be deseasonalized to match the deseasonalized temperature anomalies.
- Monthly time scale: Deseasonalized cloud data are computed by subtracting the mean seasonal cycle from the data. These cloud anomalies and the HadCRUT4 temperature anomalies are shown in Figure 2 along with their detrended series. The correlation between the cloud cover anomalies and the temperature anomalies is r = -0.562, a strong negative correlation that appears to support the hypothesis, derived from these data, that warming is driven by low cloud cover.
- Monthly time scale: However, a correlation between two time series that is imposed by shared long-term trends does not imply that one variable (Y, here temperature) responds to changes in the other (X, here cloud cover) at the time scale of interest. The Tyler Vigen website contains many examples of such spurious correlations in time series data [LINK]. To detect the responsiveness of temperature to cloud cover at a monthly time scale, net of the effect of long-term trends, it is therefore necessary to remove the trends from the data with a detrending procedure, as explained in this brief lecture by Alex Tolley [LINK].
- Monthly time scale: The detrended data are shown in Figure 2 (in red), and their correlation is displayed graphically in the second frame of Figure 3. This graphic shows that no correlation at a monthly time scale remains once the effect of long-term trends is removed. The strong negative correlation of r = -0.562 seen in the source data turns out to be an artifact of long-term trends. At a monthly time scale only a statistically insignificant correlation of r = 0.09554 remains, and the negative sign, essential to the theory that low cloud cover causes warming, is gone. Thus, although the required negative correlation is present in the source data, the detrended analysis shows that it carries no implication for the responsiveness of temperature to cloud cover at a monthly time scale.
- Annual time scale: The corresponding analysis is presented at an annual time scale for each of the twelve calendar months separately. This option does not require deseasonalized anomalies because the calendar months are not combined. Instead, we test the hypothesis that the mean temperature of a given calendar month responds to the mean cloud cover of that month from one year to the next. The results are presented in Figures 4 to 6.
- Annual time scale: Strong, statistically significant negative correlations, ranging from r = -0.5 to r = -0.7, are seen in the source data for the eleven calendar months from February to December, as shown in the left frame of Figure 6. However, these correlations are artifacts of long-term trends and do not represent responsiveness of temperature to cloud cover at an annual time scale, as can be seen in the right frame of Figure 6, where all the statistically significant negative correlations have vanished.
- This analysis shows that a spurious correlation created in the source data by long-term trends, which does not imply responsiveness at either a monthly or an annual time scale, has been misinterpreted by the authors of the blog posts as an inverse causal relationship between cloud cover and surface temperature. Of course, even a correlation at the time scale of interest does not imply causation; the condition is more clearly stated as “correlation is a necessary but not sufficient condition for causation”. Here we have addressed only the necessary condition and make no claim regarding the sufficient condition.
- All data and computational details used in this work are available for download from an online data archive: [LINK]
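The deseasonalization of the cloud data at the monthly time scale (subtracting the mean seasonal cycle) can be sketched in Python with NumPy. This is a minimal illustration, not the computation used for this post: it assumes a monthly series whose first value is a January and which covers at least one full year, and the function name `deseasonalize` and the synthetic data are hypothetical.

```python
import numpy as np

def deseasonalize(series):
    """Subtract the mean seasonal cycle from a monthly series.

    Hypothetical helper: assumes index 0 is a January and that the
    series covers at least one full year.
    """
    x = np.asarray(series, dtype=float)
    if x.size < 12:
        raise ValueError("need at least 12 monthly values")
    months = np.arange(x.size) % 12              # calendar month 0..11 of each value
    # Mean of each calendar month over all years (the mean seasonal cycle).
    climatology = np.array([x[months == m].mean() for m in range(12)])
    return x - climatology[months]               # anomalies relative to each month's mean

# Usage with a synthetic (made-up) cloud series: seasonal cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(10 * 12)
cloud = 66 + 2 * np.cos(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)
cloud_anom = deseasonalize(cloud)
```

After this step each calendar month's anomalies average to zero, so the remaining variation can be compared with temperature anomalies that were deseasonalized the same way.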
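The contrast between the source correlation and the detrended correlation at the monthly time scale (CORR and DETCORR in Figure 3) can be illustrated with a short sketch. It assumes linear detrending, that is, keeping the residuals of an ordinary least squares straight-line fit against time; the post points to a lecture for the procedure but does not specify the fit here. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def detrend(y):
    """Residuals of an ordinary least squares straight-line fit of y on time."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def corr_and_detcorr(x, y):
    """Pearson r of the source series (CORR) and of their detrended residuals (DETCORR)."""
    corr = np.corrcoef(x, y)[0, 1]
    detcorr = np.corrcoef(detrend(x), detrend(y))[0, 1]
    return corr, detcorr

# Synthetic example: two series share opposite long-term trends, but their
# month-to-month fluctuations are independent by construction.
rng = np.random.default_rng(42)
t = np.arange(456)                                   # hypothetical 38 years of months
cloud = -0.01 * t + rng.normal(0, 0.5, t.size)
temp = 0.002 * t + rng.normal(0, 0.1, t.size)
corr, detcorr = corr_and_detcorr(cloud, temp)
print(f"CORR={corr:.3f}, DETCORR={detcorr:.3f}")
```

Here the strong negative source correlation comes only from the shared trends, so it does not survive detrending. A significance test on the detrended r would also need to allow for autocorrelation in the residuals, which reduces the effective sample size.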
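The annual-time-scale analysis, one correlation per calendar month computed across years as in Figures 4 to 6, can be sketched the same way. This assumes both monthly series start in January and cover whole years, and it again assumes linear detrending; the function names, the 38-year length, and the synthetic data are hypothetical. No deseasonalization is needed because each calendar month is analyzed separately.

```python
import numpy as np

def detrend(y):
    """Residuals of an ordinary least squares straight-line fit of y on time."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def calendar_month_correlations(cloud, temp):
    """Year-to-year (CORR, DETCORR) for each calendar month.

    Hypothetical helper: assumes both monthly series start in January and
    cover whole years, so they reshape to (years, 12).
    """
    c = np.asarray(cloud, dtype=float).reshape(-1, 12)   # rows = years, columns = months
    h = np.asarray(temp, dtype=float).reshape(-1, 12)
    results = []
    for m in range(12):
        corr = np.corrcoef(c[:, m], h[:, m])[0, 1]
        detcorr = np.corrcoef(detrend(c[:, m]), detrend(h[:, m]))[0, 1]
        results.append((corr, detcorr))
    return results

# Synthetic example (made-up data): a seasonal cycle in cloud cover,
# shared opposite trends, and independent noise.
rng = np.random.default_rng(1)
months = np.arange(38 * 12)
cloud = (66 + 2 * np.cos(2 * np.pi * months / 12) - 0.01 * months
         + rng.normal(0, 0.5, months.size))
temp = 0.002 * months + rng.normal(0, 0.1, months.size)
names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
results = calendar_month_correlations(cloud, temp)
for name, (r, dr) in zip(names, results):
    print(f"{name}: CORR={r:+.2f}  DETCORR={dr:+.2f}")
```

In this synthetic case every calendar month shows a strong negative source correlation produced entirely by the shared trends, while the detrended correlations reflect only the independent noise.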
8 Responses to "Spurious Correlations in Time Series Data"

August 11, 2020 at 4:12 am
Along similar lines, I bring this incomplete article to your attention: https://notrickszone.com/2020/08/09/large-increase-in-number-of-sunshine-hours-likely-behind-warming-glacier-retreat-in-alps-since-1980/?fbclid=IwAR0sYetR9hFeaVOGy513zjIVJJ9lSUFfmQ-K8H3WHfursL2IXCOOFjNZcNI
August 11, 2020 at 4:43 am
Thank you.
August 12, 2020 at 6:52 pm
Thank you, Paul. This is a 50-year study at a decadal time scale, so the effective sample size is about 5. There can’t be much statistical power in this test. It does not read like unbiased and objective scientific inquiry. As in climate science, there is an agenda; the difference is that it is a different agenda.