Thongchai Thailand

Spurious Correlations in Time Series Data

Posted on: November 2, 2018

  1. It has been proposed in various climate change blogs that a negative correlation between cloud cover and temperature (HadCRUT4 global mean temperature anomalies) explains both the observed warming prior to the year 2000 and the hiatus in warming since the year 2000 [LINK]. These posts show that the HadCRUT4 monthly mean temperature anomalies are negatively correlated with monthly mean cloud cover, and they develop this relationship into a climate model that explains changes in surface temperature in terms of incident solar radiation, the heat trapping effect of carbon dioxide, and the effect of cloud cover. The model is then validated by its ability to faithfully reproduce the HadCRUT4 global mean temperature anomaly series.
  2. Monthly time scale: The analysis is carried out at a monthly time scale, which is considered valid because the temperature data are anomalies with the seasonal cycle removed. However, no attention is paid to the possibility that the cloud data may contain a seasonal cycle. In Figure 1 above, a plot of the mean cloud cover shows a strong seasonal cycle, with low cloud cover in the Northern summer and higher cloud cover in the Northern fall, winter, and spring. Because of this seasonal cycle, the source cloud data may not be used directly at a monthly time scale but must first be deseasonalized to conform with the deseasonalized temperature anomalies.
  3. Monthly time scale: Deseasonalized cloud data are computed by subtracting the mean seasonal cycle from the data. These cloud anomalies and the HadCRUT4 temperature anomalies are shown in Figure 2 along with their corresponding detrended series. The correlation between the cloud cover anomalies and the temperature anomalies is found to be r = -0.562, a strong negative correlation that appears to support the hypothesis, derived from these data, that warming is driven by low cloud cover.
  4. Monthly time scale: It is noted, however, that correlations imposed on time series data by long term trends can be spurious: they do not imply a responsiveness of Y to changes in X at the time scale of interest. The Tyler Vigen website contains many examples that demonstrate this spurious correlation property of time series data [LINK]. In this case, to detect the responsiveness of temperature to cloud cover at a monthly time scale, net of the effect of long term trends, it is necessary to remove the long term trends from the data with a detrending procedure, as explained in this brief lecture by Alex Tolley [LINK].
  5. Monthly time scale: The detrended data are shown in Figure 2 (in red) and their correlation is displayed graphically in the second frame of Figure 3. This graphic shows that no correlation at a monthly time scale remains when the effect of long term trends is removed. The strong negative correlation of r = -0.562 seen in the source data turns out to be an artifact of long term trends. At a monthly time scale, only a statistically insignificant correlation of r = 0.09554 remains, and the negative sign, essential for the theory that low cloud cover causes warming, is gone. Thus, although the required negative correlation is seen in the source data, detrended analysis shows that it has no implication in terms of the responsiveness of temperature to cloud cover at a monthly time scale.
  6. Annual time scale: The corresponding analysis is presented at an annual time scale for each of the twelve calendar months separately. This analysis option does not require the computation of deseasonalized anomalies because the calendar months are not combined. Instead, for each calendar month, we test the hypothesis that the monthly mean temperature is responsive to monthly mean cloud cover from one year to the next. The results are presented in Figures 4 to 6.
  7. Annual time scale: Strong, statistically significant negative correlations, ranging from r = -0.5 to r = -0.7, are seen in the source data for the eleven calendar months from February to December, as shown in the left frame of Figure 6. However, these correlations are artifacts of long term trends and do not represent responsiveness of temperature to cloud cover at an annual time scale. This can be seen in the right frame of Figure 6, where all the statistically significant negative correlations have vanished.
  8. This analysis shows that the correlation in the source data is spurious. It is created by long term trends and does not imply responsiveness at an annual or monthly time scale, yet the authors of the blog posts have misinterpreted it as an inverse causal relationship between cloud cover and surface temperature. Of course, correlation even at the time scale of interest does not imply causation, but that condition is more clearly stated as “correlation is a necessary but not sufficient condition for causation”. Here we have addressed the necessity condition and make no claim about sufficiency.
  9. All data and computational details used in this work are available for download from an online data archive: [LINK]
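
The procedure described in paragraphs 2 to 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in the data archive: the series are synthetic stand-ins (not HadCRUT4 and not observed cloud cover), the detrending is assumed to be a straight-line least-squares fit against time, and the function names (deseasonalize, detrend, make_synthetic, monthly_scale, calendar_month_scale) are hypothetical. The two synthetic series share nothing except opposing long term trends, so any correlation in the source data is spurious by construction.

```python
import numpy as np


def deseasonalize(x, period=12):
    """Subtract the mean seasonal cycle: each value minus its calendar-month mean."""
    x = np.asarray(x, dtype=float)
    phase = np.arange(x.size) % period
    cycle = np.array([x[phase == p].mean() for p in range(period)])
    return x - cycle[phase]


def detrend(x):
    """Residuals of an OLS straight-line fit against the time index (assumed method)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)


def corr(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


def make_synthetic(n_years=30, seed=0):
    """SYNTHETIC data only: opposing linear trends plus independent noise.

    The 'cloud' series also gets a seasonal cycle; the 'temp' series is
    treated as an anomaly series with no seasonal cycle.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(12 * n_years)
    temp = 0.002 * t + rng.normal(0.0, 0.1, t.size)
    cloud = (66.0 - 0.005 * t + 1.5 * np.cos(2 * np.pi * t / 12)
             + rng.normal(0.0, 0.3, t.size))
    return temp, cloud


def monthly_scale(temp, cloud):
    """Source and detrended correlation at a monthly time scale."""
    cloud_anom = deseasonalize(cloud)
    return corr(cloud_anom, temp), corr(detrend(cloud_anom), detrend(temp))


def calendar_month_scale(temp, cloud):
    """Source and detrended correlation for each calendar month, year to year."""
    results = []
    for m in range(12):
        x, y = cloud[m::12], temp[m::12]  # one value per year for month m
        results.append((corr(x, y), corr(detrend(x), detrend(y))))
    return results


temp, cloud = make_synthetic()
r_source, r_detrended = monthly_scale(temp, cloud)
print(f"monthly scale: source r = {r_source:.3f}, detrended r = {r_detrended:.3f}")
for m, (rs, rd) in enumerate(calendar_month_scale(temp, cloud), start=1):
    print(f"calendar month {m:2d}: source r = {rs:.3f}, detrended r = {rd:.3f}")
```

With series constructed this way, the source correlation should come out strongly negative while the detrended correlation should sit near zero, which is the pattern described above for the real data. The per-calendar-month loop skips deseasonalization because each subseries contains only one calendar month.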

8 Responses to "Spurious Correlations in Time Series Data"

[…] Spurious Correlations in Time Series Data […]

On a similar subject, I bring this incomplete article to your attention.

Thank you Paul. This is a 50-year study at a decadal time scale. The effective sample size is about 5. There can’t be a lot of statistical power in this test. It does not read like unbiased and objective scientific inquiry. As in climate science, there is an agenda; the difference is that it is a different agenda.

[…] proportionality for responsiveness at an annual time scale with detrended correlation analysis [DESCRIBED IN A RELATED POST] . Each panel consists of three frames. The left frame presents the log(CO2) data, the middle frame […]

[…] that derives from shared trends. The motivation for this procedure is described in a related post [LINK] . Briefly, the trend is removed from the data so that only the regression residuals remain and a […]

[…] The correlation between the rate of SLR and emissions at a time scale of 30, 35, 40, 45, and 50 years are depicted graphically in Figure 2, Figure 3, Figure 4, Figure 5, and Figure 6. The left panel of these figures show the correlation in the source data that includes the contribution of shared long term trends. The right panel shows the correlation between the detrended series in which the contribution of shared long term trends is removed and only the responsiveness at the time scale of interest remains. The necessity of this procedure is explained in a related post [LINK] . […]
