Thongchai Thailand

Spurious Correlations in Time Series Data

Posted on: November 2, 2018

 

 

FIGURE 1: SEASONAL CYCLE OF THE CLOUD DATA

 

FIGURE 2: CLOUD AND TEMPERATURE ANOMALIES AND THEIR DETRENDED SERIES

 

FIGURE 3: MONTHLY TIME SCALE: CORR=-0.5682, DETCORR=0.09554

 

FIGURE 4: ANNUAL TIME SCALE: SOURCE DATA FOR EACH CALENDAR MONTH

 

FIGURE 5: ANNUAL TIME SCALE: DATA FOR EACH CALENDAR MONTH DETRENDED

 

FIGURE 6: ANNUAL TIME SCALE: CORRELATION AND DETRENDED CORRELATION

 

  1. It has been proposed in various climate change blogs that a negative correlation between cloud cover and temperature (HadCRUT4 global mean temperature anomalies) explains both the observed warming prior to the year 2000 and the hiatus in warming since the year 2000 [LINK]. The blog posts show that the HadCRUT4 monthly mean temperature anomalies are negatively correlated with monthly mean cloud cover, and they develop this relationship into a climate model that explains changes in surface temperature in terms of incident solar radiation, the heat trapping effect of carbon dioxide, and the effect of cloud cover. The model is then validated by its ability to faithfully reproduce the HadCRUT4 global mean temperature anomaly series.
  2. Monthly time scale: The blog analysis is carried out at a monthly time scale, which is considered valid because the temperature data are anomalies with the seasonal cycle removed. No attention is paid, however, to the possibility that the cloud data may contain a seasonal cycle. In Figure 1 above, a plot of the mean cloud cover shows a strong seasonal cycle, with low cloud cover in the Northern summer and higher cloud cover in the Northern fall, winter, and spring. This seasonal cycle implies that at a monthly time scale the source cloud data may not be used directly but must be deseasonalized to conform with the deseasonalized temperature anomalies.
  3. Monthly time scale: Deseasonalized cloud data are computed by subtracting the mean seasonal cycle from the data. These cloud anomalies and the HadCRUT4 temperature anomalies are shown in Figure 2 along with their corresponding detrended series. The correlation between the cloud cover anomalies and the temperature anomalies is found to be r = -0.562, a strong negative correlation that appears to support the hypothesis derived from these data that warming is driven by low cloud cover.
  4. Monthly time scale: It is noted, however, that a correlation between two time series can be imposed by their long-term trends alone, and such a spurious correlation does not imply a responsiveness of Y to changes in X at the time scale of interest. The Tyler Vigen website contains many examples that demonstrate this spurious correlation property of time series data [LINK]. In this case, to detect the responsiveness of temperature to cloud cover at a monthly time scale, net of the effect of long-term trends, it is necessary to remove the long-term trends from the data with a detrending procedure, as explained in this brief lecture by Alex Tolley [LINK].
  5. Monthly time scale: The detrended data are shown in Figure 2 (in red) and their correlation is displayed graphically in the second frame of Figure 3. This graphic shows that no meaningful correlation at a monthly time scale remains when the effect of long-term trends is removed. The strong negative correlation of r=-0.562 seen in the source data turns out to be an artifact of long-term trends. What remains at a monthly time scale is a statistically insignificant correlation of r=0.09554, and the negative sign, essential for the theory that low cloud cover causes warming, is gone. Thus, though the required negative correlation is seen in the source data, detrended analysis shows that it has no implication in terms of responsiveness of temperature to cloud cover at a monthly time scale.
  6. Annual time scale: The corresponding analysis is presented at an annual time scale for each of the twelve calendar months separately. This analysis option does not require the computation of deseasonalized anomalies because the calendar months are not combined. Instead, for each calendar month, we test the hypothesis that the monthly mean temperature is responsive to monthly mean cloud cover from one year to the next. The results are presented in Figures 4 to 6.
  7. Annual time scale: Strong, statistically significant negative correlations, ranging from r=-0.5 to r=-0.7, are seen in the source data for the eleven calendar months from February to December, as shown in the left frame of Figure 6. However, these correlations are artifacts of long-term trends and do not represent responsiveness of temperature to cloud cover at an annual time scale, as can be seen in the right frame of Figure 6, where all the statistically significant negative correlations have vanished.
  8. This analysis shows that the correlation in the source data is a spurious one created by long-term trends and that it does not imply responsiveness at either an annual or a monthly time scale. The authors of the blog posts have misinterpreted it as an inverse causal relationship between cloud cover and surface temperature. Of course, correlation even at the time scale of interest does not imply causation, but that condition is more clearly stated as “correlation is a necessary but not sufficient condition for causation”. Here we have addressed the necessity condition and do not claim the sufficient condition.
  9. All data and computational details used in this work are available for download from an online data archive: [LINK]
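
The procedure described in items 3 to 7 can be sketched in a few lines of Python. This is only an illustration and not the code from the data archive: the series are synthetic (the names `cloud` and `temp` and all the numbers used to generate them are made up), the helper functions `deseasonalize`, `detrend`, `corr`, and `calendar_month_corrs` are hypothetical names, and the detrending is assumed to be removal of an ordinary least squares linear trend, which the post itself does not specify. The two synthetic series share nothing except opposite long-term trends, so any correlation in the source data is spurious by construction.

```python
import numpy as np

def deseasonalize(x, period=12):
    """Subtract the mean seasonal cycle (the mean of each calendar month)."""
    x = np.asarray(x, dtype=float)
    month = np.arange(x.size) % period
    cycle = np.array([x[month == m].mean() for m in range(period)])
    return x - cycle[month]

def detrend(x):
    """Remove an OLS linear trend (an assumed method) and return the residuals."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def corr(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])

def calendar_month_corrs(x, y, period=12):
    """Annual time scale: (source, detrended) correlation per calendar month."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = []
    for m in range(period):
        xm, ym = x[m::period], y[m::period]  # one value per year for month m
        results.append((corr(xm, ym), corr(detrend(xm), detrend(ym))))
    return results

# Synthetic monthly data (hypothetical values, 30 years).
rng = np.random.default_rng(0)
n = 12 * 30
t = np.arange(n)
season = 2.0 * np.cos(2 * np.pi * t / 12)
cloud = 66.0 - 0.01 * t + season + rng.normal(0.0, 0.3, n)  # falling trend plus seasonal cycle
temp = 0.002 * t + rng.normal(0.0, 0.1, n)                  # rising trend, independent noise

# Monthly time scale: deseasonalize the cloud data, then compare the
# source correlation with the detrended correlation.
cloud_anom = deseasonalize(cloud)
r_source = corr(cloud_anom, temp)
r_detrended = corr(detrend(cloud_anom), detrend(temp))
print("monthly source correlation:   ", round(r_source, 3))
print("monthly detrended correlation:", round(r_detrended, 3))

# Annual time scale: each calendar month separately, no anomalies needed.
for m, (rs, rd) in enumerate(calendar_month_corrs(cloud, temp), start=1):
    print(f"month {m:2d}: source r = {rs:+.2f}, detrended r = {rd:+.2f}")
```

With series built this way the source correlation is strongly negative while the detrended correlation is close to zero, which is the pattern seen in Figure 3 and Figure 6. The sketch does not test statistical significance; that step would have to be added before drawing conclusions from real data.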

 


 

 

 

 

 

 
