With respect to the ENSO model, I have been thinking about ways of evaluating the statistical significance of the fit to the data. If we train on one 70-year interval and then test on the following 70-year interval, we get the interesting effect of finding a higher correlation coefficient on the test interval. The correlation coefficient on the training interval is just below 0.85, while on the test interval it is above 0.86.
Adding a couple of extra terms improves the fit, and the test is still higher: 0.87 on training and 0.88 on the test.
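For readers who want to reproduce this kind of split-interval check, here is a minimal sketch of how the two correlation coefficients can be computed. The function name `split_correlations`, the monthly sampling, and the toy two-sinusoid "model" with added noise are all assumptions for illustration; they stand in for the actual ENSO model output and proxy data, which are not reproduced here.

```python
import numpy as np

def split_correlations(data, model, split):
    """Pearson correlation of model vs. data, computed separately on
    the training interval [0, split) and the test interval [split, end)."""
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    cc_train = np.corrcoef(data[:split], model[:split])[0, 1]
    cc_test = np.corrcoef(data[split:], model[split:])[0, 1]
    return cc_train, cc_test

# Toy stand-in (assumption): 140 years of monthly samples, a two-period
# "model" signal, and "data" equal to that signal plus white noise.
rng = np.random.default_rng(0)
t = np.arange(140 * 12) / 12.0  # time in years
model = np.sin(2 * np.pi * t / 3.7) + 0.5 * np.sin(2 * np.pi * t / 6.2)
data = model + 0.3 * rng.standard_normal(t.size)

cc_train, cc_test = split_correlations(data, model, split=70 * 12)
print(f"training CC = {cc_train:.3f}, test CC = {cc_test:.3f}")
```

In a real run the model parameters would be fitted using only the first 70 years, and the second 70 years would be touched only when computing the test coefficient.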
The model fit is relatively aggressive in the number of degrees of freedom (DOF) it contains, since there appear to be multiple forcings involved, each with a unique period. This exacts a statistical price, since that many DOF also allows the model to fit arbitrary data sets. For example, a red noise random walk synthetic data set can often give an impressive correlation coefficient when fitted with the same set of parameters, but with varying amplitude and phase. One can see this in the recent post on evaluated ENSO proxy data, where the model fits red noise reasonably well.
That appears troubling in terms of discriminating between a real and a coincidental fit, but if we look closely at the results of out-of-band tests on fits trained on red noise, the fits rapidly become uncorrelated. Below are the statistics for the “in” training run and the “out” test or validation run. Even though a correlation coefficient above 0.7 is achieved during training, that holds little significance within the test interval, as all phase coherence disappears once the random walk continues past the training interval. Note below that the out statistics are centered on a coefficient of 0.0, which is essentially the extreme of uncorrelated behavior. But for the model operating on the real data, one can see that both the training and test correlation coefficients are very high (arrows on the right), which means that the ENSO behavior is not stochastic and that whatever periodic behavior is defined in the first 70 years is also observed in the next 70 years.
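The in/out behavior on red noise can be illustrated with a simplified stand-in for the model. The sketch below is not the DiffEq fit; it assumes a plain least-squares fit of fixed-period sinusoids (free amplitude and phase via a sine/cosine pair per period) to a synthetic random walk. The periods in `PERIODS`, the trial count, and the function names are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical forcing periods in years (assumption; not the model's actual set).
PERIODS = [2.0, 3.5, 5.0, 7.0, 11.0, 18.6, 30.0, 60.0]

def design_matrix(t, periods):
    """Constant column plus a sine/cosine pair per period, so each period
    gets a free amplitude and phase in a linear least-squares fit."""
    cols = [np.ones_like(t)]
    for p in periods:
        cols.append(np.sin(2 * np.pi * t / p))
        cols.append(np.cos(2 * np.pi * t / p))
    return np.column_stack(cols)

def red_noise_in_out(n_trials=200, years=140, split_year=70, seed=0):
    """Fit each random walk on the first interval only, then return the
    training ("in") and test ("out") correlation coefficients."""
    rng = np.random.default_rng(seed)
    t = np.arange(years * 12) / 12.0  # monthly samples, time in years
    split = split_year * 12
    X = design_matrix(t, PERIODS)
    cc_in, cc_out = [], []
    for _ in range(n_trials):
        walk = np.cumsum(rng.standard_normal(t.size))  # red-noise random walk
        coef, *_ = np.linalg.lstsq(X[:split], walk[:split], rcond=None)
        fit = X @ coef  # extrapolates the trained fit over the test interval
        cc_in.append(np.corrcoef(fit[:split], walk[:split])[0, 1])
        cc_out.append(np.corrcoef(fit[split:], walk[split:])[0, 1])
    return np.array(cc_in), np.array(cc_out)

cc_in, cc_out = red_noise_in_out()
print(f"in:  mean CC = {cc_in.mean():.2f}")
print(f"out: mean CC = {cc_out.mean():.2f}")
```

Because the increments of the walk after the split are independent of everything the fit saw during training, the out coefficients should scatter symmetrically around zero, while the in coefficients stay positive. Histograms of `cc_in` and `cc_out` give the kind of in/out statistics described above.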
For this model operating on the real data, at least some of the DiffEq fit is attributed to a forced alignment with a biennial term. This probably gives about a 10% improvement in the CC (both for red noise and for the real data) compared with leaving it out. The improvement is expected because the biennial term acts as a common-mode multiplicative factor on both the LHS and RHS of the DiffEq. Yet the biennial factor is essential to providing a mechanism for phase inversions, such as the one that occurs between 1980 and 1996. And the alignment also objectively improves the fit, as even slight variations away from 2 years will appreciably reduce the correlation.