Cross-validation essentially tests the ability to predict the characteristics of an unexplored region based on a model of an explored region. The explored region serves as a training interval, and the model's applicability is then tested or validated on the unexplored interval. If some fraction of the expected characteristics appears in the unexplored region when the model is extrapolated to that interval, some degree of validation is granted to the model.
This is a powerful technique on its own, as machine learning workflows routinely depend on it to eliminate poorly performing trials. But it gains even more importance when new data for validation will take years to collect. In particular, consider the arduous process of collecting fresh data for the El Niño Southern Oscillation (ENSO), which will take decades to accumulate enough observations for a statistically significant validation.
So, what’s necessary in the short term is substantiation of a model’s potential validity. Nothing else will work as a substitute, as controlled experiments are not possible for domains as large as the Earth’s climate. Cross-validation remains the best bet.
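To make the training-interval idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the series is synthetic (a 3.8-year cycle plus an unmodeled 1.3-year component), the model is a single sinusoid of fixed period, and the helper names `fit_sinusoid` and `correlation` are hypothetical. It is not the ENSO model itself, only the shape of the procedure: fit on the explored interval, then score on the interval the fit never saw.

```python
import math

def fit_sinusoid(times, values, period):
    """Least-squares fit of a*cos(wt) + b*sin(wt) at a fixed, assumed period."""
    w = 2 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    # Sums that make up the 2x2 normal equations for the unknowns a and b.
    cc = sum(x * x for x in c)
    ss = sum(y * y for y in s)
    cs = sum(x * y for x, y in zip(c, s))
    cv = sum(x * v for x, v in zip(c, values))
    sv = sum(y * v for y, v in zip(s, values))
    det = cc * ss - cs * cs
    a = (cv * ss - sv * cs) / det
    b = (sv * cc - cv * cs) / det
    return lambda t: a * math.cos(w * t) + b * math.sin(w * t)

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in for an observed index (invented for illustration):
# a 3.8-year cycle plus an unmodeled 1.3-year component.
times = [k / 12 for k in range(1200)]             # 100 years, monthly steps
series = [math.sin(2 * math.pi * t / 3.8 + 0.7)
          + 0.3 * math.sin(2 * math.pi * t / 1.3) for t in times]

split = 720                                       # first 60 years = training interval
model = fit_sinusoid(times[:split], series[:split], period=3.8)

# Score only on the held-out interval, which the fit never saw.
held_out = [model(t) for t in times[split:]]
score = correlation(held_out, series[split:])
print(f"held-out correlation: {score:.2f}")
```

A high held-out correlation here only says the fitted structure persists outside the training interval; with real data the score would be compared against what competing or randomized models achieve on the same interval.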
In an earlier post, I observed that ENSO models may not be unique, given the numerous possibilities provided by nonlinear math. This was supported by the fact that a tidal forcing model based on the Mf (13.66 day) tidal factor worked as well as one based on the Mm (27.55 day) factor. This was not surprising, considering that aliasing against an annual impulse gives a similar repeat cycle: 3.8 years versus 3.9 years. But I have also observed that mixing the two in a linear fashion did not improve the fit much at all, as the difference created a long interference cycle which isn’t observed in the ENSO time series data. Thinking in terms of the nonlinear modulation required, though, it may be that the two factors can be combined after the LTE solution is applied.
The forcing spectrum looks like this, with the aliased draconic (27.212d) factor circled:
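The 3.8-year and 3.9-year repeat cycles can be checked with a few lines of arithmetic. The sketch below assumes a tropical year of 365.242 days and treats the annual impulse as an ideal once-per-year sampler; the function name `aliased_period_years` is my own.

```python
YEAR_DAYS = 365.242   # tropical year in days (assumed value)

def aliased_period_years(tidal_period_days):
    """Repeat period, in years, of a tidal cycle sampled once per year."""
    cycles_per_year = YEAR_DAYS / tidal_period_days
    # Once-a-year sampling cannot distinguish whole cycles, so only the
    # distance to the nearest integer number of cycles survives.
    residual = abs(cycles_per_year - round(cycles_per_year))
    return 1 / residual

for name, days in [("Mf", 13.66), ("Mm", 27.55)]:
    print(name, round(aliased_period_years(days), 1))
# prints: Mf 3.8, then Mm 3.9
```

Mf completes about 26.74 cycles per year and Mm about 13.26, so the two sit at nearly the same distance from a whole number of cycles, which is why their aliased periods are so close.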
For QBO, we remove all the lunar factors except for the draconic, as this is the only declination factor with the same spherical group symmetry as the semi-annual solar declination.
And after modifying the annual (ENSO spring-barrier) impulse into a semi-annual impulse with equal and opposite excursions, the resultant model matches well (to first order) the QBO time series.
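A rough way to see where the QBO period comes from: an impulse train with a positive spike and an equal negative spike half a year later only contains odd harmonics of the annual cycle, so the draconic factor aliases against the nearest odd harmonic. The sketch below assumes idealized impulses and a 365.242-day year; the function name is hypothetical.

```python
YEAR_DAYS = 365.242      # tropical year in days (assumed value)
DRACONIC_DAYS = 27.212   # draconic month, as quoted above

def alternating_impulse_weight(n):
    """Fourier weight of harmonic n (cycles/year) for a +1 impulse at t = 0
    followed by a -1 impulse half a year later."""
    return 1 - (-1) ** n   # 2 for odd n, 0 for even n

cycles_per_year = YEAR_DAYS / DRACONIC_DAYS

# Only harmonics with nonzero weight can alias the tidal cycle.
active = [n for n in range(30) if alternating_impulse_weight(n) != 0]
nearest = min(active, key=lambda n: abs(cycles_per_year - n))

alias_years = 1 / abs(cycles_per_year - nearest)
print(nearest, round(alias_years, 2), round(12 * alias_years, 1))
```

Under these assumptions the nearest active harmonic is the 13th and the aliased period is about 2.37 years, close to the roughly 28-month period usually quoted for the QBO.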
Although the alignment isn’t perfect, there are indications in the structure that the fit has a deeper significance. For example, note how many of the shoulders in the structure align, as highlighted below in yellow.
The peaks and valleys do wander about a bit, which might be a result of the sensitivity to the semi-annual impulse and the fact that this is only at monthly resolution. The chart below is a detailed fit of the QBO using data with a much finer daily resolution. As you can see, slight changes in the seasonal timing of the semi-annual pulse are needed to individually align the 70 and 30 hPa QBO time-series data.
The underlying forcing of the ENSO model shows both an 18-year Saros cycle (which is an eclipse alignment cycle of all the tidal periods) and a 6-year anomalistic/draconic interference cycle. This modulation of the main anomalistic cycle appears in both the underlying daily and monthly profile, shown below before applying an annual impulse. The 6-year cycle is clearly evident as it aligns with the x-axis grid 1880, 1886, 1892, 1898, etc.
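Both cycle lengths follow from the standard lunar month lengths. The sketch below uses textbook values for the anomalistic, draconic, and synodic months (not taken from the fitted model) and a 365.242-day year.

```python
YEAR_DAYS = 365.242           # tropical year in days (assumed value)
ANOMALISTIC_DAYS = 27.55455   # perigee to perigee
DRACONIC_DAYS = 27.21222      # node to node
SYNODIC_DAYS = 29.530589      # new moon to new moon

# Interference (beat) period between the anomalistic and draconic months.
beat_days = 1 / (1 / DRACONIC_DAYS - 1 / ANOMALISTIC_DAYS)
beat_years = beat_days / YEAR_DAYS

# One Saros is 223 synodic months, which is also very nearly a whole
# number of draconic and anomalistic months.
saros_days = 223 * SYNODIC_DAYS
saros_years = saros_days / YEAR_DAYS
draconic_per_saros = saros_days / DRACONIC_DAYS        # close to 242
anomalistic_per_saros = saros_days / ANOMALISTIC_DAYS  # close to 239

print(round(beat_years, 2), round(saros_years, 2))
print(round(draconic_per_saros, 2), round(anomalistic_per_saros, 2))
```

Since a Saros holds about 242 draconic and 239 anomalistic months, the two drift through three full beats per Saros, which is the same 6-year cycle seen from the other direction.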
The 6-year cycle in the LOD is not aligned as strictly as the tidal model and it tends to wander, but it seems a more plausible and parsimonious explanation of the modulation than for example in this paper (where the 6-year LOD cycle is “similarly detected in the variations of C22 and S22, the degree-2 order-2 Stokes coefficients of the Earth’s gravitational field”).
Cross-validation confidence improves as the number of mutually agreeing alignments increases. Given the fact that controlled experiments are impossible to perform, this category of analyses is the best way to validate the geophysical models.
In our book Mathematical GeoEnergy, several geophysical processes are modeled, from conventional tides to ENSO. Each model fits the data by applying a concise physics-derived algorithm; the key is the algorithm’s conciseness, not necessarily its subjective intuitiveness.
I’ve followed Gell-Mann’s work on complexity over the years and so will try applying his qualitative effective complexity approach to characterize the simplicity of the geophysics models described in the book and on this blog.
Here’s a breakdown from least complex to most complex
In Chapter 12 of the book, we provide an empirical gravitational forcing term that can be applied to the Laplace’s Tidal Equation (LTE) solution for modeling ENSO. The inverse-square law is modified to a cubic law to take into account the differential pull from opposite sides of the Earth.
The two main terms are the monthly anomalistic (Mm) cycle and the fortnightly tropical/draconic pair (Mf, Mf’ with an 18.6-year nodal modulation). Due to the inverse cube gravitational pull found in the denominator of F(t), faster harmonic periods are also created: the 9-day (Mt) from the monthly/fortnightly cross-term and the weekly (Mq) from the fortnightly crossed against itself. It’s amazing how few terms are needed to create a canonical fit to a tidally-forced ENSO model.
The recipe for the model is shown in the chart below (click to magnify), following steps (A) through (G) in sequence:
The tidal forcing is constrained by the known effects of the lunisolar gravitational torque on the earth’s length-of-day (LOD) variations. An essentially identical set of monthly, fortnightly, 9-day, and weekly terms is required for both a solid-body LOD model fit and a fluid-volume ENSO model fit.
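The step from an inverse-square to an inverse-cube law can be checked numerically. The sketch below works in units of Earth radii with the gravitational constant folded into a unit `gm`, both chosen only for convenience, and the function name is my own. It takes the inverse-square pull on the near side minus the pull on the far side, then compares the result at two distances.

```python
def differential_pull(distance, radius=1.0, gm=1.0):
    """Inverse-square pull on the near side minus the pull on the far side."""
    near = gm / (distance - radius) ** 2
    far = gm / (distance + radius) ** 2
    return near - far

# Distances in Earth radii; the Moon sits roughly 60 Earth radii away.
ratio = differential_pull(60.0) / differential_pull(120.0)
print(round(ratio, 3))   # close to 2**3 = 8, as an inverse-cube law predicts
```

Doubling the distance cuts each individual pull by a factor of 4, but cuts the difference between them by very nearly 8, which is the cubic dependence used in the forcing term.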
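To see how an inverse-cube denominator generates the faster harmonics, here is a toy version in Python. The form of the forcing (a unit distance perturbed by two sinusoids) and the amplitudes of 0.05 are assumptions made for illustration, not the F(t) from the book; only the two periods come from the text. The code measures how much power lands at the sum-frequency (Mt) and double-frequency (Mq) periods compared with an arbitrary control period where no harmonic is expected.

```python
import math

MM_DAYS = 27.55   # monthly anomalistic period quoted in the text
MF_DAYS = 13.66   # fortnightly period quoted in the text

def forcing(t, a=0.05, b=0.05):
    """Toy inverse-cube forcing; amplitudes a and b are made-up values."""
    r = (1.0 + a * math.sin(2 * math.pi * t / MM_DAYS)
             + b * math.sin(2 * math.pi * t / MF_DAYS))
    return 1.0 / r ** 3

def amplitude_at(period_days, times, values):
    """Amplitude of the sinusoid at one period, by direct Fourier projection."""
    w = 2 * math.pi / period_days
    c = sum(v * math.cos(w * t) for t, v in zip(times, values))
    s = sum(v * math.sin(w * t) for t, v in zip(times, values))
    return 2 * math.hypot(c, s) / len(times)

times = [0.25 * k for k in range(40000)]     # 10,000 days, 6-hour steps
raw = [forcing(t) for t in times]
mean = sum(raw) / len(raw)
values = [v - mean for v in raw]             # remove the constant offset

mt_days = 1 / (1 / MM_DAYS + 1 / MF_DAYS)    # sum frequency, about 9.13 days
mq_days = MF_DAYS / 2                        # fortnightly doubled, 6.83 days

mt_amp = amplitude_at(mt_days, times, values)
mq_amp = amplitude_at(mq_days, times, values)
control_amp = amplitude_at(11.0, times, values)   # no harmonic expected here
print(round(mt_days, 2), round(mq_days, 2))
print(mt_amp > control_amp, mq_amp > control_amp)
```

Neither the 9-day nor the weekly period is put in by hand; both appear because expanding the inverse cube multiplies the monthly and fortnightly terms against each other and against themselves.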
If we apply the same tidal terms as forcing for matching dLOD data, we can use the fit below as a perturbed ENSO tidal forcing. Not a lot of difference here — the weekly harmonics are higher in magnitude.
So the only real unknown in this process is guessing the LTE modulation of steps (F) and (G). That’s what differentiates the inertial response of a spinning solid such as the earth’s core and mantle from the response of a rotating liquid volume such as the equatorial Pacific ocean. The former is essentially linear, but the latter is non-linear, making it an infinitely harder problem to solve — as there are infinitely many non-linear transformations one can choose to apply. The only reason that I stumbled across this particular LTE modulation is that it comes directly from a clever solution of Laplace’s tidal equations.
The red data points are the spectral values used in the ENSO model fit.
The top panel below is the LTE modulated tidal forcing fitted against the ENSO time series. The lower panel below is the tidal forcing model over a short interval overlaid on the dLOD/dt data.
That’s all there is to it — it’s all geophysical fluid dynamics. Essentially the same tidal forcing impacts both the rotating solid earth and the equatorial ocean, but the ocean shows a lagged nonlinear response as described in Chapter 12 of the book. In contrast, the solid earth shows an apparently direct linear inertial response. Bottom line is that if one doesn’t know how to do the proper GFD, one will never be able to fit ENSO to a known forcing.
In Chapter 12 of the book, we focused on modeling the standing-wave behavior of the Pacific ocean dipole referred to as ENSO (El Niño/Southern Oscillation). Because it has been in climate news recently, it makes sense to give equal time to the Atlantic ocean equivalent to ENSO, referred to as the Atlantic Multidecadal Oscillation (AMO). The original rationale for modeling AMO was to determine if it would help cross-validate the LTE theory for equatorial climate dipoles such as ENSO; this was reported at the 2018 Fall Meeting of the AGU (poster). The approach was similar to that applied for other dipoles such as the IOD (which is also in the news recently with respect to the Australian bush fires and in how multiple dipoles can amplify climate extremes). If we can apply an identical forcing for AMO as for ENSO, then we can further cross-validate the LTE model: by reusing that same forcing for an independent climate index such as AMO, we essentially remove a large number of degrees of freedom from the model and thus defend against claims of over-fitting.