Cross-validation

Cross-validation is essentially a test of whether a model built on an explored region can predict the characteristics of an unexplored region. The explored region serves as a training interval, and the model is then extrapolated to the unexplored interval to test or validate its applicability. If some fraction of the expected characteristics appears in the unexplored region, some degree of validation is granted to the model.
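As a rough illustration of the training-interval idea, here is a minimal Python sketch. It uses a synthetic series, not ENSO data, and the periods, noise level, and 60-year split are all assumptions chosen for the example. The model is fit only on the training interval and then scored on the held-out interval.

```python
import numpy as np

# Synthetic monthly series (NOT real ENSO data): two sinusoids plus noise.
# The periods and noise level are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
t = np.arange(0.0, 100.0, 1.0 / 12.0)          # time in years
periods = [2.37, 6.0]                           # assumed known periods (years)
y = (np.sin(2 * np.pi * t / periods[0])
     + 0.5 * np.cos(2 * np.pi * t / periods[1])
     + 0.3 * rng.standard_normal(t.size))

def design(times, periods):
    """Design matrix with a constant plus a sine/cosine pair per period."""
    cols = [np.ones_like(times)]
    for p in periods:
        w = 2 * np.pi / p
        cols += [np.sin(w * times), np.cos(w * times)]
    return np.column_stack(cols)

# Explored region = training interval; unexplored region = everything after it.
train = t < 60.0
coef, *_ = np.linalg.lstsq(design(t[train], periods), y[train], rcond=None)

# Extrapolate the fitted model over the whole record and score both intervals.
pred = design(t, periods) @ coef
r_train = np.corrcoef(pred[train], y[train])[0, 1]
r_test = np.corrcoef(pred[~train], y[~train])[0, 1]
print(f"training correlation: {r_train:.2f}, held-out correlation: {r_test:.2f}")
```

A held-out correlation that stays close to the training correlation is the kind of agreement that grants the model some degree of validation. A sharp drop would suggest overfitting to the training interval.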

This is a powerful technique on its own, as it is used frequently (and depended on) in machine learning to eliminate poorly performing trials. But it gains even more importance when new data for validation will take years to collect. In particular, consider the arduous process of collecting fresh data for the El Niño Southern Oscillation (ENSO), which will take decades to reach sufficient statistical significance for validation.

So, what’s necessary in the short term is substantiation of a model’s potential validity. Nothing else will work as a substitute, as controlled experiments are not possible for domains as large as the Earth’s climate. Cross-validation remains the best bet.


Cross-Validation of Geophysics Behaviors

The fit to the ENSO model looks like this


The forcing spectrum looks like this, with the aliased draconic (27.212d) factor circled:

For QBO, we remove all the lunar factors except for the draconic, as this is the only declination factor with the same spherical group symmetry as the semi-annual solar declination.

And after modifying the annual (ENSO spring-barrier) impulse into a semi-annual impulse with equal and opposite excursions, the resultant model matches well (to first order) the QBO time series.
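The aliasing of the draconic month against the seasonal impulse can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not the model code. It assumes a tropical year of 365.2422 days and a draconic month of 27.212221 days, and it treats the impulse as an ideal sampling train. The function name `fold` is a hypothetical helper.

```python
# Assumed constants (standard astronomical values, not taken from the post).
TROPICAL_YEAR_DAYS = 365.2422
DRACONIC_MONTH_DAYS = 27.212221

def fold(freq, sample_rate):
    """Fold a frequency into the band [0, sample_rate / 2] (aliasing)."""
    r = freq % sample_rate
    return min(r, sample_rate - r)

# Draconic frequency in cycles per year.
f_draconic = TROPICAL_YEAR_DAYS / DRACONIC_MONTH_DAYS

# Case 1: annual impulse, i.e. one sample per year.
alias_annual = fold(f_draconic, 1.0)

# Case 2: semi-annual impulse with alternating sign, i.e. two samples per year
# multiplied by (-1)^n. The sign flip shifts the frequency by 1 cycle/year
# before it folds at the 2-per-year sample rate.
alias_semiannual = fold(f_draconic + 1.0, 2.0)

print(f"draconic frequency: {f_draconic:.4f} cycles/year")
print(f"annual alias period: {1.0 / alias_annual:.3f} years")
print(f"semi-annual (alternating) alias period: {1.0 / alias_semiannual:.3f} years")
```

Under these assumptions both cases fold to the same period of about 2.37 years, which is close to the roughly 28-month period commonly quoted for the QBO.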

Although the alignment isn’t perfect, there are indications in the structure that the fit has a deeper significance. For example, note how many of the shoulders in the structure align, as highlighted below in yellow

The peaks and valleys do wander about a bit, which might be a result of the sensitivity to the semi-annual impulse and of the fact that this is only monthly resolution. The chart below is a detailed fit of the QBO using data with a much finer daily resolution. As you can see, slight changes in the seasonal timing of the semi-annual pulse are needed to individually align the 70 and 30 hPa QBO time-series data.

This will require further work, especially in considering recently reported perturbations in the QBO periodicity, but the shared draconic forcing of the ENSO and QBO models suggests an important cross-validation of the underlying causal mechanism.

Detailed analysis also shows LTE modulation

Another potential geophysical cross-validation …

The underlying forcing of the ENSO model shows both an 18-year Saros cycle (which is an eclipse alignment cycle of all the tidal periods) and a 6-year anomalistic/draconic interference cycle. This modulation of the main anomalistic cycle appears in both the underlying daily and monthly profiles, shown below before applying an annual impulse. The 6-year cycle is clearly evident as it aligns with the x-axis grid at 1880, 1886, 1892, 1898, etc.
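Both cycle lengths can be reproduced from the lunar month lengths alone. The sketch below is a plain arithmetic check, not the model itself. It assumes standard values for the draconic, anomalistic, and synodic months and for the tropical year, and it uses the usual Saros counts of 223 synodic, 242 draconic, and 239 anomalistic months.

```python
# Assumed standard month lengths in days (not taken from the post).
DRACONIC = 27.212221
ANOMALISTIC = 27.554550
SYNODIC = 29.530589
TROPICAL_YEAR = 365.2422

# Interference (beat) period between the draconic and anomalistic months:
# 1 / (1/T_draconic - 1/T_anomalistic).
beat_days = DRACONIC * ANOMALISTIC / (ANOMALISTIC - DRACONIC)
beat_years = beat_days / TROPICAL_YEAR

# Saros: 223 synodic, 242 draconic and 239 anomalistic months nearly coincide.
saros_synodic = 223 * SYNODIC
saros_draconic = 242 * DRACONIC
saros_anomalistic = 239 * ANOMALISTIC
saros_years = saros_synodic / TROPICAL_YEAR
spread_days = (max(saros_synodic, saros_draconic, saros_anomalistic)
               - min(saros_synodic, saros_draconic, saros_anomalistic))

print(f"anomalistic/draconic beat: {beat_days:.1f} days = {beat_years:.3f} years")
print(f"Saros: {saros_synodic:.2f} days = {saros_years:.2f} years")
print(f"spread of the three month counts: {spread_days:.2f} days")
```

With these values the beat period comes out just under 6 years, and the three month counts agree to within a fraction of a day over roughly 18 years, which is why the Saros acts as an alignment cycle of the tidal periods.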

Daily profile above, monthly next, both reveal Saros cycle

The bottom inset shows that a similar 6-year cycle consistently appears in length-of-day (LOD) analyses; this particular trace is from a recent paper [Leonid Zotov et al 2020 J. Phys.: Conf. Ser. 1705 012002].

The 6-year cycle in the LOD is not aligned as strictly as the tidal model and it tends to wander, but it seems a more plausible and parsimonious explanation of the modulation than for example in this paper (where the 6-year LOD cycle is “similarly detected in the variations of C22 and S22, the degree-2 order-2 Stokes coefficients of the Earth’s gravitational field”).

Cross-validation confidence improves as the number of mutually agreeing alignments increases. Given that controlled experiments are impossible to perform, this category of analysis is the best way to validate the geophysical models.

