Cross-validation is essentially the ability to predict the characteristics of an unexplored region based on a model of an explored region. The explored region is often used as a training interval to test or validate model applicability on the unexplored interval. If some fraction of the expected characteristics appears in the unexplored region when the model is extrapolated to that interval, some degree of validation is granted to the model.
This is a powerful technique on its own as it is used frequently (and depended on) in machine learning models to eliminate poorly performing trials. But it gains even more importance when new data for validation will take years to collect. In particular, consider the arduous process of collecting fresh data for El Nino Southern Oscillation, which will take decades to generate sufficient statistical significance for validation.
So, what’s necessary in the short term is substantiation of a model’s potential validity. Nothing else will work as a substitute, as controlled experiments are not possible for domains as large as the Earth’s climate. Cross-validation remains the best bet.
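As a rough illustration of the train-then-extrapolate idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the synthetic "observations", the 3.8-year period, and the helper names `pearson` and `fit_amplitude` are hypothetical and not taken from any of the models discussed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_amplitude(t, y, period):
    """Least-squares amplitude of a fixed-period sinusoid (one tunable parameter)."""
    basis = [math.sin(2.0 * math.pi * ti / period) for ti in t]
    return sum(b * yi for b, yi in zip(basis, y)) / sum(b * b for b in basis)

# Synthetic monthly "observations" over 100 years (made up for illustration):
# a 3.8-year cycle plus a smaller, faster disturbance the model does not know about.
t = [i / 12.0 for i in range(12 * 100)]
obs = [1.5 * math.sin(2.0 * math.pi * ti / 3.8)
       + 0.3 * math.sin(2.0 * math.pi * ti / 0.7) for ti in t]

split = len(t) // 2                              # first half = training interval
amp = fit_amplitude(t[:split], obs[:split], 3.8)
model = [amp * math.sin(2.0 * math.pi * ti / 3.8) for ti in t]

cc_train = pearson(model[:split], obs[:split])   # fit quality on the explored region
cc_test = pearson(model[split:], obs[split:])    # validation on the unexplored region
```

The only number that matters for validation is `cc_test`, since the second half of the data played no role in setting `amp`.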
NdGT has a point: you do see the earth’s shadow moving across the moon, but once covered, a #lunarEclipse just looks like a duller moon (similar “new moons” are also observed like clockwork, which takes the excitement out of it). Yet the alignment of tidal forces does a number on the Earth’s climate that is totally cryptic and thus overlooked. Perhaps old Dr. Neil would find it more interesting to tie lunar cycles to climate indices such as ENSO and the Indian Ocean Dipole (IOD)? It’s all based on geophysical fluid dynamics. Oh, and a bonus: discriminate on the variability of IOD and there’s the underlying AGW trend!
BTW, a key to this IOD model fit is to apply dual annual impulses, one for each monsoon season, summer and winter. ENSO, in contrast, has only the single impulse tied to the spring predictability barrier.
The premise of the paper is that the ocean will show modulation of mixing with a cycle of ~18 years corresponding to the 18.6-year lunar declination cycle. That may indeed be the case, but it likely pales in comparison to the other so-called long-period tidal cycles. In particular, every ~2 weeks the moon makes a complete north-south-north declination cycle that likely has a huge impact on the climate as it sloshes the subsurface thermocline (see the paper by Lin & Qian [1]). Unfortunately, this much shorter cycle does not show up directly in the observational data, making it a challenge to determine how the pattern manifests itself. In the following, I will describe how this is accomplished, referring to the complete derivation found in Chapter 12 of Mathematical Geoenergy [2].
Consider that the 2-week lunar declination cycle is observed very clearly in the Earth’s rotational speed, measured in terms of small transient changes in the length of day (LOD). From the IERS site, we can plot the differential LOD (dLOD) and fit to the known tidal factors, leaving a clean closed-form signal that one can use as a forcing function to evaluate the ocean response, in this case comparing it to the well-defined ENSO climate index.
The 18.6-year nodal cycle can be seen in the modulation of the cyclic dLOD data. At a higher resolution, the comparison is as follows:
To do that, we first make the assumption that the tidal cycle is modulated on an annual cycle, corresponding to the well-known “spring predictability barrier”. So, by integrating a sequence of May impulses against the value of the tidal forcing at that point, the following time series is generated.
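A minimal sketch of that impulse-and-integrate step, in Python, might look like the following. The tidal periods are ones quoted in these posts, but the amplitudes, the `impulse_phase` value standing in for May, and the function names are all assumptions made for illustration; the actual model calibrates the forcing against dLOD rather than using a toy signal.

```python
import math

DAYS_PER_YEAR = 365.242
# (period in days, amplitude): the periods are the Mf, Mm and Mt values quoted
# in these posts; the amplitudes are invented purely for illustration.
TIDAL_TERMS = [(13.66, 1.0), (27.55, 0.6), (9.133, 0.2)]

def tidal_forcing(t_years):
    """Toy closed-form tidal signal evaluated at time t (in years)."""
    days = t_years * DAYS_PER_YEAR
    return sum(amp * math.sin(2.0 * math.pi * days / period)
               for period, amp in TIDAL_TERMS)

def impulse_integrated_forcing(start_year, n_years, impulse_phase=0.37):
    """Sample the tide once per year and accumulate the samples.

    impulse_phase is the fraction of the year at which the impulse fires;
    0.37 is an assumed stand-in for mid-May.
    """
    times, samples, integrated = [], [], []
    level = 0.0
    for n in range(n_years):
        t = start_year + n + impulse_phase
        s = tidal_forcing(t)
        level += s              # running sum = discrete integration of the impulses
        times.append(t)
        samples.append(s)
        integrated.append(level)
    return times, samples, integrated

times, samples, integrated = impulse_integrated_forcing(1870, 150)
```

Because the tide is only "seen" once per year, the fast fortnightly and monthly cycles show up in `samples` as slow, multi-year variations, which is the aliasing effect discussed further on.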
Obviously, this does not match the ENSO NINO34 signal, but assuming that the subsurface response is non-linear (derivation in cite #2 below) and creates standing wave-modes based on the geometry of the ocean basin, then one can use a suitable transformation to potentially extract the pattern. The best approach based on the solution to the shallow-water wave model (i.e. Laplace’s Tidal Equations) is to map the input forcing (graph above) to the output corresponding to the NINO34 index, using a Fourier series expansion.
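To make the mapping concrete, here is a small Python sketch. It assumes that each standing-wave mode contributes a sinusoid of the forcing, so the output is a sum of terms `a * sin(k * F(t) + phi)`; that functional form, the function name `lte_modulate`, and the mode values are illustrative assumptions, and the full derivation is in the reference cited above.

```python
import math

def lte_modulate(forcing, modes):
    """Map a forcing series to a response via a sum of sinusoidal modulations.

    modes is a list of (wavenumber, amplitude, phase) tuples; each one adds
    amplitude * sin(wavenumber * F + phase) to the response.
    """
    return [sum(a * math.sin(k * f + phi) for k, a, phi in modes)
            for f in forcing]

# Hypothetical values: one low-wavenumber mode and one higher-wavenumber mode.
forcing = [0.0, 0.5, 1.0, 1.5]
modes = [(1.0, 1.0, 0.0), (7.0, 0.3, 0.5)]
response = lte_modulate(forcing, modes)
```

The nonlinearity is the important design point: a smoothly varying forcing can fold over into a much more erratic response once the wavenumber is large enough for `k * F` to sweep through several cycles.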
The result is the Laplace’s Tidal Equation (LTE) modulation spectra, shown below in a particular cross-validation configuration. Here, the NINO34 data is split into 2 halves, one time-series taken from 1870-1945 and the second from 1945-2020. The spectra were calculated individually and then multiplied point-by-point to identify long-lived stationary standing-wave nodes in the modulation. Thus, it isolates modulations that are common to each interval.
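The split-and-multiply procedure can be sketched in Python as below. This is a toy under stated assumptions: the forcing is a synthetic, evenly spread sequence, the "data" is a single planted modulation at a wavenumber of 4, and `modulation_spectrum` is a hypothetical helper, not the code used to produce the chart.

```python
import cmath
import math

def modulation_spectrum(forcing, data, wavenumbers):
    """Magnitude of sum_t data(t) * exp(i * k * F(t)) / N for each trial k."""
    n = len(data)
    return [abs(sum(d * cmath.exp(1j * k * f) for f, d in zip(forcing, data))) / n
            for k in wavenumbers]

# Synthetic forcing spread over [-5, 5) (a low-discrepancy sequence, chosen only
# so the toy example behaves predictably).
GOLDEN = 0.6180339887
forcing = [10.0 * ((i * GOLDEN) % 1.0) - 5.0 for i in range(600)]

K_TRUE = 4.0                                   # planted standing-wave wavenumber
data = [math.sin(K_TRUE * f) for f in forcing]

wavenumbers = [0.5 * j for j in range(1, 17)]  # trial wavenumbers 0.5 ... 8.0
half = len(data) // 2
spec_a = modulation_spectrum(forcing[:half], data[:half], wavenumbers)
spec_b = modulation_spectrum(forcing[half:], data[half:], wavenumbers)

# Point-by-point product: only modulations present in BOTH intervals survive.
product = [a * b for a, b in zip(spec_a, spec_b)]
best_k = wavenumbers[product.index(max(product))]
```

A modulation that is strong in only one half is suppressed by the product, so the surviving peaks are the ones that stay stationary across the whole record.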
This is a log-plot, so the peak excursions shown are statistically significant and can therefore be modeled by a handful of quantifiable standing-wave modulations. The lowest wavenumber modulations are associated with the ENSO dipole modes and the higher wavenumber modulations are potentially associated with tropical instability waves (TIW) [2].
As a final step, by applying this set of modulations to the lunisolar forcing (the blue chart above), a fit to the NINO34 time-series results. The chart shown below is a very good fit and can be cross-validated via several approaches [10].
The mix of incommensurate tidal factors, the annual impulse, and a nonlinear response function is what causes the highly erratic nature of the ENSO waveform. It is neither chaotic nor random, as some researchers claim, but is instead deterministically tied to the tidal and annual cycles, much as conventional tidal cycles have proven to be over the course of time.
To further quantify the decomposition of the tidal factors that force both the dLOD and the sloshing ENSO response, the paper by Ray and Erofeeva is vital [8]. When trying to understand the assignment of frequencies, note that after the annual impulse is applied, the known tidal factors (labelled Mf, Mm, etc.) get shifted from their normal positions due to signal aliasing (see chart below in gray). This is a confusing factor to those who have not encountered aliasing before. As an example, the long-term modulation (>100 years) displayed in the blue chart above is due to the aliased 9.133-day Mt tidal factor, which almost synchronizes with the annual cycle; the small amount by which it is off leads to a gradual modulation in the forcing. It is counterintuitive that a 9-day cycle could cause multidecadal changes.
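The aliasing arithmetic is simple enough to check by hand. The sketch below, in Python, assumes a pure once-per-year sampling and a year of 365.242 days; the function name `aliased_period_years` is hypothetical.

```python
DAYS_PER_YEAR = 365.242  # assumed length of the year in days

def aliased_period_years(tidal_period_days):
    """Apparent period (in years) of a tidal cycle sampled once per year."""
    cycles_per_year = DAYS_PER_YEAR / tidal_period_days
    # Only the fractional distance to a whole number of cycles survives sampling.
    frac = abs(cycles_per_year - round(cycles_per_year))
    return float("inf") if frac == 0.0 else 1.0 / frac

mt_alias = aliased_period_years(9.133)   # Mt: very nearly 40 cycles per year
mf_alias = aliased_period_years(13.66)   # Mf: fortnightly factor
mm_alias = aliased_period_years(27.55)   # Mm: monthly factor
```

With these inputs the Mt factor aliases to well over 100 years, while Mf and Mm alias to roughly 3.8 and 3.9 years, respectively. The closer a tidal period comes to dividing the year evenly, the longer its aliased cycle becomes.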
Ding & Chao [9] provide an independent analysis of LOD that provides a good cross-check to the non-aliased cross-factors. It may be possible to use lunar ephemeris data to calibrate the forcing but that adds degrees-of-freedom that could lead to over-fitting [10].
The reason that Lin & Qian were not able to further substantiate their claim of tidal forcing is that they did not apply the seasonal aliasing and a nonlinear mapping to their observations; they were only able to demonstrate the cause and effect of tidal forcing on the thermocline, thereby ruling out wind forcing. Other sources to cite are “Topological origin of equatorial waves” [4] and “Solar System Dynamics and Multiyear Droughts of the Western USA” [5], the latter discussing the impact of axial torques on the climate. Researchers at NASA JPL including J.H. Shirley, C. Perigaud [6], and S.L. Marcus [7] have touched on the LOD, lunar, ENSO connection over the years.
Bottom-line takeaways:
Tidal factors are numerous so a measure such as dLOD is critical for calibrating the forcing.
Use the knowledge of a seasonal impulse, a la the spring predictability barrier, to advantage, while considering the temporal aliasing that it will cause.
The solution to the geophysical fluid dynamics produces a non-linear response, so clever transform techniques such as Fourier series are useful to isolate the pattern.
Ding, H., & Chao, B. F. (2018). Application of stabilized AR-z spectrum in harmonic analysis for geophysics. Journal of Geophysical Research: Solid Earth, 123, 8249–8259. https://doi.org/10.1029/2018JB015890
In an earlier post, the observation was that ENSO models may not be unique due to the numerous possibilities provided by nonlinear math. This was supported by the fact that a tidal forcing model based on the Mf (13.66 day) tidal factor worked as well as one based on the Mm (27.55 day) factor. This was not surprising considering that the aliasing against an annual impulse gave a similar repeat cycle: 3.8 years versus 3.9 years. But I have also observed that mixing the two in a linear fashion did not improve the fit much at all, as the difference created a long interference cycle which isn’t observed in the ENSO time series data. Thinking in terms of the nonlinear modulation required, though, it may be that the two factors can be combined after the LTE solution is applied.
As the quality of the tidally-forced ENSO model improves, it’s instructive to evaluate its common-mode mechanism against other oceanic indices. So this is a re-evaluation of the Pacific Decadal Oscillation (PDO), in the context of non-autonomous solutions such as generated via LTE modulation. In particular, in this note we will clearly delineate the subtle distinction that arises when comparing ENSO and PDO. As background, it’s been frequently observed and reported that the PDO shows a resemblance to ENSO (a correlation coefficient between 0.5 and 0.6), but also demonstrates a longer multiyear behavior than the 3-7 year fluctuating period of ENSO, hence the decadal modifier.
A hypothesis based on LTE modulation is that decadal behavior arises from the shallowest modulation mode, and one that corresponds to even symmetry (i.e. cos not sin). So for a model that was originally fit to an ENSO time-series, it is anticipated that the modulation trending to a more even symmetry will reveal less rapid fluctuations — or in other words for an even f(x) = f(-x) symmetry there will be less difference between positive and negative excursions for a well-balanced symmetric input time-series. This should then exaggerate longer term fluctuations, such as in PDO. And for odd f(x) = -f(-x) symmetry it will exaggerate shorter term fluctuations leading to more spikiness, such as in ENSO.
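The symmetry argument can be checked numerically with a few lines of Python. This is only a sketch: the wavenumber value and the names `even_mode` and `odd_mode` are made up for illustration, and the modulation is assumed to be a plain cosine or sine of the forcing.

```python
import math

def even_mode(forcing, k):
    """Even-symmetry modulation: f(x) = f(-x)."""
    return math.cos(k * forcing)

def odd_mode(forcing, k):
    """Odd-symmetry modulation: f(x) = -f(-x)."""
    return math.sin(k * forcing)

k = 0.8  # hypothetical shallow (low-wavenumber) modulation
for f in (0.5, 1.0, 2.0):
    # Even mode: a positive and a negative excursion of the forcing give the
    # same output, so sign flips in the input do not show up in the response.
    assert math.isclose(even_mode(f, k), even_mode(-f, k))
    # Odd mode: the sign of the excursion carries through, so every swing of
    # the input across zero flips the response as well.
    assert math.isclose(odd_mode(f, k), -odd_mode(-f, k))
```

For a forcing that swings symmetrically about zero, the even mode therefore discards the rapid sign changes and leaves the slower envelope, while the odd mode keeps them.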
This blog is late to the game in commenting on the physics of the Hollywood film Moonfall, but does that really matter? Geophysics research and glacially slow progress seem synonymous at this point. In social media, unless one jumps on the event of the day within an hour, it’s considered forgotten. However, difficult problems aren’t unraveled quickly, and that’s what we have when we consider the Moon’s influence on the Earth’s geophysics. Yes, tides are easy to understand, but any other impact of the Moon is considered warily, perhaps over the course of decades, not as part of the daily news & entertainment cycle.
My premise: The movie Moonfall is a more pure climate-science-fiction film than Don’t Look Up. Discuss.
This post revisits earlier modeling of the North Atlantic Oscillation (NAO) and Arctic Oscillation (AO) indices with the benefit of updated analysis approaches such as negative entropy. These two indices in particular are intimidating because to the untrained eye they appear to be more noise than anything deterministically periodic. Whereas ENSO has periods that range from 3 to 7 years, both NAO and AO show rapid cycling often on a faster-than-annual pace. The trial ansatz in this case is to adopt a semi-annual forcing pattern and synchronize that to long-period lunar factors, fitted to a Laplace’s Tidal Equation (LTE) model.
Start with candidate forcing time-series as shown below, with a mix of semi-annual and annual impulses modulating the primarily synodic/tropical lunar factor. The two diverge slightly at earlier dates (starting at 1880) but the NAO and AO instrumental data only begins at the year 1950, so the two are tightly correlated over the range of interest.
The intensity spectrum is shown below for the semi-annual zone, noting the aliased tropical factors at 27.32 and 13.66 days standing out.
The NAO and AO patterns are not really that different, and once a strong LTE modulation is found for one index, it also works for the other. As shown below, the lowest modulation is sharply delineated, yet more rapid than that for ENSO, indicating a high-wavenumber standing wave mode in the upper latitudes.
The model fit for NAO (data source) is excellent as shown below. The training interval only extended to 2016, so the dotted lines provide an extrapolated fit to the most recent NAO data.
The same holds for the AO (data source): the fit is also excellent, as shown below. There is virtually no difference in the lowest LTE modulation frequency between NAO and AO, but the higher/more rapid LTE modulations need to be tuned for each unique index. In both cases, the extrapolations beyond the year 2016 are very encouraging (though not perfect) cross-validating predictions. The LTE modulation is so strong that it is also structurally sensitive to the exact forcing.
Both NAO and AO time-series appear very busy and noisy, yet there is very likely a strong underlying order due to the fundamental 27.32/13.66 day tropical forcing modulating the semi-annual impulse, with the 18.6/9.3 year and 8.85/4.42 year cycles providing the expected longer-range lunar variability. This is also consistent with the critical semi-annual impulses that impact the QBO and Chandler wobble periodicity, with the caveat that the group symmetry of the global QBO and Chandler wobble forcings requires those to be draconic/nodal factors and not the geographically isolated sidereal/tropical factor required of the North Atlantic.
It really is a highly-resolved model potentially useful at a finer resolution than monthly and that will only improve over time.
(As a side note, this is a much better attempt at matching a lunar forcing to AO and jet-stream dynamics than the approach Clive Best tried a few years ago. He gave it a shot, but without knowledge of the non-linear character of the LTE modulation required he wasn’t able to achieve a high correlation, achieving at best a 2.4% Spearman correlation coefficient for AO in his Figure 4, whereas the models in this GeoenergyMath post extend beyond 80% for the interval 1950 to 2016!)
Climate scientists as a general rule don’t understand crystallography deeply (I do). They also don’t understand cryptography (that, I don’t understand deeply either). Yet, as the last post indicated, knowledge of these two scientific domains is essential to decoding dipoles such as the El Nino Southern Oscillation (ENSO). Crystallography is basically an exercise in signal processing where one analyzes electron & x-ray diffraction patterns to be able to decode structure at the atomic level. It’s mathematical and not for people unaccustomed to working outside of real space, as diffraction acts to transform the world of 3-D into a reciprocal space where the dimensions are inverted and common intuition fails.
Cryptography in its common use applies a key to enable a user to decode a scrambled data stream according to the instruction pattern embedded within the key. If diffraction-based crystallography required a complex unknown key to decode from reciprocal space, it would seem hopeless, but that’s exactly what we are dealing with when trying to decipher climate dipole time-series: we don’t know what the decoding key is. If that’s the case, no wonder climate science has never made any progress in modeling ENSO, as it’s an existentially difficult problem.
The breakthrough is in identifying that an analytical solution to Laplace’s tidal equations (LTE) provides a crystallography+cryptography analog in which we can make some headway. The challenge is in identifying the decoding key (an unknown forcing) that would make the reciprocal-space inversion process (required for LTE demodulation) straightforward.
According to the LTE model, the forcing has to be a combination of tidal factors mixed with a seasonal cycle (stages 1 & 2 in the figure above) that would enable the last stage (Fourier series a la diffraction inversion) to be matched to empirical observations of a climate dipole such as ENSO.
The forcing key used in an ENSO model was described in the last post as a predominately Mm-based lunar tidal factorization as shown below, leading to an excellent match to the NINO34 time series after a minimally-complex LTE modulation is applied.
Critics might say, and justifiably so, that a model-to-data correlation this good is potentially an over-fit. There are too many degrees of freedom (DOF) in a tidal factorization, which would allow a spuriously good fit depending on the computational effort applied (see Reference 1 at the end of this post).
Yet, if the forcing key used in the ENSO model was reused as is in fitting an independent climate dipole, such as the AMO, and this same key required little effort in modeling AMO, then the over-fitting criticism is invalidated. What’s left to perform is finding a distinct low-DOF LTE modulation to match the AMO time-series as shown below.
This is an example of a common-mode cross-validation of an LTE model that I originally suggested in an AGU paper from 2018. Invalidating this kind of analysis is exceedingly difficult as it requires one to show that the erratic cycling of AMO can be randomly created by a few DOF. In fact, reproducing the dozens of AMO peaks and valleys shown with only a few DOF of sinusoidal factors is virtually impossible. I leave it to others to debunk via an independent analysis.
addendum: LTE modulation comparisons, essentially the wavenumber of the diffraction signal:
This is the forcing power spectrum showing the principal Mm tidal factor term at period 3.9 years, with nearly identical spectral profiles for both ENSO and AMO.
According to the precepts of cryptography, decoding becomes straightforward once one knows the key. Similarly, nature often closely guards its secrets, and until the key is known, for example as with DNA, climate scientists will continue to flounder.
Chao, B. F., & Chung, C. H. (2019). On Estimating the Cross Correlation and Least Squares Fit of One Data Set to Another With Time Shift. Earth and Space Science, 6, 1409–1415. https://doi.org/10.1029/2018EA000548 “For example, two time series with predominant linear trends (very low DOF) can have a very high ρ (positive or negative), which can hardly be construed as an evidence for meaningful physical relationship. Similarly, two smooth time series with merely a few undulations of similar timescale (hence low DOF) can easily have a high apparent ρ just by fortuity especially if a time shift is allowed. On the other hand, two very “erratic” or, say, white time series (hence high DOF) can prove to be significantly correlated even though their apparent ρ value is only moderate. The key parameter of relevance here is the DOF: A relatively high ρ for low DOF may be less significant than a relatively low ρ at high DOF and vice versa.”
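The point of the quoted passage can be made concrete with the textbook Student-t statistic for a correlation coefficient, sketched here in Python. This is a generic illustration and not necessarily the procedure used by Chao & Chung; the ρ and DOF values are invented, and for real autocorrelated series the effective DOF has to be estimated rather than assumed.

```python
import math

def correlation_t_statistic(rho, dof):
    """Student-t statistic for a correlation coefficient rho with dof degrees of freedom."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return rho * math.sqrt(dof / (1.0 - rho * rho))

# Two hypothetical cases:
smooth = correlation_t_statistic(0.9, 2)      # high rho, very few DOF (smooth series)
erratic = correlation_t_statistic(0.4, 100)   # moderate rho, many DOF (erratic series)
```

Here `smooth` comes out near 2.9 and `erratic` near 4.4, so the moderate correlation between the erratic series is the more significant of the two, which is the situation the quote describes.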
Jialin Lin, associate professor of geography, has spent the last two decades tackling those challenges, and in the past two years, he’s had breakthroughs in answering two of forecasting’s most pernicious questions: predicting the shift between El Niño and La Niña and predicting which hurricanes will rapidly intensify.
Now, he’s turning his attention to creating more accurate models predicting global warming and its impacts, leading an international team of 40 climate experts to create a new book identifying the highest-priority research questions for the next 30-50 years.
Lin set out to create a model that could accurately identify ENSO shifts by testing — and subsequently ruling out — all the theories and possibilities earlier researchers had proposed. Then, Lin realized current models only considered surface temperatures, and he decided to dive deeper.
He downloaded 140 years of deep-ocean temperature data, analyzed them and made a breakthrough discovery.
“After 20 years of research, I finally found that the shift was caused by an ocean wave 100 to 200 meters down in the deep ocean,” said Lin, whose research was published in a Nature journal. “The propagation of this wave from the western Pacific to the eastern Pacific generates the switch from La Niña to El Niño.”
The wave repeatedly appeared two years before an El Niño event developed, but Lin went one step further to explain what generated the wave and discovered it was caused by the moon’s tidal gravitational force.
“The tidal force is even easier to predict,” Lin said. “That will widen the possibility for an even longer lead of prediction. Now you can predict not only for two years before, but 10 years before.”
Essentially, the idea is that these subsurface waves can in no way be caused by surface wind as the latter only are observed later (likely as an after-effect of the sub-surface thermocline nearing the surface and thus modifying the atmospheric pressure gradient). This counters the long-standing belief that ENSO transitions occur as a result of prevailing wind shifts.
The other part of the article, on predicting hurricane intensification, is also interesting.
Given two models of a physical behavior, the “better” model has the higher correlation (or lower error) to the data and the lower number of degrees of freedom (#DOF) in terms of tunable parameters. This ratio CC/#DOF of correlation coefficient over DOF is routinely used in automated symbolic regression algorithms and for scoring of online programming contests. A balance between a good error metric and a low complexity score is often referred to as a Pareto frontier.
So for modeling ENSO, the challenge is to fit the quasi-periodic NINO34 time-series with a minimal number of tunable parameters. For a 140 year fitting interval (1880-2020), a naive Fourier series fit could easily take 50-100 sine waves of varying frequency, amplitude, and phase to match a low-pass filtered version of the data (any high-frequency components may take many more). However, that is a horribly complex model and obviously prone to over-fitting. Clearly, we need to apply some physics to reduce the #DOF.
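A back-of-the-envelope version of that scoring, in Python, is sketched below. The correlation values and the 20-parameter physics model are hypothetical numbers chosen only to show how the ratio behaves; the count of three parameters per sine wave follows from the frequency, amplitude, and phase mentioned above.

```python
def complexity_score(cc, n_dof):
    """Ratio of correlation coefficient to number of tunable parameters."""
    if n_dof <= 0:
        raise ValueError("n_dof must be positive")
    return cc / n_dof

PARAMS_PER_SINE = 3                      # frequency, amplitude, phase
naive_dof = 75 * PARAMS_PER_SINE         # mid-range of the 50-100 sine-wave estimate

# Hypothetical comparison: the naive fit correlates slightly better,
# but pays for it with an order of magnitude more parameters.
naive_score = complexity_score(0.95, naive_dof)
physics_score = complexity_score(0.85, 20)
```

Even when the brute-force Fourier fit achieves the higher raw correlation, the physics-constrained model scores far better once the parameter count is taken into account.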
Since we know that ENSO is essentially a matter of equatorial fluid dynamics responding to a tidal forcing, all that is needed is the gravitational potential along the equator. The paper by Na has software for computing the orbital dynamics of the moon (i.e. lunar ephemerides) and a 1st-order approximation for tidal potential:
The software contains well over 100 sinusoidal terms (each consisting of amplitude, frequency, and phase) to internally model the lunar orbit precisely. Thus, that many DOF are removed, with a corresponding huge reduction in complexity score for any reasonable fit. So instead of a huge set of factors to manipulate (as with many detailed harmonic tidal analyses), what one is given is a range (r = R) and a declination (ψ = delta) time-series. These are combined in a manner following the figure from Na shown above, essentially adjusting the amplitudes of R and delta while introducing an additional tangential or tractional projection of delta (sin instead of cos). The latter is important as described in NOAA’s tide producing forces page.
Although I roughly calibrated this earlier via NASA’s HORIZONS ephemerides page (input parameters shown on the right), the Na software allows better flexibility in use. The two calculations essentially give identical outputs and independent verification that the numbers are as expected.
As this post is already getting too long, this is the result of doing a Laplace’s Tidal Equation fit (adding a few more DOF), demonstrating that the limited #DOF prevents over-fitting on a short training interval while cross-validating outside of this band.
This low complexity and high accuracy solution would win ANY competition, including the competition for best seasonal prediction with a measly prize of 15,000 Swiss francs. A good ENSO model is worth billions of $$ given the amount it will save in agricultural planning and its potential for mitigation of human suffering in predicting the timing of climate extremes.
 Na, S.-H. Chapter 19 – Prediction of Earth tide. in Basics of Computational Geophysics (eds. Samui, P., Dixon, B. & Tien Bui, D.) 351–372 (Elsevier, 2021). doi:10.1016/B978-0-12-820513-6.00022-9.
 Pukite, P.R. et al “Ephemeris calibration of Laplace’s tidal equation model for ENSO” AGU Fall Meeting, 2018. doi:10.1002/essoar.10500568.1