# Machine Learning of ENSO

This topic will gain steam in the coming years. The following paper generates quite a good cross-validation for SOI, shown in the figure below.

1. Xiaoqun, C. et al. ENSO prediction based on Long Short-Term Memory (LSTM). IOP Conference Series: Materials Science and Engineering, 799, 012035 (2020).

The x-axis appears to be in months and likely starts in 1979, so it captures the 2016 El Nino (El Nino is negative for SOI). Still have no idea how the neural net arrived at the fit other than it being able to discern the cyclic behavior from the historical waveform between 1979 and 2010. From the article itself, it appears that neither do the authors.

# Combinatorial Tidal Constituents

For the tidal forcing that contributes to length-of-day (LOD) variations [1], only a few factors contribute to a plurality of the variation. These are indicated below by the highlighted circles, where the V0/g amplitude is greatest. The first is the nodal 18.6 year cycle, indicated by the N’ = 1 Doodson argument. The second is the 27.55 day “Mm” anomalistic cycle which is a combination of the perigean 8.85 year cycle (p = -1 Doodson argument) mixed with the 27.32 day tropical cycle (s=1 Doodson argument). The third and strongest is twice the tropical cycle (therefore s=2) nicknamed “Mf”.

These three factors also combine as the primary input forcing to the ENSO model. Yet, even though they are strongest, the combinatorial factors derived from multiplying these main harmonics are vital for generating a quality fit (both for dLOD and even more so for ENSO). What I have done in the past was apply the recommended mix of first- and second-order factors that appear in the dLOD spectra for the ENSO forcing.

Yet there is another approach that makes no assumption of the strongest 2nd-order factors. In this case, one simply expands the primary factors as a combinatorial expansion of cross-terms to the 4th level — this then generates a broad mix of monthly, fortnightly, 9-day, and weekly harmonic cycles. A nested algorithm to generate the 35 constituent terms is :

$(1+s+p+N')^4$

```Counter := 1;
for J in Constituents'Range loop
for K in Constituents'First .. J loop
for L in Constituents'First .. K loop
for M in Constituents'First .. L loop
Tf := Tf + Coefficients (Counter) * Fundamental(J) *
Fundamental(K) * Fundamental(L) * Fundamental (M);
Counter := Counter + 1;
end loop;
end loop;
end loop;
end loop;```

This algorithm requires the three fundamental terms plus one unity term to capture most of the cross-terms shown in Table 3 above (The annual cross-terms are automatic as those are generated by the model’s annual impulse). This transforms into a coefficients array that can be included in the LTE search software.

What is missing from the list are the evection terms corresponding to 31.812 (Msm) and 27.093 day cycles. They are retrograde to the prograde 27.55 day anomalistic cycle, so would need an additional 8.848 year perigee cycle bring the count from 3 fundamental terms to 4.

The difference between adding an extra level of harmonics, bringing the combinatorial total from 35 to 126, is not very apparent when looking at the time series (below), as it simply adds shape to the main fortnightly tropical cycle.

Yet it has a significant effect on the ENSO fit, approaching a CC of 0.95 (see inset at right for the scatter correlation). Note that the forcing frequency spectra in the middle right inset still shows a predominately tropical fortnightly peak at 0.26/yr and 0.74/yr.

These extra harmonics also helps in matching to the much more busy SOI time-series. Click on the chart below to inspect how the higher-K wavenumbers may be the origin of what is thought to be noise in the SOI measurements.

Is this a case of overfitting? Try the following cross-validation on orthogonal intervals, and note how tight the model matches the data to the training intervals, without degrading too much on the outer validation region.

I will likely add this combinatorial expansion approach to the LTE fitting software on GitHub soon, but thought to checkpoint the interim progress on the blog. In the end the likely modeling mix will be a combination of the geophysical calibration to the known dLOD response together with a refined collection of these 2nd-order combinatorial tidal constituents. The rationale for why certain terms are important will eventually become more clear as well.

### References

1. Ray, R.D. and Erofeeva, S.Y., 2014. Long‐period tidal variations in the length of day. Journal of Geophysical Research: Solid Earth119(2), pp.1498-1509.

# El Nino Modoki

In Chapter 12 of the book we model — via LTE — the canonical El Nino Southern Oscillation (ENSO) behavior, fitting to closely-correlated indices such as NINO3.4 and SOI. Another El Nino index was identified circa 2007 that is not closely correlated to the well-known ENSO indices. This index, referred to as El Nino Modoki, appears to have more of a Pacific Ocean centered dipole shape with a bulge flanked by two wing lobes, cycling again as an erratic standing-wave.

If in fact Modoki differs from the conventional ENSO only by a different standing-wave wavenumber configuration, then it should be straightforward to model as an LTE variation of ENSO. The figure below is the model fitted to the El Nino Modoki Index (EMI) (data from JAMSTEC). The cross-validation is included as values post-1940 were used in the training with values prior to this used as a validation test.

The LTE modulation has a higher fundamental wavenumber component than ENSO (plus a weaker factor closer to a zero wavenumber, i.e. some limited LTE modulation as is found with the QBO model).

The input tidal forcing is close to that used for ENSO but appears to lead it by one year. The same strength ordering of tidal factors occurs, but with the next higher harmonic (7-day) of the tropical fortnightly 13.66 day tide slightly higher for EMI than ENSO.

The model fit is essentially a perturbation of ENSO so did not take long to optimize based on the Laplace’s Tidal Equation modeling software. I was provoked to run the optimization after finding a paper yesterday on using machine learning to model El Nino Modoki [1].

It’s clear that what needs to be done is a complete spatio-temporal model fit across the equatorial Pacific, which will be amazing as it will account for the complete mix of spatial standing-wave modes. Maybe in a couple of years the climate science establishment will catch up.

### References

[1] Pal, Manali, et al. “Long-Lead Prediction of ENSO Modoki Index Using Machine Learning Algorithms.” Scientific Reports, vol. 10, 2020, doi:10.1038/s41598-019-57183-3.

# MJO

The Madden-Julian Oscillation (MJO) is a climate index that captures tropical variability at a finer resolution (i.e. intra-annual) than the (inter-annual) ENSO index over approximately the same geographic region.  Since much of the MJO variability is observed as 30 to 60 day cycles (and these are traveling waves, not standing waves), providing MJO data as a monthly time-series will filter out the fast cycles. Still, it is interesting to analyze the monthly MJO data and compare/contrast that to ENSO. As a disclaimer, it is known that inter-annual variability of the MJO is partly linked to ENSO, but the following will clearly show that connection.

This is the fit of MJO (longitude index #1) using the ENSO model as a starting point (either the NINO34 or SOI works equally well).

The constituent temporal forcing factors for MJO and ENSO align precisely

This is not surprising because the monthly filtered MJO does show the same El Nino peaks at 1983, 1998, and 2016 as the ENSO time-series. The only difference is in the LTE spatial modulation applied during the fitting process, whereby the MJO has a stronger high-wavenumber factor than the ENSO time series.

This is the SOI fit over the same 1980+ interval as MJO, with an almost 0.6 correlation.

# AO

The Arctic Oscillation (AO) dipole has behavior that is correlated to the North Atlantic Oscillation (NAO) dipole.   We can see this in two ways. First, and most straight-forwardly, the correlation coefficient between the AO and NAO time-series is above 0.6.

Secondly, we can use the model of the NAO from the last post and refit the parameters to the AO data (data also here), but spanning an orthogonal interval. Then we can compare the constituent lunisolar factors for NAO and AO for correlation, and further discover that this also doubles as an effective cross-validation for the underlying LTE model (as the intervals are orthogonal).

Top panel is a model fit for AO between 1900-1950, and below that is a model fit for NAO between 1950-present. The lower pane is the correlation for a common interval (left) and for the constituent lunisolar factors for the orthogonal interval (right)

Only the anomalistic factor shows an imperfect correlation, and that remains quite high.

# NAO

The challenge of validating the models of climate oscillations such as ENSO and QBO, rests primarily in our inability to perform controlled experiments. Because of this shortcoming, we can either do (1) predictions of future behavior and validate via the wait-and-see process, or (2) creatively apply techniques such as cross-validation on currently available data. The first is a non-starter because it’s obviously pointless to wait decades for validation results to confirm a model, when it’s entirely possible to do something today via the second approach.

There are a variety of ways to perform model cross-validation on measured data.

In its original and conventional formulation, cross-validation works by checking one interval of time-series against another, typically by training on one interval and then validating on an orthogonal interval.

Another way to cross-validate is to compare two sets of time-series data collected on behaviors that are potentially related. For example, in the case of ocean tidal data that can be collected and compared across spatially separated geographic regions, the sea-level-height (SLH) time-series data will not necessarily be correlated, but the underlying lunar and solar forcing factors will be closely aligned give or take a phase factor. This is intuitively understandable since the two locations share a common-mode signal forcing due to the gravitational pull of the moon and sun, with the differences in response due to the geographic location and local spatial topology and boundary conditions. For tides, this is a consensus understanding and tidal prediction algorithms have stood the test of time.

In the previous post, cross-validation on distinct data sets was evaluated assuming common-mode lunisolar forcing. One cross-validation was done between the ENSO time-series and the AMO time-series. Another cross-validation was performed for ENSO against PDO. The underlying common-mode lunisolar forcings were highly correlated as shown in the featured figure.  The LTE spatial wave-number weightings were the primary discriminator for the model fit. This model is described in detail in the book Mathematical GeoEnergy to be published at the end of the year by Wiley.

Another common-mode cross-validation possible is between ENSO and QBO, but in this case it is primarily in the Draconic nodal lunar factor — the cyclic forcing that appears to govern the regular oscillations of QBO.  Below is the Draconic constituent comparison for QBO and the ENSO.

The QBO and ENSO models only show a common-mode correlated response with respect to the Draconic forcing. The Draconic forcing drives the quasi-periodicity of the QBO cycles, as can be seen in the lower right panel, with a small training window.

This cross-correlation technique can be extended to what appears to be an extremely erratic measure, the North Atlantic Oscillation (NAO).

Like the SOI measure for ENSO, the NAO is originally derived from a pressure dipole measured at two separate locations — but in this case north of the equator.  From the high-frequency of the oscillations, a good assumption is that the spatial wavenumber factors are much higher than is required to fit ENSO. And that was the case as evidenced by the figure below.

ENSO vs NAO cross-validation

Both SOI and NAO are noisy time-series with the NAO appearing very noisy, yet the lunisolar constituent forcings are highly synchronized as shown by correlations in the lower pane. In particular, summing the Anomalistic and Solar constituent factors together improves the correlation markedly, which is because each of those has influence on the other via the lunar-solar mutual gravitational attraction. The iterative fitting process adjusts each of the factors independently, yet the net result compensates the counteracting amplitudes so the net common-mode factor is essentially the same for ENSO and NAO (see lower-right correlation labelled Anomalistic+Solar).

Since the NAO has high-frequency components, we can also perform a conventional cross-validation across orthogonal intervals. The validation interval below is for the years between 1960 and 1990, and even though the training intervals were aggressively over-fit, the correlation between the model and data is still visible in those 30 years.

NAO model fit with validation spanning 1960 to 1990

Over the course of time spent modeling ENSO, the effort that went into fitting to NAO was a fraction of the original time. This is largely due to the fact that the temporal lunisolar forcing only needed to be tweaked to match other climate indices, and the iteration over the topological spatial factors quickly converges.

Many more cross-validation techniques are available for NAO, since there are different flavors of NAO indices available corresponding to different Atlantic locations, and spanning back to the 1800’s.

# ENSO, AMO, PDO and common-mode mechanisms

The basis of the ENSO model is the forcing derived from the long-period cyclic lunisolar gravitational pull of the moon and sun. There is some thought that ENSO shows teleconnections to other oceanic behaviors. The primary oceanic dipoles are ENSO and AMO for the Pacific and Atlantic. There is also the PDO for the mid-northern-latitude of the Pacific, which has a pattern distinct from ENSO. So the question is: Are these connected through interactions or do they possibly share a common-mode mechanism through the same lunisolar forcing mechanism?

Based on tidal behaviors, it is known that the gravitational pull varies geographically, so it would be understandable that ENSO, AMO, and PDO would demonstrate distinct time-series signatures. In checking this, you will find that the correlation coefficient between any two of these series is essentially zero, regardless of applied leads or lags. Yet the underlying component factors (the lunar Draconic, lunar Anomalistic, and solar modified terms) may potentially emerge with only slight variations in shape, with differences only in relative amplitude. This is straightforward to test by fitting the basic ENSO model to AMO and PDO by allowing the parameters to vary.

The following figure is the result of fitting the model to ENSO, AMO, and PDO and then comparing the constituent factors.

First, note that the same parametric model fits each of the time series arguably well. The Draconic factor underling both the ENSO and AMO model is almost perfectly aligned, indicated by the red starred graph, with excursions showing a CC above 0.99. All of the rest of the CC’s in fact are above 0.6.

The upshot of this analysis is two-fold. First to consider how difficult it is to fit any one of these time series to a minimal set of periodically-forced signals. Secondly that the underlying signals are not that different in character, only that the combination in terms of a Laplace’s tidal equation weighting are what couples them together via a common-mode mechanism. Thus, the teleconnection between these oceanic indices is likely an underlying common lunisolar tidal forcing, just as one would suspect from conventional tidal analysis.

# An obvious clue from tidal data

One of the interesting traits of climate science is the way it gives away obvious clues. This recent paper by Iz

Iz, H Bâki. “The Effect of Regional Sea Level Atmospheric Pressure on Sea Level Variations at Globally Distributed Tide Gauge Stations with Long Records.” Journal of Geodetic Science 8, no. 1 (n.d.): 55–71.
shows such a breathtakingly obvious characteristic that it’s a wonder why everyone isn’t all over it.  The author seems to be understating the feature, which is essentially showing that for certain tidal records, the atmospheric pressure (recorded in the tidal measurement location) is pseudo-quantized to a set of specific values.  In other words, for a New York City tidal gauge station, there are 12 values of atmospheric pressure between 1000 and 1035 mb that are heavily favored over all other values.
One can see it in the raw data here where clear horizontal lines are apparent in the data points:

Raw data for NYC station  (Iz, H Bâki. “The Effect of Regional Sea Level Atmospheric Pressure on Sea Level Variations at Globally Distributed Tide Gauge Stations with Long Records.” Journal of Geodetic Science 8, no. 1 (n.d.): 55–71.)

and for the transformed data shown in the histogram below, where I believe the waviness in the lines is compensated by fitting to long-period tidal signal factors (such as 18.6 year, 9.3 year periods, etc).

Histogram for transformed data for NYC station  Iz, H Bâki. “The Effect of Regional Sea Level Atmospheric Pressure on Sea Level Variations at Globally Distributed Tide Gauge Stations with Long Records.” Journal of Geodetic Science 8, no. 1 (n.d.): 55–71.

The author isn’t calling it a quantization, and doesn’t really call attention to it with a specific name other than clustering, yet it is obvious from the raw data and even more from the histograms of the transformed data.

The first temptation is to attribute the pattern to a measurement artifact. These are monthly readings and there are 12 separate discrete values identified so that connection seems causal. The author says

“It was shown that random component of regional atmospheric pressure tends to cluster at monthly intervals. The clusters are likely to be caused by the intraannual seasonal atmospheric temperature changes, which may also act as random beats in generating sub-harmonics observed in sea level changes as another mechanism.”
Nearer the equator, the pattern is not readily evident. The fundamental connection between tidal value and atmospheric pressure is due to the inverse barometric effect
“At any fixed location, the sea level record is a function of time, involving periodic components as well as continuous random fluctuations. The periodic motion is mostly due to the gravitational effects of the sun-earth-moon system as well as because of solar radiation upon the atmosphere and the ocean as discussed before. Sometimes the random fluctuations are of meteorological origin and reflect the effect of ’weather’ upon the sea surface but reflect also the inverse barometric effect of atmospheric pressure at sea level.”
So the bottom-line impact is that the underlying tidal signal is viably measured even though it is at a monthly resolution and not the diurnal or semi-diurnal resolution typically associated with tides.
Why this effect is not as evident closer to the equator is rationalized by smaller annual amplification
“Stations closer to the equator are also exposed to yearly periodic variations but with smaller amplitudes. Large adjusted R2 values show that the models explain most of the variations in atmospheric pressure  observed at the sea level at the corresponding stations. For those stations closer to the equator, the amplitudes of the annual and semiannual changes are considerably smaller and overwhelmed by random excursions. Stations in Europe experience similar regional variations because of their proximities to each other”
So, for the Sydney Harbor tidal data the pattern is not observed

Sydney histogram does not show a clear delineated quantization

Whereas, I previously showed the clear impact of the ENSO signal on the Sydney tidal data after a specific transform in this post. The erratic ENSO signal (with a huge inverse barometric effect as measured via the SOI readings of atmospheric pressure) competes with the annual signal so that the monthly quantization is obscured. Yet, if the ENSO behavior is also connected to the tidal forcing at these long-period levels, there may be a tidal unification yet to be drawn from these clues.

# Duration of a year

As the ENSO model demonstrates a continuous fit from the proxy records starting in 1650 on through the instrumental records starting in 1880, it has become important to keep accurate tabs of the year length. The quality of the model agreement appears to be sensitive to the average year duration as well as position of the century leap years.

From 1880 to 2018, the actual number of tabulated days is 134,409 while the model uses 134,408.5, with an extra day inserted where the proxy and instrument data meet. So overall the number of days is 134,409.5. This is very close to the actual, and, further, it looks necessary for precision in terms of the long-period tidal analysis metrology. If even one lunar month is missed, the phase interference will destroy the correlation.

There are leap days at years 1600 and 2000 (centuries divided by 400) and missing leap days at 1700, 1800, and 1900. It appears that the inserted day essentially acts as a bridge from slightly longer average days close to 1650 and those of the present day. Otherwise the average year length is closer to the long term average of 365.2422 days.

1600, 366 days
1700, 365
1800, 365
1900, 365
2000, 366

This is a good problem to have as it substantiates the validity of the model. In other words, if the data wasn’t sensitive to the precise year duration it wouldn’t be a tidally influenced mechanism.

# ENSO model verification via Fourier analysis infill

Because the ENSO model generates precise temporal harmonics via a non-linear solution to Laplace’s Tidal Equations, it may in practice be trivially easy to verify. By only using higher-frequency harmonics (T<1.25y) during spectral training (with a small window of low-frequency signal to stabilize the solution, T>11y), the model essentially fills in the missing bulk of the signal frequency spectrum, 1.25y < T < 11y.  This is shown below in Figure 1.

Fig. 1: Bottom panel of amplitude ENSO SOI spectra shows the training windows.  A primarily low-amplitude spectral signal is used to fit the model (using least-squares on the error signal). Upper spectra shows the expanded view of the out-of-band fit. This rich spectra is all due to the non-linear harmonic solution of the ENSO Laplace’s Tidal Equation solution.

This agreement is statistically unlikely (nee impossible) to occur unless the out-of-band signal had knowledge of the fundamental harmonics (i.e. the highest amplitude terms in the meat of the spectra) that are contributing to the higher harmonics.

Figure 2 is the underlying temporal fit. Although not as good a fit as what we can achieve using more of the primary Fourier terms, it is still striking.

Fig. 2: Temporal model fit using only Fourier frequency terms shorter than 1.25 years and longer than 11 years. The correlation coefficient is 0.7 here

The consensus claim is that ENSO is a chaotic process with no long-term coherence. Yet, this shows excellent agreement with a forced lunisolar model showing very long-term coherence.   An issue to raise is: why has the obvious deterministic forcing model been abandoned as a plausible physical mechanism so long ago?