I recently archived the paper to ARXIV and submitted to PRL.
Get the paper here from ARXIV as a PDF.
The nearly year-long investigation is time-lined and outlined here.
The final model fit:
Thanks for the good comments!
One application of the model described below
After finishing the paper, I ran across an interesting study by Smith and Sardeshmukh  referenced at a climate denier site. (BTW, these sites are great places to discover things, in an #OwnGoal sense.)
The study uses a modified index for ENSO that combines the SOI measure and a sea-surface temperature (SST) to reduce the noise and thus extract the true ENSO signal as much as possible. They call this the “BEST” ENSO index (not to be confused with the Berkeley BEST temperature record). This model doesn’t really improve the sloshing model fit, as the filtered SOI that I have been using is close to what BEST provides. However, it does reveal some very interesting and likely intrinsic properties of the ENSO time-series and, in particular, what should be included in the signal versus what should be removed.
To give a spoiler, the BEST model relies heavily on SST — yet SST is sensitive to the transiently cooling after-effects of volcanic eruptions. Since a large eruption (such as Pinatubo in 1991) will release aerosol sulfates into the upper atmosphere that then reflect sunlight back to space, the SST will likely be suppressed from achieving maximum solar insolation for a few years. This is a separate effect from natural ENSO variations, which the sloshing model describes via periodic and quasi-periodic forcing.
Recall that the basic sloshing model is related to a wave equation. This equation includes a second derivative with respect to time, which describes a natural resonance of the ocean region. It also includes the forcing periodicities, F(t).
The machine learning tool Eureqa (http://beta.eureqa.com) can be configured to detect deviations from the wave equation, as I have used in the past. Since the undamped wave equation solves for the second derivative of a quantity with respect to the quantity itself, one can configure Eureqa to optimize against a set of unknown symbolic expressions that will either modify the quantity itself (ala the Mathieu or Hill equation) and/or provide a forcing. These symbolic expressions will be approximations of periodic measures such as QBO, etc. The following expression is entered into Eureqa along with the BEST time series starting from 1870.
D(ENSO, t, 2) = f(ENSO, t)
When the Eureqa machine learning is applied to the BEST data, a not-so-surprising insight is revealed. The significant volcanic eruption of Pinatubo in 1991 is not captured in the wave-equation expansion fit. The interval from 1992-1994 contains an abrupt SST cooling that does not fit with the gradual change of the differential wave equation formulation nor with the periodic forcing that spans the entire profile from 1880 to the present. In Figure 2 below, the residual error plot shows that the Pinatubo cooling spike is strikingly apparent. The specific residual value is well over twice the average residual noise detected elsewhere.
In other words, what this is telling us is that the quasi-periodic time series was transiently impacted in a way that the wave equation model was not able to fit effectively, unless it could somehow piecewise map to this interval. In terms of machine learning, an arbitrarily placed delta function is a costly degree-of-freedom and one with a significant complexity penalty attached.
Eureqa includes a nice feature that allows these suspect regions to be excluded from the longer time series. So even though a volcanic eruption could have compensated for the ENSO response via a reduced solar insolation, we apply the fitted function only outside this interval. Only after the fit is generated, can we then place a short transient profile to reverse compensate for the non-ENSO related cooling. This reduced solar insolation is up to 10% of the full solar as measured in Figure 3 (linked from NOAA here).
A similar effect occurs due to the El Chichon volcano of 1982. In Figure 1, you can see that a red region is filled in on the recovery of the El Nino of 1982. Red indicates regions where the model is outside the bounds of the SOI range. By compensating the transient cooling with a narrow window centered at between 1993 and 1994, the model gets back in track with the likely actual ENSO signal. Remember that these massive volcanic eruptions only generate nuisance noise that interferes with the signal that we are actually interested in. The El Chichon noise spike is not seen as clearly in Figure 2, because the Eureqa machine learning resynched the fit at 1981. This not only served to compensate for El Chchon, but also realigned due to the Pacific climate shift of 1976-1977. The details of the compensation regions are shown in Figure 4 below.
This kind of behavior is almost impossible to detect unless the model has demonstrated some skill in mapping the actual physical behavior elsewhere. These volcanic effects are actually quite limited in scope but since the model is achieving a good correlation over the entire range, it allows us to probe more effectively for discrepancies and anomalies. So what we are seeing in the model is a more accurate picture of the ENSO dynamics, and this model helps to discriminate features in the observations that have little to do with ENSO. It is possible that some of the more recent discrepancies can be explained by smaller scale volcanic eruptions, see here.
 Smith, C.A. and P. Sardeshmukh – The Effect of ENSO on the Intraseasonal Variance of Surface Temperature in Winter., International J. of Climatology, 2000, 20 1543-1557.