The model for ENSO includes a nonlinear search feature that finds the best-fit tidal forcing parameters. This is similar to what a conventional ocean tidal analysis program performs — finding the best-fitting lunar tidal parameters based on a measured historic interval of hundreds of cycles. Since tidal cycles are abundant — occurring at least once per day — it doesn’t take much data collected over a course of time to do an analysis. In contrast, the ENSO model cycles over the course of years, so we have to use as much data as we can, yet still allow test intervals.
What follows is the recipe (more involved than the short recipe) that will guarantee a deterministic best-fit from a clean slate each time. Very little initial condition information is needed to start with, so that the final result can be confidently recovered each time, independent of training interval.
For the ENSO model, a few parameters are required for seeding. The seasonal impulse is always fixed to be around mid-November. And the biennial Mathieu modulation is set to 0.25 with a peak at the same point in mid-November. The anomalistic lunar variation is set according to the previous post. For the differential equation, the lag derivative is set to 0.8 (i.e. 0.8 of the previous month’s value becomes the current value) and the positive feedback from the previous year is set to -0.15 (i.e. last year’s value at the current time is fed with a reverse sign to promote a see-saw effect).
The most convenient training interval is to split the ENSO time-series exactly in half and use the other half as the test interval.
From this point, the non-linear solver is allowed to vary the phases and amplitudes of the Draconic and Anomalistic lunar monthly cycles to maximize the correlation coefficient: correl(Data[ ],Model[ ]). The nonlinear harmonics D2, D3, D4, and A3 are included but allowed to vary in only their amplitude since the phase is locked in to the fundamentals for D and A. None of the non-linear mixed terms are included yet. A simultaneous constraint is set so that the lunar factors also minimize the error to the known LOD variation. The initial seed for D and D2 is shown below to roughly match the LOD variations (see below the calibration interval in the bottom panel).
After running with this set, the solver comes close to finding a good fit, but the other harmonics need to be included. Before doing that, it’s best to do another solver run with a DC offset in the forcing and in the model output to adjust for any positive/negative imbalance in the waveform (this is done only at this point to prevent if from getting too large in the initial solution, as it is quite a small correction).
After including the next set of mixed harmonics (up to 4th order) to the solver set, run the solver again. This can be a slow process as more factors are included, depending on the speed of the computer running the solver.
This is an interim solution when using the higher (1950-2016) interval for training:
The correlation coefficient reaches 0.66 in the training interval and over 0.5 in the test, so one can see that the model behavior is emerging.
Finally, we can add the 5th order factors, including all these [ A, D, AD, A2, D2, A2D, AD2, A3, D3, A2D2, A3D, AD3, A4, D4, A2D3, A3D2, A4D1, A1D4, A5, D5 ]. The Mathieu modulation is allowed to vary as is the seasonal modulation position. The amplitude of the anomalistic period variation is also allowed to vary. This part can take quite a long time to converge to a solution, as the added factors need to be iteratively tweeked during the non-linear search process.
One trick in the final stages is to restart the solver after it converges. But before restarting, relax the amplitude of the model output so that it can do an extra gradient search to further refine the fit. Also, the higher-order mixed terms can be re-zeroed and then refit in case too much over-fitting occurs. This can be done a few times, and should actually be performed on the solution in the chart above. What can happen is that the solver finds a local valley that it is unable to migrate out of. As more parameters are added, more possibilities exist to get stuck and the solver exhausts combinations as it tries to get unstuck. In some ways, the fitting process is an art-form in itself, needing occasional input to get it unstuck and back on track. The rule to remember is that as in conventional tidal analysis, the lowest order terms will contribute most to the fit and the higher order terms fill in the details.
Next, try the complementary direction — i.e. zero the numbers and use the lower training interval (1880-1950). As a practical matter, using smaller training intervals, such as the interval from 1900 to 1920, works well in guiding the solution in the right direction. However, that will result in over-fitting, so the interval is widened after the initial run. Eventually, the parameter solution will be almost identical to the higher interval as shown earlier. And this solution is not tainted in any way, as reversing the training and test intervals contributes no prior bias.
For the differential equation solver, I used a simple Excel spreadsheet. In the past, I have used Mathematica to do the DiffEq integration integrated with a best-fit solver (note that Mathematica doesn’t allow a correlation coefficient as the objective target), but don’t have a license at the moment and using a spreadsheet demonstrates how low the mathematical hurdle is.