Overfitting+Cross-Validation: ENSO→AMO

I presented on the topic of cross-validation at the 2018 AGU Fall Meeting. Building on those early results, I updated a fitted-model comparison between the Pacific Ocean's ENSO time-series and the Atlantic Ocean's AMO time-series. The premise is that the tidal forcing is essentially the same in the two oceans, but that the standing-wave configuration differs. So the approach is to maintain a common-mode forcing in the two basins while adjusting only the modulation given by Laplace's tidal equations (LTE).
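The common-mode idea can be sketched in a few lines. This is a minimal illustration, not the model behind the figures. It assumes the forcing is a plain sum of sinusoids at tidal periods, and that each basin's LTE modulation is a sum of terms sin(k·F(t) + φ) with hypothetical wavenumbers k chosen per basin. The forcing F is built once and shared. Only the wavenumbers, amplitudes, and phases differ between ENSO and AMO.

```python
import numpy as np

def tidal_forcing(t, periods, amps, phases):
    """Common-mode forcing F(t): a sum of sinusoids at tidal periods (same units as t)."""
    f = np.zeros_like(t, dtype=float)
    for p, a, ph in zip(periods, amps, phases):
        f += a * np.sin(2 * np.pi * t / p + ph)
    return f

def lte_modulation(forcing, wavenumbers, amps, phases):
    """Basin-specific modulation (assumed form): sum of a * sin(k * F + phase) terms."""
    out = np.zeros_like(forcing, dtype=float)
    for k, a, ph in zip(wavenumbers, amps, phases):
        out += a * np.sin(k * forcing + ph)
    return out

# Hypothetical placeholder values, not fitted parameters.
t = np.arange(0.0, 3650.0, 1.0)  # time in days
# Mf (13.66 d) and Mm (27.55 d) periods; amplitudes and phases are made up.
F = tidal_forcing(t, periods=[13.66, 27.55], amps=[1.0, 0.5], phases=[0.0, 1.0])

# ENSO: a low-wavenumber term plus a faster one; AMO: a single high-wavenumber term.
enso = lte_modulation(F, wavenumbers=[1.0, 8.0], amps=[1.0, 0.3], phases=[0.0, 0.5])
amo = lte_modulation(F, wavenumbers=[15.0], amps=[1.0], phases=[0.2])
```

Fitting the two basins jointly would then mean optimizing the shared forcing parameters once, with separate LTE parameters for each basin.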

For anyone who doesn't know these completely orthogonal time series, the idea that one can fit the data without overfitting, let alone fit two sets simultaneously, is unheard of. Michael Mann doesn't even think that the AMO is a real oscillation, judging by his latest research article, “Absence of internal multidecadal and interdecadal oscillations in climate model simulations”.

This is the latest product, a figure with panels A through H:

Read this backwards from H to A.

H = The two tidal forcing inputs for ENSO and AMO, which differ only by scale and a slight offset

G = Comparison of the constituent tidal forcing spectra of the two: primarily the expected main constituents, the Mf fortnightly tide and the Mm monthly tide (plus the Mt composite of Mf × Mm), amplified by an annual impulse train that creates a repeating Brillouin zone in frequency space.

E&F = The LTE modulation for AMO, consisting essentially of one strong high-wavenumber modulation, as shown in F

C&D = The LTE modulation for ENSO: a strong low-wavenumber modulation that follows the El Niño/La Niña cycles, plus a faster modulation

B = The AMO fitted model modulating H with E

A = The ENSO fitted model modulating the other H with C
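The repeating zones described for panel G can be illustrated numerically. This is a minimal sketch, not the fitting code behind the figure. It assumes daily sampling and a single Mf-like sinusoid (period 13.66 days) multiplied by a unit impulse at the start of each year. The function name `impulse_sampled_spectrum` is hypothetical. Because the product is nonzero only once per year, its discrete spectrum repeats exactly every 1 cycle/year.

```python
import numpy as np

def impulse_sampled_spectrum(period_days, years=64, samples_per_year=365):
    """Spectrum of a tidal sinusoid multiplied by an annual impulse train.

    Illustrative assumptions: daily sampling, a pure sinusoid for the tidal
    constituent, and one unit impulse at the start of each year.
    """
    n = years * samples_per_year
    t = np.arange(n) / samples_per_year            # time in years
    freq_cpy = 365.25 / period_days                # constituent frequency, cycles/year
    tide = np.sin(2 * np.pi * freq_cpy * t)
    impulses = np.zeros(n)
    impulses[::samples_per_year] = 1.0             # annual impulse train
    forced = tide * impulses
    spectrum = np.abs(np.fft.rfft(forced))
    freqs = np.fft.rfftfreq(n, d=1.0 / samples_per_year)  # cycles/year
    return freqs, spectrum

years = 64
freqs, spec = impulse_sampled_spectrum(13.66, years=years)  # Mf fortnightly period
# Bin spacing is 1/years cycles/year, so one zone of 1 cycle/year spans `years` bins.
# Each zone is a copy of the first one.
print(np.allclose(spec[:years], spec[years:2 * years]))  # True
```

The same folding happens for every constituent, which is why the Mf, Mm, and Mt lines reappear in each 1 cycle/year band rather than only at their native high frequencies.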

Ordinarily, determining this non-linear mapping would take eons' worth of machine-learning compute time, but with knowledge of how to solve the Navier-Stokes equations, it becomes a tractable problem.

Now, what does this have to do with cross-validation? Fitted only to the ENSO time-series, the model does indeed have many degrees of freedom (DOF), set by the number of tidal constituents shown in G. Yet by constraining the AMO fit to use essentially the same constituent tidal forcing as ENSO, the number of additional DOF introduced is minimal: note the strong spike in F.

Parsimony of a model fit is judged by information criteria that penalize the number of DOF, and that is exactly the metric used to characterize order in the previous post. It is therefore reasonable to argue that fitting a waveform as complex as B with only the additional information in F cross-validates the underlying common-mode model by any information-criterion metric.
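The DOF argument can be made concrete by counting parameters under a standard information criterion. This sketch uses the Gaussian-residual forms of AIC and BIC (up to an additive constant). The parameter counts for the tidal forcing and the two LTE modulations are hypothetical placeholders, not the actual number of constituents in G. At equal residual error, the shared-forcing model pays the tidal-parameter penalty once instead of twice.

```python
import math

def aic(rss, n, k):
    """Akaike information criterion for least squares with Gaussian residuals."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian information criterion for the same setting."""
    return n * math.log(rss / n) + k * math.log(n)

def parameter_counts(n_tidal, n_lte_enso, n_lte_amo):
    """DOF for two independent fits versus one fit with a shared (common-mode) forcing."""
    separate = (n_tidal + n_lte_enso) + (n_tidal + n_lte_amo)
    shared = n_tidal + n_lte_enso + n_lte_amo
    return separate, shared

# Hypothetical counts: 30 tidal parameters, 6 for ENSO's LTE, 3 for AMO's LTE.
separate, shared = parameter_counts(30, 6, 3)
print(separate, shared)  # 69 39

# Hypothetical: both models reach the same combined residual over the same samples.
n, rss = 2 * 1680, 250.0
delta = aic(rss, n, separate) - aic(rss, n, shared)
print(round(delta, 6))  # 60.0
```

Under BIC the gap is larger still, 30·ln(n), because each extra parameter costs ln(n) rather than 2.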

For further guidance, this is an informative article on model selection with regard to complexity: “A Primer for Model Selection: The Decisive Role of Model Complexity”

5 thoughts on “Overfitting+Cross-Validation: ENSO→AMO”

  1. An idea for further cross-validation is to use the same forcing and fit the 1880-1950 ENSO data while simultaneously fitting the 1950-2020 AMO data. Then one can cross-validate the 1950-2020 ENSO and the 1880-1950 AMO.

  2. Nice post. How about doing a cross validation on 1880 to 1980 (or 2000) and then seeing how the model matches the data from 1981 to 2019 (or 2001 to 2019)?

    • Dennis, done that many times. It works very well, but the issue is that the fitting process was biased by being exposed to that interval beforehand, so it retains memory of the good fit in the chosen starting parameters. Conversely, if one starts from scratch on that interval, it may overfit. But since you mention it, I may try it again from scratch.

      The simultaneous fitting of ENSO and AMO fixes the overfitting problem (see first comment) as both can be done from scratch.

      Paul

      • What the primary constituent tide looks like

        The green curve is the residual, which includes the nodal 18.6-year factors; otherwise the constituent looks aligned with the 8.85-year perigee cycle.

        I tried fitting AMO from scratch using data up to 1980 as the training interval, and this is as far as it got:

        It got stuck in a local optimum and so wouldn’t take any of the minor tidal factors, which are critical for improving the fit.

  3. more here
    https://forum.azimuthproject.org/discussion/comment/22785/#Comment_22785

    And a new Michael Mann paper on AMO in Science

    The climate scientist Michael Mann published a Science article this month in which his research team claims that the AMO isn't even an oscillation:
    Multidecadal climate oscillations during the past millennium driven by volcanic forcing

    “The Atlantic Multidecadal Oscillation (AMO), a 50- to 70-year quasiperiodic variation of climate centered in the North Atlantic region, was long thought to be an internal oscillation of the climate system. Mann et al. now show that this variation is forced externally by episodes of high-amplitude explosive volcanism.”
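The cross-over scheme proposed in the first comment (fit 1880-1950 ENSO together with 1950-2020 AMO, then validate on the complementary intervals) amounts to a pair of complementary time masks. This is a minimal sketch, assuming monthly data on a decimal-year axis. The function names are hypothetical, and the model fitting itself is left out. Dennis's single-split suggestion uses the same mask on one series, with the split at 1981.

```python
import numpy as np

def split_masks(years, split_year):
    """Boolean masks for samples before split_year and from split_year onward."""
    early = years < split_year
    return early, ~early

def crossover_plan(years, split_year):
    """Train on early ENSO + late AMO; validate on late ENSO + early AMO."""
    early, late = split_masks(years, split_year)
    train = {"enso": early, "amo": late}
    validate = {"enso": late, "amo": early}
    return train, validate

def validation_score(model, data, mask):
    """Pearson correlation between model and data on the held-out samples only."""
    return np.corrcoef(model[mask], data[mask])[0, 1]

years = 1880 + np.arange(140 * 12) / 12.0  # monthly samples, 1880 through 2019
train, validate = crossover_plan(years, 1950.0)
```

In an actual run, both training masks would share one set of tidal-forcing parameters, so neither held-out interval is ever seen during the fit.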
