Simpler models can outperform deep learning at climate prediction

This article in MIT News:

https://news.mit.edu/2025/simpler-models-can-outperform-deep-learning-climate-prediction-0826

“New research shows the natural variability in climate data can cause AI models to struggle at predicting local temperature and rainfall.” … “While deep learning has become increasingly popular for emulation, few studies have explored whether these models perform better than tried-and-true approaches. The MIT researchers performed such a study. They compared a traditional technique called linear pattern scaling (LPS) with a deep-learning model using a common benchmark dataset for evaluating climate emulators. Their results showed that LPS outperformed deep-learning models on predicting nearly all parameters they tested, including temperature and precipitation.”

Machine learning and other AI approaches such as symbolic regression will eventually figure out that natural climate variability can be modeled using multiple linear regression (MLR) with cross-validation (CV), an approach that is an outgrowth or extension of linear pattern scaling (LPS).
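As a concrete illustration of what such an MLR + CV pipeline looks like, here is a minimal sketch. The synthetic data and interleaved folds are my own choices for illustration, not the configuration behind the linked results:

```python
import numpy as np

def mlr_cross_validate(X, y, n_folds=5):
    """Fit MLR on training folds, score correlation on held-out folds."""
    idx = np.arange(len(y))
    fold_cc = []
    for k in range(n_folds):
        test = idx[k::n_folds]           # every n_folds-th sample held out
        train = np.setdiff1d(idx, test)
        # least-squares fit: y ~ X @ beta (columns = basis factors + intercept)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = X[test] @ beta
        fold_cc.append(np.corrcoef(pred, y[test])[0, 1])
    return np.mean(fold_cc)

# toy example: two periodic "forcing" columns plus an intercept column
rng = np.random.default_rng(0)
t = np.arange(1200) / 12.0               # ~100 years of monthly samples
X = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t), np.ones_like(t)])
y = X @ np.array([1.0, 0.5, 0.1]) + 0.1 * rng.standard_normal(len(t))
print(round(mlr_cross_validate(X, y), 3))
```

Interleaved folds are used here for brevity; contiguous train/test intervals (as in the dashed CV region described below) are the stricter test for time series.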

https://pukpr.github.io/results/image_results.html

When this was initially created on 9/1/2025, there were 3000 CV results on time series averaging around 100 years each (~1200 monthly readings per set), so over 3 million data points in total.

In this NINO34 (ENSO) model, the test CV interval is shown as a dashed region

I developed this GitHub model repository to make it easy to compare many different data sets; it works much better than an image-hosting site such as ImageShack.

There are about 130 sea-level height (SLH) monitoring stations among the sites, which is relevant considering how much impact natural climate variation such as ENSO has on monthly mean SLH measurements. See the paper Observing ENSO-modulated tides from space:

“In this paper, we successfully quantify the influences of ENSO on tides from multi-satellite altimeters through a revised harmonic analysis (RHA) model which directly builds ENSO forcing into the basic functions of CHA. To eliminate mathematical artifacts caused by over-fitting, Lasso regularization is applied in the RHA model to replace widely-used ordinary least squares.”

Mathematical GeoEnergy 2018 vs ChatGPT 2025

On RealClimate.org

Paul Pukite (@whut) says

1 JUL 2025 AT 9:48 PM


“If so, do you have an explanation why the diurnal tides do not move the thermocline, whereas tides with longer periods do?”

The character of ENSO is that it shifts by varying amounts on an annual basis. Like any thermocline interface, it reaches the greatest metastability at a specific time of the year. I’m not making anything up here — the frequency spectrum of ENSO (pick any index: NINO4, NINO34, NINO3) shows a well-defined mirror symmetry about the value 0.5/yr. Given that incontrovertible observation, something is mixing with the annual impulse — and the only plausible candidate is a tidal force.
So the average force of the tides at this point is the important factor to consider. Given a very sharp annual impulse, the near-daily tides alias against the monthly tides — that’s all part of the mathematics of orbital cycles. So just pick the monthly tides, as they’re convenient to deal with and a more plausible match to a longer inertial push.
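The aliasing arithmetic can be checked directly: with one effective sample per year (the annual impulse), any tidal frequency folds into the band [0, 0.5] cycles/yr, which is exactly the band where the mirror symmetry appears. A minimal sketch, using the draconic month as an illustrative constituent:

```python
# With annual sampling, a frequency f (in cycles/yr) folds to its principal
# alias: f mod 1, reflected into the band [0, 0.5] cycles/yr.
def annual_alias(f_cpy):
    a = f_cpy % 1.0
    return min(a, 1.0 - a)

TROPICAL_YEAR = 365.2422    # days
DRACONIC_MONTH = 27.2122    # days

f = TROPICAL_YEAR / DRACONIC_MONTH     # ~13.42 cycles per year
print(round(f, 4), round(annual_alias(f), 4))
```

The draconic cycle at ~13.42 cycles/yr folds to ~0.42 cycles/yr, i.e. a multi-year alias sitting just below the 0.5/yr symmetry point.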

Sunspots are not a candidate here.

Some say wind is a candidate. It can’t be, because the wind actually lags the thermocline motion.

So the deal is, I can input the above as a prompt to ChatGPT and see what it responds with

https://chatgpt.com/share/68649088-5c48-8010-a767-4fe75ddfeffc

ChatGPT also produces a short Python script which generates the periodogram of expected spectral peaks.

I placed the results into a GitHub Gist here, with charts:
https://gist.github.com/pukpr/498dba4e518b35d78a8553e5f6ef8114

I made one change to the script: multiplying each tidal factor by its frequency to indicate its inertial potential (see the ## comment).
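For readers who don’t want to open the Gist, here is a hypothetical reconstruction of the idea. This is not the ChatGPT script itself; the constituents, pulse width, and record length below are illustrative choices:

```python
import numpy as np

# Long-period lunar constituents (periods in days); each factor's amplitude
# is scaled by its frequency -- the "inertial potential" weighting.
days = {"Mm": 27.5546, "Mf": 13.6608, "Msf": 14.7653}
t = np.arange(0, 200 * 365.25)                          # daily grid, 200 yr

tide = sum((1.0 / p) * np.cos(2 * np.pi * t / p) for p in days.values())
impulse = np.exp(-0.5 * ((t % 365.25) / 5.0) ** 2)      # sharp annual pulse
signal = tide * impulse                                 # annual x tidal mixing

spec = np.abs(np.fft.rfft(signal)) ** 2
freq = np.fft.rfftfreq(len(t), d=1.0) * 365.25          # cycles per year
top = freq[np.argsort(spec)[-5:]]                       # 5 strongest bins
print(np.sort(np.round(top, 3)))
```

The product of the annual pulse with each constituent generates sidebands at integer-cycle offsets, which is where the expected low-frequency spectral peaks come from.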

At the end of the Gist, I placed a representative power spectrum for the actual NINO4 and NINO34 data sets showing where the spectral peaks match. They all match. More positions match if you consider a biennial modulation as well.

Now, you might be saying: yes, but this is all ChatGPT, and I am likely coercing the output. Nothing of the sort. Like I said, I did the original work years ago and it was formally published as Mathematical Geoenergy (Wiley, 2018). This was long before LLMs such as ChatGPT came into prominence. ChatGPT is simply recreating the logical explanation that I had previously published. It is simply applying known signal processing techniques that are generic across all scientific and engineering domains and presenting what one would expect to observe.

In this case, it carries none of the baggage of climate science in terms of “you can’t do that, because that’s not the way things are done here”. ChatGPT doesn’t care about that prior baggage — it does the analysis the way that the research literature is pointing and how the calculation is statistically done across domains when confronted with the premise of an annual impulse combined with a tidal modulation. And it nailed it in 2025, just as I nailed it in 2018.


Thread on tidal modeling

Someone on Twitter suggested that tidal models are not well understood: “The tides connection to the moon should be revised.” Unrolled thread after the “Read more” break.

Continue reading

QBO: Pattern recognition and signal processing

TANSTAAFL: there ain’t no such thing as a free lunch … but there’s always crumbs for the taking.

Machine learning won’t necessarily make a complete discovery by uncovering some ground-breaking pattern in isolation, but more likely a fragment or clue or signature that could lead somewhere. I always remind myself that there are infinitely many more non-linear formulations than linear ones potentially lurking in nature, yet humans are poorly equipped to solve most non-linear relationships. ML has started to look at the tip of the non-linear iceberg, and humans have to be alert when it uncovers a crumb. Recall that pattern recognition and signal processing are well-established disciplines in their own right, yet consider the situation of searching for patterns in signals hiding in the data but unknown in structure. That’s often all we are looking for — some foothold to start from.

Continue reading

Teleconnection vs Common-Mode

A climate teleconnection is understood as one behavior impacting another — for example NINOx => AMO, meaning the Pacific ocean ENSO impacting the Atlantic ocean AMO via a remote (i.e. tele) connection. On the other hand, a common-mode behavior is a result of a shared underlying cause impacting a response in a uniquely parameterized fashion — for example NINOx = g(F(t), {n1, n2, n3, ...}) and AMO = g(F(t), {a1, a2, a3, ...}), where the n's are a set of constant parameters for NINOx and the a's are for AMO.

In this formulation F(t) is a forcing and g() is a transformation. Perhaps the best example of a common-mode response to a forcing is the regional tidal response in local sea-level height (SLH). Obviously, the lunisolar forcing is a common mode in different regions, and subtle variations in the parametric responses are required to model SLH uniquely. Once the parameters are known, one can make practical predictions (subject to recalibration as necessary).
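A minimal sketch of the common-mode formulation. The transformation g() and all parameter values here are invented purely for illustration:

```python
import numpy as np

# One shared forcing F(t); each index applies its own parameters inside the
# same transformation g(). Parameter values are placeholders.
def g(F, params):
    a1, a2, phase = params
    return a1 * np.sin(a2 * F + phase)    # simple nonlinear transformation

t = np.linspace(0, 100, 1200)             # ~100 years, monthly
F = np.sin(2 * np.pi * t / 18.6)          # shared (nodal-like) forcing
nino = g(F, (1.0, 2.0, 0.3))              # NINOx = g(F(t), {n1, n2, n3})
amo  = g(F, (0.6, 3.5, 1.1))              # AMO   = g(F(t), {a1, a2, a3})

# any correlation between the two is incidental: the shared cause is F(t),
# not a remote connection from one index to the other
print(round(np.corrcoef(nino, amo)[0, 1], 2))
```

The point of the sketch: two series generated this way can co-vary without any teleconnection, because both are transformations of the same forcing.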

Continue reading

Topology shapes climate dynamics

A paper from last week with high press visibility that makes claims to climate applicability is titled: Topology shapes dynamics of higher-order networks

The higher-order Topological Kuramoto dynamics, defined in Eq. (1), entails one linear transformation of the signal induced by a boundary operator, a non-linear transformation due to the application of the sine function, concatenated by another linear transformation induced by another boundary operator. These dynamical transformations are also at the basis of simplicial neural architectures, especially when weighted boundary matrices are adopted.

\dot{\theta}_i = \omega_i + \sum_{j} K_{ij} \sin(\theta_j - \theta_i) + F(t)

This may be a significant unifying model, as it could resolve the mystery of why neural nets can fit fluid-dynamic behaviors effectively without deeper understanding. In concise terms, a weighted sine function acts as a nonlinear mixing term in a NN and serves as the non-linear transformation in the Kuramoto model.
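A minimal Euler-integration sketch of the forced Kuramoto equation above. The network size, coupling strength, and forcing term are illustrative choices, not values from the paper:

```python
import numpy as np

# theta_i' = omega_i + (K/N) * sum_j sin(theta_j - theta_i) + F(t)
rng = np.random.default_rng(1)
N, dt, steps = 10, 0.01, 20000
omega = rng.normal(0.0, 0.5, N)            # natural frequencies
K = 2.0 / N                                # uniform pairwise coupling
theta = rng.uniform(0, 2 * np.pi, N)

for n in range(steps):
    F = 0.2 * np.sin(2 * np.pi * 0.1 * n * dt)   # weak periodic forcing F(t)
    # pairwise sine coupling: rows are i, columns are j
    coupling = K * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    theta = theta + dt * (omega + coupling + F)

# order parameter r in [0, 1]: r -> 1 means the oscillators phase-lock
r = np.abs(np.exp(1j * theta).mean())
print(round(r, 2))
```

The sine coupling is exactly the nonlinear mixing step referred to above; with coupling above the critical value the oscillators lock despite the spread in natural frequencies.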

Continue reading

Difference Model Fitting

By applying an annual impulse sample-and-hold on a common-mode basis set of tidal factors, a wide range of climate indices can be modeled and cross-validated. Whether it is a biennial impulse or an annual impulse, the slowly modulating envelope is roughly the same, so models of multidecadal indices such as AMO and PDO show similar skill — with cross-validation results evaluated here for a biennial impulse. Now we will evaluate for an annual impulse.
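The impulse sample-and-hold step can be sketched as follows. The envelope periods and impulse month are placeholders, not the fitted model:

```python
import numpy as np

t = np.arange(0, 140 * 12) / 12.0          # monthly grid (years)
# slowly modulating tidal envelope (periods in years, placeholders)
envelope = np.cos(2 * np.pi * t / 18.6) + 0.5 * np.cos(2 * np.pi * t / 8.85)

impulse_month = 3                          # the impulse fires once per year
held = np.empty_like(envelope)
current = envelope[0]
for i in range(len(t)):
    if i % 12 == impulse_month:            # annual impulse fires
        current = envelope[i]              # sample the envelope ...
    held[i] = current                      # ... and hold until next impulse
print(held[:4])
```

Changing `i % 12` to `i % 24` gives the biennial variant; the held series is the slowly varying input that downstream modulation acts on, which is why the two variants show similar skill on multidecadal indices.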

Continue reading

Google Gemini Deep Research on ENSO

First evaluation of Gemini Advanced 1.5 Pro with Deep Research. I logged in with a trial subscription and gave it the initial prompt below. I saved the results to a Google Docs file and then created the following PDF. Note the top-level focus on this blog and published citations within the scope of Chapter 12 of Mathematical Geoenergy, even though the chapter was not directly cited — only via an embedded cite in a submitted ESD Ideas article, reference 1. Interesting that the Lin & Qian paper was not cited.

Prompt: Explain tidal forcing behind ENSO using derivations based on reduced effective gravity on equatorial thermocline.

Tidal Gauge Differential

A climate science breakthrough likely won’t come from some massive computation but from a novel formulation that exposes some fundamental pattern (perhaps discovered by deep mining during a machine learning exercise). Over 10 years ago, I wrote a blog post on how one can extract the ENSO signal by doing simple signal processing on a sea-level height (SLH) tidal time series — in this case, at Fort Denison in Sydney harbor.

The formulation/trick is to take the difference between the SLH reading and the reading from 2 years (24 months) prior, described here.
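A minimal sketch of this 24-month difference on a synthetic monthly series standing in for the actual Fort Denison record:

```python
import numpy as np

# d[t] = slh[t] - slh[t - 24]: any stationary cycle whose period divides
# 24 months cancels exactly, leaving the slower ENSO-like residual.
def diff24(slh):
    return slh[24:] - slh[:-24]

t = np.arange(1200)                              # 100 years, monthly
annual = 0.8 * np.cos(2 * np.pi * t / 12.0)      # seasonal cycle
enso_like = np.sin(2 * np.pi * t / 45.0)         # slow ~3.75 yr wobble
d = diff24(annual + enso_like)

# the seasonal cycle cancels exactly in the 24-month difference
assert np.allclose(diff24(annual), 0.0)
print(len(d))   # → 1176
```

The exact cancellation of the 12-month term is why the differenced series isolates the interannual signal so cleanly.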

Check the recent blog post Lunar Torque Controls All for context on how this fits into the unified model.

The rationale for this 24 month difference is likely related to the sloshing of the ocean triggered on an annual basis. I think this is a pattern that any ML exercise would find with very little effort. After all, it didn’t take me that long to find it. But the point is that the ML configuration has to be open and flexible enough to be able to search, generate, and test for the same formulation. IOW, it may not find it if the configuration, perhaps focused on computationally massive PDEs, is too narrow. That was my comment to a RC post on applying machine learning to climate science, see the following link and subsequent quote:

Nick McGreivy commented on:

“ML-based parameterizations have to work well for thousands of years of simulations, and thus need to be very stable (no random glitches or periodic blow-ups) (harder than you might think). Bias corrections based on historical observations might not generalize correctly in the future.”

This same issue arises when using ML to simulate PDEs. The solution is to analytically calculate what the stability condition(s) is (are), then at each timestep to add some numerical diffusion that nudges the solution towards satisfying the stability condition(s). I imagine this same technique could be used for ML-based parametrizations.

QBO Metrics

In addition to the standard correlation coefficient (CC) and RMS error, non-standard metrics that have beneficial cross-validation properties include dynamic time warping (DTW), complexity-invariant distance (CID, see [1]), and a CID-modified DTW. The link above describes my implementation of the DTW metric, but I have yet to describe the CID metric. It’s essentially the CC multiplied by a factor that empirically adjusts the embedded summed distance between data points (i.e. the stretched length) of the time series so that the signature or look of two time series visually match in complexity.

   CID = CC * min(Length(Model, Data))/ max(Length(Model, Data))

The authors of the CID suggest that it’s a metric based on “an invariance that the community seems to have missed”.

And a CID-modified DTW is thus:

   CID-DTW = DTW * min(Length(Model, Data))/ max(Length(Model, Data))
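One plausible reading of the Length() factor is the stretched (arc) length of the series; here is a sketch under that assumption. Note that Batista et al.’s original CID uses a slightly different complexity estimate:

```python
import numpy as np

def stretched_length(x):
    # arc length of the series assuming unit spacing between points
    return np.sqrt(np.diff(x) ** 2 + 1.0).sum()

def cid_cc(model, data):
    """CC scaled by the min/max ratio of stretched lengths (the CID factor)."""
    cc = np.corrcoef(model, data)[0, 1]
    lm, ld = stretched_length(model), stretched_length(data)
    return cc * min(lm, ld) / max(lm, ld)

t = np.linspace(0, 10, 500)
data = np.sin(2 * np.pi * t)
smooth = np.sin(2 * np.pi * t)                      # same visual complexity
noisy = data + 0.3 * np.sin(2 * np.pi * 20 * t)     # extra high-freq wiggle

print(round(cid_cc(smooth, data), 3))   # 1.0: complexities match
print(round(cid_cc(noisy, data), 3))    # penalized relative to its plain CC
```

The same min/max length ratio applied to a DTW distance instead of the CC gives the CID-modified DTW variant above.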

I have tried this on the QBO model with good cross-validation results, using up-to-date data from https://www.atmohub.kit.edu/data/qbo.dat

These have similar tidal factor compositions and differ mainly in the LTE modulation and phase delay. As discussed earlier, any anomalies in the QBO behavior are likely the outcome of an erratic periodicity caused by incommensurate annual and draconic cycles and exaggerated by LTE.

from https://gist.github.com/pukpr/e562138af3a9da937a3fb6955685c98f

REFERENCES

[1] Batista, Gustavo E. A. P. A., et al. “CID: an efficient complexity-invariant distance for time series.” Data Mining and Knowledge Discovery 28 (2014): 634–669.
https://link.springer.com/article/10.1007/s10618-013-0312-3