Simpler models … alternate interval

Continued from the last post.

The last set of cross-validation results was based on holding out the interval 0.6-0.8 for testing, i.e., training on t<0.6 and t>0.8 of the data (which extends from t=0.0 to t=1.0 normalized). This post considers training on the intervals outside of 0.3-0.6: a narrower training interval and a correspondingly wider test interval.
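As a concrete sketch of the hold-out scheme described above (with made-up data, not the actual station series), the split amounts to masking a middle interval of normalized time and fitting only on what remains:

```python
# Sketch of interval-based hold-out cross-validation: the interval 0.3-0.6 of
# normalized time is reserved for testing, and the model is fit only on
# t < 0.3 and t > 0.6. Data here is a stand-in, not a real time-series.
def split_by_interval(t, y, lo=0.3, hi=0.6):
    """Partition (t, y) pairs into train (outside [lo, hi]) and test (inside)."""
    train = [(ti, yi) for ti, yi in zip(t, y) if ti < lo or ti > hi]
    test = [(ti, yi) for ti, yi in zip(t, y) if lo <= ti <= hi]
    return train, test

# Example with 11 evenly spaced samples spanning t = 0.0 to 1.0
n = 11
t = [i / (n - 1) for i in range(n)]
y = [2.0 * ti + 1.0 for ti in t]  # stand-in series
train, test = split_by_interval(t, y)
```

A fit would then be computed from `train` alone and scored on `test`; widening the held-out interval (as in this post) shrinks the training set and makes the validation more demanding.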

Stockholm, Sweden
Korsor, Denmark
Klaipeda, Lithuania

Simpler models … examples

Continued from the last post.

Each fitted model result shows the cross-validation outcome with the interval 0.6-0.8 held out, i.e., training only on t<0.6 and t>0.8 of the data (which extends from t=0.0 to t=1.0 normalized). The best results are for time-series with 100 years or more of monthly data, so the held-out data typically spans 20 years. There is no selection-bias trickery here, as this is a collection of independent sites and nothing in the MLR fitting process is specific to an individual time-series. In the following, the collection of results starts with the Stockholm site in Sweden; keep in mind that the dashed line in the charts indicates the test or validation interval.

I was recently in Stockholm, and this photo points toward the location of the measurement station, about 4,000 feet away, labeled by the marker on the right below:
Stockholm, Sweden
Korsor, Denmark
Klaipeda, Lithuania

Simpler models can outperform deep learning at climate prediction

This article in MIT News:

https://news.mit.edu/2025/simpler-models-can-outperform-deep-learning-climate-prediction-0826

“New research shows the natural variability in climate data can cause AI models to struggle at predicting local temperature and rainfall.” … “While deep learning has become increasingly popular for emulation, few studies have explored whether these models perform better than tried-and-true approaches. The MIT researchers performed such a study. They compared a traditional technique called linear pattern scaling (LPS) with a deep-learning model using a common benchmark dataset for evaluating climate emulators. Their results showed that LPS outperformed deep-learning models on predicting nearly all parameters they tested, including temperature and precipitation.”

Machine learning and other AI approaches such as symbolic regression will figure out that natural climate variability can be modeled using multiple linear regression (MLR) with cross-validation (CV), which is an outgrowth and extension of linear pattern scaling (LPS).
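A minimal sketch of what an MLR fit involves, in pure Python with toy data; this is an illustration of the technique named above, not the fitting code used for the site results:

```python
# Multiple linear regression via the normal equations, solved with Gaussian
# elimination. Fits y ~ b0 + b1*x1 + b2*x2 on toy data.
def mlr_fit(X, y):
    """Least-squares fit; X is a list of predictor rows (intercept added here)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations: A = X^T X, b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Forward elimination (no pivoting; fine for this well-conditioned toy case)
    for i in range(k):
        for j in range(i + 1, k):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef

# Toy target constructed as an exact linear combination, so the fit recovers it
X = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 3.0), (3.0, 1.0)]
y = [3.0 + 2.0 * x1 - 1.0 * x2 for x1, x2 in X]
coef = mlr_fit(X, y)  # approximately [3.0, 2.0, -1.0]
```

In practice the predictor columns would be candidate forcing or variability terms, and CV over held-out time intervals (as in the posts above) guards against overfitting the coefficients.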

https://pukpr.github.io/results/image_results.html

When this was initially created on 9/1/2025, there were 3000 CV results on time-series that averaged around 100 years (~1200 monthly readings per set), so over 3 million data points in total.

In this NINO34 (ENSO) model, the test CV interval is shown as a dashed region.

I developed this GitHub model repository to make it easy to compare many different data sets, which works much better than using an image-hosting service such as ImageShack.

There are about 130 sea-level height monitoring stations among the sites, which is relevant considering how strongly natural climate variability such as ENSO impacts monthly mean SLH measurements. See the paper “Observing ENSO-modulated tides from space”:

“In this paper, we successfully quantify the influences of ENSO on tides from multi-satellite altimeters through a revised harmonic analysis (RHA) model which directly builds ENSO forcing into the basic functions of CHA. To eliminate mathematical artifacts caused by over-fitting, Lasso regularization is applied in the RHA model to replace widely-used ordinary least squares.”
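The quoted idea, using Lasso in place of ordinary least squares when fitting harmonic basis functions so that spurious constituents are shrunk to zero, can be sketched with a toy coordinate-descent implementation. The frequencies, data, and penalty value below are invented for illustration and are not those of the paper's RHA model:

```python
import math

# Coordinate-descent Lasso for y ~ X @ w (no intercept), minimizing
# 0.5*||y - Xw||^2 + lam*||w||_1 via soft-thresholding updates.
def lasso_cd(X, y, lam, n_iter=50):
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # Residual excluding feature j's own contribution
            r = [y[i] - sum(X[i][m] * w[m] for m in range(k) if m != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # Soft-thresholding: small correlations are zeroed out entirely
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w

# Two harmonic "constituents"; only the first is actually present in y
t = [i / 100.0 for i in range(200)]
X = [[math.sin(2 * math.pi * ti), math.sin(2 * math.pi * 3.7 * ti)] for ti in t]
y = [1.5 * math.sin(2 * math.pi * ti) for ti in t]
w = lasso_cd(X, y, lam=5.0)  # second coefficient is driven exactly to zero
```

An OLS fit would instead assign the second constituent a small nonzero amplitude from finite-sample leakage, which is precisely the over-fitting artifact the quoted passage says the Lasso penalty eliminates.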