Hidden latent manifolds in fluid dynamics

The behavior of complex systems, particularly in fluid dynamics, is traditionally described by high-dimensional systems of equations such as the Navier-Stokes equations. While these models are useful in practice as they stand, they can obscure the simpler underlying mechanisms at play. Notably, ocean modeling already has dimensionality reduction built in, for example through Laplace’s Tidal Equations (LTE), a reduced-order formulation of the Navier-Stokes equations. Furthermore, the topological containment of phenomena such as ENSO and QBO within the equatorial toroid, and the ability to reduce LTE further in this confined topology (as described in our text Mathematical Geoenergy), underscore the inherently low-dimensional nature of the dominant geophysical processes. The concept of hidden latent manifolds posits that the observed dynamics of a system do not occupy the entire high-dimensional phase space, but instead evolve on a much lower-dimensional geometric structure, a manifold layer, where the system’s effective degrees of freedom reside. This may also help explain the seeming paradox of the inverse energy cascade, whereby order in fluid structures appears to be maintained as the waves become progressively larger, with nonlinear interactions accumulating energy transferred from smaller scales.

Discovering these latent structures from noisy, observational data is the central challenge in state-of-the-art fluid dynamics. Enter the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm, pioneered by Brunton et al. SINDy is an equation-discovery framework designed to identify a sparse set of nonlinear terms that describe the evolution of the system on this low-dimensional manifold. Instead of testing all possible combinations of basis functions, SINDy uses a penalized regression technique (such as LASSO) to enforce sparsity, winnowing down the candidates to find the most parsimonious, yet physically meaningful, governing differential equations. The result is a simple, interpretable model that captures the essential physics: the fingerprint of the latent manifold. SINDy is not a difficult algorithm to apply, since a decent Python library is available, and I have evaluated it as described here.
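To make the sparsity idea concrete, here is a minimal sketch of sparse regression over a library of candidate terms. It uses sequentially thresholded least squares (the simple alternative to LASSO used in the original SINDy paper), written in plain NumPy rather than with the Python library mentioned above. The damped-oscillator test system, the polynomial library, and the threshold value are all assumptions chosen for illustration.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], dxdt[:, j], rcond=None)[0]
    return xi

# Assumed test system: x' = -0.1 x + 2 y,  y' = -2 x - 0.1 y
t = np.linspace(0.0, 20.0, 2000)
x = np.exp(-0.1 * t) * np.cos(2.0 * t)
y = -np.exp(-0.1 * t) * np.sin(2.0 * t)
dxdt = np.column_stack([-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y])

# Candidate library of polynomial terms up to second order
names = ["1", "x", "y", "x^2", "x*y", "y^2"]
theta = np.column_stack([np.ones_like(t), x, y, x**2, x * y, y**2])

xi = stlsq(theta, dxdt)
for j, lhs in enumerate(["x'", "y'"]):
    terms = [f"{xi[i, j]:+.3f} {names[i]}" for i in range(len(names)) if xi[i, j] != 0.0]
    print(lhs, "=", " ".join(terms))
```

Only the linear terms should survive the thresholding here, because the derivatives were generated exactly from the assumed linear system. With noisy data, the choice of threshold and the method used to estimate derivatives matter much more.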

Applying this methodology to Earth system dynamics, particularly the seemingly noisy, erratic, and perhaps chaotic time series of sea-level variation and climate index variability, reveals profound simplicity beneath the complexity. The high-dimensional output of climate models or raw observations can be projected onto a model framework driven by remarkably few physical processes. Specifically, as shown in analysis targeting the structure of these time series, the dynamics can be cross-validated by the interaction of two fundamental drivers: a forced gravitational tide and an annual impulse.

The presence of the forced gravitational tide accounts for the regular, high-frequency, and predictable components of the dynamics. The annual impulse, meanwhile, serves as the seasonal forcing function, representing the integrated effect of large-scale thermal and atmospheric cycles that reset annually. The success of this sparse, two-component model—where the interaction of these two elements is sufficient to capture the observed dynamics—serves as the ultimate validation of the latent manifold concept. The gravitational tides with the integrated annual impulse are the discovered, low-dimensional degrees of freedom, and the ability of their coupled solution to successfully cross-validate to the observed, high-fidelity dynamics confirms that the complex, high-dimensional reality of sea-level and climate variability emerges from this simple, sparse, and interpretable set of latent governing principles. This provides a powerful, physics-constrained approach to prediction and understanding, moving beyond descriptive models toward true dynamical discovery.

An entire set of cross-validated models is available for evaluation here: https://pukpr.github.io/examples/mlr/.

This is a mix of climate indices (the first 20) and numbered coastal sea-level stations obtained from https://psmsl.org/.

https://pukpr.github.io/examples/map_index.html

  • nino34 — NINO34 (PACIFIC)
  • nino4 — NINO4 (PACIFIC)
  • amo — AMO (ATLANTIC)
  • ao — AO (ARCTIC)
  • denison — Ft Denison (PACIFIC)
  • iod — IOD (INDIAN)
  • iodw — IOD West (INDIAN)
  • iode — IOD East (INDIAN)
  • nao — NAO (ATLANTIC)
  • tna — TNA Tropical N. Atlantic (ATLANTIC)
  • tsa — TSA Tropical S. Atlantic (ATLANTIC)
  • qbo30 — QBO 30 Equatorial (WORLD)
  • darwin — Darwin SOI (PACIFIC)
  • emi — EMI ENSO Modoki Index (PACIFIC)
  • ic3tsfc — ic3tsfc (Reconstruction) (PACIFIC)
  • m6 — M6, Atlantic Nino (ATLANTIC)
  • m4 — M4, N. Pacific Gyre Oscillation (PACIFIC)
  • pdo — PDO (PACIFIC)
  • nino3 — NINO3 (PACIFIC)
  • nino12 — NINO12 (PACIFIC)
  • 1 — BREST (FRANCE)
  • 10 — SAN FRANCISCO (UNITED STATES)
  • 11 — WARNEMUNDE 2 (GERMANY)
  • 14 — HELSINKI (FINLAND)
  • 41 — POTI (GEORGIA)
  • 65 — SYDNEY, FORT DENISON (AUSTRALIA)
  • 76 — AARHUS (DENMARK)
  • 78 — STOCKHOLM (SWEDEN)
  • 111 — FREMANTLE (AUSTRALIA)
  • 127 — SEATTLE (UNITED STATES)
  • 155 — HONOLULU (UNITED STATES)
  • 161 — GALVESTON II, PIER 21, TX (UNITED STATES)
  • 163 — BALBOA (PANAMA)
  • 183 — PORTLAND (MAINE) (UNITED STATES)
  • 196 — SYDNEY, FORT DENISON 2 (AUSTRALIA)
  • 202 — NEWLYN (UNITED KINGDOM)
  • 225 — KETCHIKAN (UNITED STATES)
  • 229 — KEMI (FINLAND)
  • 234 — CHARLESTON I (UNITED STATES)
  • 245 — LOS ANGELES (UNITED STATES)
  • 246 — PENSACOLA (UNITED STATES)

Crucially, this analysis does not use the SINDy algorithm but a much more basic predecessor, multiple linear regression (MLR), which I anticipate adapting to SINDy as the model is further refined. Part of the rationale is to maintain a deep understanding of the mathematics, and to provide cross-checking that avoids the perils of over-fitting, which is the bane of neural network models.
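A minimal sketch of the MLR-with-cross-validation idea, on synthetic data: fit sinusoidal regressors by ordinary least squares on a training interval, then score the fit on a held-out interval with a correlation coefficient. The periods, the synthetic series, and the 0.6-0.8 held-out interval are assumptions for illustration; they are not the basis set or the data used for the models listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly series over 100 years (time in years)
t = np.arange(1200) / 12.0
periods = [1.0, 3.0, 6.0, 18.6]      # assumed regressor periods, in years
true_amp = [0.3, 1.0, 0.6, 0.4]      # assumed amplitudes of the synthetic signal
y = sum(a * np.sin(2 * np.pi * t / p + 0.5) for a, p in zip(true_amp, periods))
y = y + 0.2 * rng.standard_normal(t.size)

# Design matrix: a constant plus a sine/cosine pair per period
cols = [np.ones_like(t)]
for p in periods:
    cols += [np.sin(2 * np.pi * t / p), np.cos(2 * np.pi * t / p)]
X = np.column_stack(cols)

# Hold out the normalized interval 0.6-0.8 and train on everything else
tn = (t - t[0]) / (t[-1] - t[0])
test = (tn >= 0.6) & (tn <= 0.8)
train = ~test

coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
pred = X @ coef

cc_train = np.corrcoef(pred[train], y[train])[0, 1]
cc_test = np.corrcoef(pred[test], y[test])[0, 1]
print(f"training CC = {cc_train:.3f}, held-out CC = {cc_test:.3f}")
```

The held-out samples never enter the least-squares solve, so a held-out correlation close to the training correlation is the cross-check against over-fitting.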

Also read this introductory-level page on tidal modeling, which may form the foundation for the latent manifold: https://pukpr.github.io/examples/warne_intro.html. The coastal station at Warnemünde in Germany, on the Baltic Sea, provided a long unbroken interval of sea-level readings, which was used to calibrate the hidden latent manifold that in turn served as a starting point for all the other models. Not every model works as well as the majority; see Pensacola for a sea-level site and IOD or TNA for climate indices. These are equally valuable for understanding limitations (and for providing a sanity check against an accidental degeneracy in the model fitting process). The use of SINDy in the future will provide additional functionality, such as regularization that will find an optimal common-mode latent layer.

Simpler models … alternate interval

continued from last post.

The last set of cross-validation results was based on holding out the interval 0.6-0.8, i.e. training on t<0.6 and t>0.8 of the data, which extends from t=0.0 to t=1.0 normalized. This post considers training on the intervals outside of 0.3-0.6: a narrower training interval and a correspondingly wider test interval.
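The two splits can be sketched as boolean masks over normalized time. The helper name and the 1200-sample series length (roughly 100 years of monthly readings) are assumptions for illustration.

```python
import numpy as np

def holdout_masks(n_samples, lo, hi):
    """Return (train, test) boolean masks for a held-out normalized interval [lo, hi]."""
    t = np.linspace(0.0, 1.0, n_samples)
    test = (t >= lo) & (t <= hi)
    return ~test, test

n = 1200  # assumed: about 100 years of monthly readings
for lo, hi in [(0.6, 0.8), (0.3, 0.6)]:
    train, test = holdout_masks(n, lo, hi)
    print(f"held out {lo}-{hi}: train {train.sum()} months, test {test.sum()} months")
```

Moving the held-out window from 0.6-0.8 to 0.3-0.6 shrinks the training set and places the test interval in the interior of the record, so the fit has to bridge a longer gap.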

Stockholm, Sweden
Korsor, Denmark
Klaipeda, Lithuania

Simpler models … examples

continued from last post.

Each fitted model result shows the cross-validation results with a held-out interval, i.e. training only on the intervals outside of 0.6-0.8 (t<0.6 and t>0.8 of the data, which extends from t=0.0 to t=1.0 normalized). The best results are for time-series that have 100 years or more of monthly data, so the held-out interval is typically 20 years. There is no selection-bias trickery here, as this is a collection of independent sites and nothing in the MLR fitting process is specific to an individual time-series. In the following, the collection of results starts with the Stockholm site in Sweden, keeping in mind that the dashed line in the charts indicates the test or validation interval.

I was recently in Stockholm, and this is a photo pointed toward the location of the measurement station, about 4000 feet away and labeled by the marker on the right below:
Stockholm, Sweden
Korsor, Denmark
Klaipeda, Lithuania

Simpler models can outperform deep learning at climate prediction

This article in MIT News:

https://news.mit.edu/2025/simpler-models-can-outperform-deep-learning-climate-prediction-0826

“New research shows the natural variability in climate data can cause AI models to struggle at predicting local temperature and rainfall.” … “While deep learning has become increasingly popular for emulation, few studies have explored whether these models perform better than tried-and-true approaches. The MIT researchers performed such a study. They compared a traditional technique called linear pattern scaling (LPS) with a deep-learning model using a common benchmark dataset for evaluating climate emulators. Their results showed that LPS outperformed deep-learning models on predicting nearly all parameters they tested, including temperature and precipitation.”

Machine learning and other AI approaches such as symbolic regression will eventually figure out that natural climate variability can be modeled using multiple linear regression (MLR) with cross-validation (CV), which is an outgrowth or extension of linear pattern scaling (LPS).

https://pukpr.github.io/results/image_results.html

When this was initially created on 9/1/2025, there were 3000 CV results on time-series that averaged around 100 years (~1200 monthly readings per set), so over 3 million data points.

In this NINO34 (ENSO) model, the test CV interval is shown as a dashed region

I developed this GitHub model repository to make it easy to compare many different data sets, which works much better than using an image repository such as ImageShack.

About 130 sea-level height (SLH) monitoring stations are included among the sites, which is relevant considering how much natural climate variation such as ENSO affects monthly mean SLH measurements. See the paper “Observing ENSO-modulated tides from space”:

“In this paper, we successfully quantify the influences of ENSO on tides from multi-satellite altimeters through a revised harmonic analysis (RHA) model which directly builds ENSO forcing into the basic functions of CHA. To eliminate mathematical artifacts caused by over-fitting, Lasso regularization is applied in the RHA model to replace widely-used ordinary least squares.”

Thread on tidal modeling

Someone on Twitter suggested that tidal models are not understood: “The tides connection to the moon should be revised.” Unrolled thread after the “Read more” break.


Teleconnection vs Common-Mode

A climate teleconnection is understood as one behavior impacting another, for example NINOx => AMO, meaning the Pacific Ocean ENSO impacting the Atlantic Ocean AMO via a remote (i.e. tele) connection. On the other hand, a common-mode behavior is the result of a shared underlying cause impacting a response in a uniquely parameterized fashion, for example NINOx = g(F(t), {n1, n2, n3, ...}) and AMO = g(F(t), {a1, a2, a3, ...}), where the n's are a set of constant parameters for NINOx and the a's are for AMO.

In this formulation F(t) is a forcing and g() is a transformation. Perhaps the best example of a common-mode response to a forcing is the regional tidal response in local sea-level height (SLH). The lunisolar forcing is a common mode across different regions, and subtle variations in the parametric responses are required to model SLH uniquely at each one. Once the parameters are known, one can make practical predictions (subject to recalibration as necessary).
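The common-mode formulation can be sketched in a few lines: one shared forcing F(t), one transformation g(), and a different constant parameter set per index. The sinusoidal form of g(), the forcing periods, and the parameter values below are hypothetical, chosen only to show the structure.

```python
import numpy as np

def g(forcing, amp, wavenumber, phase):
    """Assumed transformation: a sinusoidal modulation of the shared forcing."""
    return amp * np.sin(wavenumber * forcing + phase)

# Shared forcing F(t), sampled monthly over 100 years (time in years)
t = np.arange(1200) / 12.0
F = np.sin(2 * np.pi * t / 6.0) + 0.5 * np.sin(2 * np.pi * t / 18.6)

# Hypothetical parameter sets: the n's for one index, the a's for another
index_n = g(F, amp=1.0, wavenumber=4.0, phase=0.3)
index_a = g(F, amp=0.7, wavenumber=1.5, phase=-1.0)

cc = np.corrcoef(index_n, index_a)[0, 1]
print(f"correlation between the two responses: {cc:.2f}")
```

Both series are deterministic functions of the same F(t), yet they need not resemble each other sample by sample. This is the distinction from a teleconnection: the link runs through the shared forcing, not from one index to the other.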


Difference Model Fitting

By applying an annual impulse sample-and-hold on a common-mode basis set of tidal factors, a wide range of climate indices can be modeled and cross-validated. Whether it is a biennial impulse or annual impulse, the slowly modulating envelope is roughly the same, thus models of multidecadal indices such as AMO and PDO show similar skill — with cross validation results evaluated here for a biennial impulse. Now we will evaluate for annual impulse.
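One way to read the annual impulse sample-and-hold is sketched below: the forcing is sampled once per year at a fixed month and that value is held until the next impulse, with an optional sign flip every other year for the biennial case. This is an assumed interpretation for illustration; the function name, the monthly sampling, and the impulse month are not taken from the model code.

```python
import numpy as np

def sample_and_hold(forcing, samples_per_year=12, impulse_index=0, biennial=False):
    """Hold the value sampled at each yearly impulse until the next impulse."""
    forcing = np.asarray(forcing, dtype=float)
    i = np.arange(forcing.size)
    year = (i - impulse_index) // samples_per_year       # impulse count at each sample
    last = year * samples_per_year + impulse_index       # index of the latest impulse
    last = np.clip(last, 0, forcing.size - 1)            # before the first impulse, hold sample 0
    held = forcing[last]
    if biennial:
        held = held * np.where(year % 2 == 0, 1.0, -1.0)  # alternate sign every other year
    return held

# Stand-in tidal forcing with assumed 6-year and 18.6-year components
t = np.arange(240) / 12.0
tidal = np.sin(2 * np.pi * t / 6.0) + 0.5 * np.sin(2 * np.pi * t / 18.6)

annual = sample_and_hold(tidal)
biennial = sample_and_hold(tidal, biennial=True)
print(annual[:14])
print(biennial[:14])
```

Because both variants hold the same yearly samples and differ only in sign pattern, the magnitude of the slowly varying envelope is identical in the two cases.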


Lunar Torque Controls All

Mathematical Geoenergy

The truly massive scale in the motion of fluids and solids on Earth arises from orbital interactions with our spinning planet. The most obvious of these, such as the daily and seasonal cycles, are taken for granted. Others, such as ocean tides, have more complicated mechanisms than the ordinary person realizes (e.g. ask someone to explain why there are 2 tidal cycles per day). There are also less well-known motions, such as the variation in the Earth’s rotation rate of nominally 360° per day, which is called the delta in Length of Day (LOD), and in the slight annual wobble in the Earth’s rotation axis. Nevertheless, each one of these is technically well-characterized and models of the motion include a quantitative mapping to the orbital cycles of the Sun, Moon, and Earth. This is represented in the directed graph below, where the BLUE ovals indicate behaviors that are fundamentally understood and modeled via tables of orbital factors.

The cyan background represents behaviors that have a longitudinal dependence (rendered by GraphViz)

However, those ovals highlighted in GRAY are nowhere near being well-understood in spite of being at least empirically well-characterized via years of measurements. Further, what is (IMO) astonishing is the lack of research interest in modeling these massive behaviors as a result of the same orbital mechanisms as that which causes tides, seasons, and the variations in LOD. In fact, everything tagged in the chart is essentially a behavior relating to an inertial response to something. That something, as reported in the Earth sciences literature, is only vaguely described — and never as a tidal or tidal/annual interaction.

I don’t see how it’s possible to overlook such an obvious causal connection. Why would the forcing that causes a massive behavior such as tides suddenly stop having a connection to other related inertial behaviors? The answers I find in the research literature are essentially that “someone looked in the past and found no correlation” [1].


Order overrides chaos

Dimensionality reduction of chaos by feedbacks and periodic forcing is a source of natural climate change, by P. Salmon, Climate Dynamics (2024)

Bottom line is that a forcing will tend to reduce chaos by creating a pattern to follow, thus the terminology of “forced response”. This has implications for climate prediction. The first few sentences of the abstract set the stage:

The role of chaos in the climate system has been dismissed as high dimensional turbulence and noise, with minimal impact on long-term climate change. However theory and experiment show that chaotic systems can be reduced or “controlled” from high to low dimensionality by periodic forcings and internal feedbacks. High dimensional chaos is somewhat featureless. Conversely low dimensional borderline chaos generates pattern such as oscillation, and is more widespread in climate than is generally recognised. Thus, oceanic oscillations such as the Pacific Decadal and Atlantic Multidecadal Oscillations are generated by dimensionality reduction under the effect of known feedbacks. Annual periodic forcing entrains the El Niño Southern Oscillation.

In Chapters 11 and 12 in Pukite, P., Coyne, D., & Challou, D. (2019). Mathematical Geoenergy. John Wiley & Sons, I cited forcing as a chaos reducer:

It is well known that a periodic forcing can reduce the erratic fluctuations and uncertainty of a near‐chaotic response function (Osipov et al., 2007; Wang, Yang, Zhou, 2013).

But that’s just a motivator. Tides are the key, acting primarily on the subsurface thermocline. Salmon’s figure comparing the AMO to Barents sea subsurface temperature is substantiating in terms of linking two separated regions by something more than a nebulous “teleconnection”.

Likely every ocean index has a common-mode mechanism. The tidal forcing by itself is close to providing an external synchronizing source, but requires what I refer to as an LTE modulation to zero in on the exact forced response. Read the previous blog post to get a feel for how this works:

As Salmon notes, it’s known at some level that an annual/seasonal impulse is entraining or synchronizing ENSO, and also likely PDO and AMO. The top guns at NASA JPL point out that the main lunisolar terms are at monthly, 206 day, annual, 3 year, and 6 year periods, and this is what is used to model the forcing (see the following two charts).

Now note how the middle panel in each of the following modeled climate indices does not change markedly. The most challenging aspect is the inherent structural sensitivity of the manifold¹ mapping involved in LTE modulation. As the Darwin fit shows, the cross-validation is better than it may appear, as the out-of-band interval does not take much of a nudge to become synchronized with the data. Note also that the multidecadal nature of an index such as AMO may be ephemeral: the yellow cross-validation band does show valleys in what appears to be a longer multidecadal trend, capturing the long-period variations in the tides when modulated by an annual impulse (biennial in this case).

Model config repo: https://gist.github.com/pukpr/3a3566b601a54da2724df9c29159ce16?permalink_comment_id=5108154#gistcomment-5108154


¹ The term manifold has an interesting etymology. Phonetically, it is close to “many fold”, which is precisely what is happening here: the LTE modulation can fold over the forcing input many times, in proportion to the mode of the standing wave produced. A higher standing-wave mode will thus have “many folds”, in contrast to the lowest standing-wave mode. At the limit, the QBO with an ostensibly wavenumber=0 mode will have no folds and will be to first order a pass-through linear amplification of the forcing, but with likely higher modes mixed in to give the time-series some character.

Common forcing for ocean indices

In Mathematical Geoenergy, Chapter 12, a biennially-impulsed lunar forcing is suggested as a mechanism to drive ENSO. The current thinking is that this lunar forcing should be common across all the oceanic indices, including AMO for the Atlantic, IOD for the Indian, and PDO for the non-equatorial north Pacific. The global temperature extreme of the last year had too many simultaneous concurrences among the indices for this not to be taken seriously.

NINO34

PDO

AMO

IOD – East

IOD-West

Each one of these uses a nearly identical annual-impulsed tidal forcing (shown as the middle green panel in each), with a 5-year window providing a cross-validation interval. Many cross-validation possibilities are available, since the tidal factors are held essentially fixed over all the climate indices.

The approach follows 3 steps as shown below

The first step is to generate the long-period tidal forcing. I go into an explanation of the tidal factors selected in a Real Climate comment here.

Then apply the lagged response of an annual impulse, in this case alternating in sign every other year, which generates the middle panel in the flow chart schematic (and the middle panel in the indexed models above).

Finally, the Laplace’s Tidal Equation (LTE) modulation is applied, with the lower right corner inset showing the variation among indices. This is where the variability occurs — the best approach is to pick a slow fundamental modulation and generate only integer harmonics of this fundamental. So, what happens is that different harmonics are emphasized depending on the oceanic index chosen, corresponding to the waveguide structure of the ocean basin and what standing waves are maximally resonant or amplified.
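The third step can be sketched as a linear fit over integer harmonics of a single fundamental modulation. The sketch assumes the modulation takes the form of sines and cosines of k times the fundamental times the forcing, and uses a synthetic stand-in for the impulsed forcing; the fundamental value, the number of harmonics, and the two hypothetical indices are illustrative, not fitted values from the models.

```python
import numpy as np

def lte_basis(forcing, fundamental, n_harmonics):
    """Constant plus sin/cos of integer harmonics of one fundamental modulation."""
    cols = [np.ones_like(forcing)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(k * fundamental * forcing), np.cos(k * fundamental * forcing)]
    return np.column_stack(cols)

# Stand-in for the impulsed tidal forcing (assumed periods, time in years)
t = np.arange(1200) / 12.0
forcing = 2.0 * np.sin(2 * np.pi * t / 6.0) + 1.5 * np.sin(2 * np.pi * t / 18.6)

fundamental = 1.0                      # assumed slow fundamental modulation
B = lte_basis(forcing, fundamental, n_harmonics=5)

# Two hypothetical indices that emphasize different harmonics of the same forcing
# Column layout: 0 constant, then (sin, cos) pairs for k = 1..5
true_a = np.zeros(B.shape[1]); true_a[1] = 1.0; true_a[6] = 0.5   # sin(1f), cos(3f)
true_b = np.zeros(B.shape[1]); true_b[3] = 0.8; true_b[9] = 0.6   # sin(2f), sin(5f)
index_a, index_b = B @ true_a, B @ true_b

# Harmonic amplitudes recovered by ordinary least squares
coef_a = np.linalg.lstsq(B, index_a, rcond=None)[0]
coef_b = np.linalg.lstsq(B, index_b, rcond=None)[0]
amp_a = np.hypot(coef_a[1::2], coef_a[2::2])
amp_b = np.hypot(coef_b[1::2], coef_b[2::2])
print("index A amplitude per harmonic:", np.round(amp_a, 3))
print("index B amplitude per harmonic:", np.round(amp_b, 3))
```

Restricting the basis to integer harmonics keeps the fit linear in the amplitudes once the fundamental is chosen, so each index differs only in which harmonics carry weight.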

Note that for a dipole behavior such as ENSO, the LTE modulation will be mirror-inverses for the maximally extreme locations, in this case Darwin and Tahiti

A machine learning application is free to scrape the following GIST GitHub site for model fitting artifacts.

https://gist.github.com/pukpr/3a3566b601a54da2724df9c29159ce16

Another analysis involved a recursively cycled fit between AMO and PDO. It alternated between fitting AMO for 2.5 minutes and PDO for 2.5 minutes, cycling 50 times. This created a common forcing with an optimally shared fit, with the forcing baselined to PDO.

PDO

AMO

NINO34

IOD-East

IOD-West

Darwin

Tahiti

The table above shows the LTE modulation factors for the Darwin and Tahiti model fits. The highlighted blocks show the phase of the modulation, which should differ by π radians for a perfect dipole and the higher harmonics associated with it. (The K0 wavenumber = 0 term has no phase, just a sign.) For the shared modes 1, 45, 23, 36, 18, 39, and 44, the average phase difference is 3.09, close to π (and K0 switches sign).

1.23 - (-1.72) = 2.95
1.47 - (-2.05) = 3.52
-2.89 - 0.166 = -3.056
-0.367 - (-2.58) = 2.213
1.59 - (-2.175) = 3.765
0.27 - (-2.84) = 3.11
-1.87 - 1.14 = -3.01

Average of the magnitudes: (2.95 + 3.52 + 3.056 + 2.213 + 3.765 + 3.11 + 3.01)/7 = 3.0891
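The arithmetic above can be checked with a short script. The phase values are transcribed from the differences listed; which column belongs to Darwin and which to Tahiti is not stated there, so the array names are neutral. The wrapped variant, which folds each difference into the range -π to π before averaging, is an addition for comparison and is not part of the original calculation.

```python
import numpy as np

# Phase pairs (radians) for the seven shared modes, transcribed from the list above
first = np.array([1.23, 1.47, -2.89, -0.367, 1.59, 0.27, -1.87])
second = np.array([-1.72, -2.05, 0.166, -2.58, -2.175, -2.84, 1.14])

diff = first - second
mean_abs = np.mean(np.abs(diff))

# Optional comparison: wrap each difference into (-pi, pi] before averaging
wrapped = np.angle(np.exp(1j * diff))
mean_abs_wrapped = np.mean(np.abs(wrapped))

print(f"mean |difference|         = {mean_abs:.4f}")
print(f"mean |wrapped difference| = {mean_abs_wrapped:.4f}")
print(f"pi                        = {np.pi:.4f}")
```

Two of the raw differences exceed π, so the wrapped average comes out somewhat lower than the unwrapped one.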

Contrast this with the IOD East/West dipole. Only the K0 (wavenumber=0) term shows a reversal in sign. The LTE modulation terms are within 1 radian of each other, indicating much less of a dipole behavior on those terms. It’s possible that these sites don’t span a true dipole, either by its nature or from the siting of the measurements.

Cross-validating a large interval span on PDO

using CC

using DTW metric, which pulls out more of the annual/semi-annual signal

adding a 3rd harmonic

Complement of the fitting interval. Note that the spectral composition maintains the same harmonics, indicating that the structure mapped to is stationary, in the sense that the tidal pattern is not changing and the LTE modulation is largely fixed.

This is the resolved tidal forcing, finer than the annual impulse sampling used on the models above.

Below one can see the primary 27.5545-day lunar anomalistic cycle, mixed with the draconic 27.2122/13.606-day cycle to create the 6/3-year modulation and the 206-day perigee-syzygy cycle (or 412 days for the full cycle, as 206 includes the antipodal full moon or new moon orientation).
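The quoted modulation periods follow from beat frequencies between the lunar months, which can be checked with a few lines. The anomalistic and draconic periods are taken from the text; the synodic month of 29.530589 days is the standard mean value and is an assumption added here, as is the helper name.

```python
def beat_period(p1, p2):
    """Period of the beat between two cycles with periods p1 and p2 (same units)."""
    return 1.0 / abs(1.0 / p1 - 1.0 / p2)

ANOMALISTIC = 27.5545   # days, from the text
DRACONIC = 27.2122      # days, from the text
SYNODIC = 29.530589     # days, assumed standard mean synodic month (not in the text)
DAYS_PER_YEAR = 365.25

drac_anom = beat_period(DRACONIC, ANOMALISTIC)
anom_syn = beat_period(ANOMALISTIC, SYNODIC)

print(f"draconic/anomalistic beat: {drac_anom:.0f} days = {drac_anom / DAYS_PER_YEAR:.2f} years")
print(f"half of that beat:         {drac_anom / 2 / DAYS_PER_YEAR:.2f} years")
print(f"anomalistic/synodic beat:  {anom_syn:.0f} days, half cycle {anom_syn / 2:.0f} days")
```

The draconic/anomalistic beat gives the roughly 6-year term, with the 3-year term at half that period where the 13.606-day half-draconic cycle is involved. The anomalistic/synodic beat gives the 412-day full cycle, and its half is the 206-day perigee-syzygy cycle.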
