Information Theory in Earth Science: Been there, done that

Following up from this post, there is a recent sequence of articles in the AGU journal Water Resources Research under the heading: “Debates: Does Information Theory Provide a New Paradigm for Earth Science?”

We anticipated all of these ideas: you can find plenty of examples and derivations (many centered on the ideas of Maximum Entropy) in our book Mathematical Geoenergy.

Here is an excerpt from the “Emerging concepts” entry, which indirectly addresses negative entropy:

“While dynamical system theories have a long history in mathematics and physics and diverse applications to the hydrological sciences (e.g., Sangoyomi et al., 1996; Sivakumar, 2000; Rodriguez-Iturbe et al., 1989, 1991), their treatment of information has remained probabilistic akin to what is done in classical thermodynamics and statistics. In fact, the dynamical system theories treated entropy production as exponential uncertainty growth associated with stochastic perturbation of a deterministic system along unstable directions (where neighboring states grow exponentially apart), a notion linked to deterministic chaos. Therefore, while the kinematic geometry of a system was deemed deterministic, entropy (and information) remained inherently probabilistic. This led to the misconception that entropy could only exist in stochastically perturbed systems but not in deterministic systems without such perturbations, thereby violating the physical thermodynamic fact that entropy is being produced in nature irrespective of how we model it.

In that sense, classical dynamical system theories and their treatments of entropy and information were essentially the same as those in classical statistical mechanics. Therefore, the vast literature on dynamical systems, including applications to the Earth sciences, was never able to address information in ways going beyond the classical probabilistic paradigm.”

That is, there are likely many earth system behaviors that are highly ordered, but the complexity and non-linearity of their mechanisms make them appear stochastic or chaotic (high positive entropy). In reality they may just follow a complicated deterministic model (negative entropy). We just aren’t looking hard enough to discover the underlying patterns in most of this stuff.

An excerpt from the “Occam’s Razor” entry, which echoes my own citation of Gell-Mann:

“Science and data compression have the same objective: discovery of patterns in (observed) data, in order to describe them in a compact form. In the case of science, we call this process of compression “explaining observed data.” The proposed or resulting compact form is often referred to as “hypothesis,” “theory,” or “law,” which can then be used to predict new observations. There is a strong parallel between the scientific method and the theory behind data compression. The field of algorithmic information theory (AIT) defines the complexity of data as its information content. This is formalized as the size (file length in bits) of its minimal description in the form of the shortest computer program that can produce the data. Although complexity can have many different meanings in different contexts (Gell-Mann, 1995), the AIT definition is particularly useful for quantifying parsimony of models and its role in science.”

Parsimony of models is a measure of negative entropy.
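As a rough illustration of the compression argument (my own sketch, not taken from the debate papers or the book), a general-purpose compressor can stand in for the AIT "shortest program". The true AIT complexity is not computable, so the zlib output size is only an upper bound, and the 8-bit quantization and the helper name `compressed_bits` are assumptions made for the example.

```python
import math
import random
import zlib

def compressed_bits(samples):
    """Bits zlib needs for a float sequence quantized to 8-bit integers."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    quantized = bytes(int(255 * (x - lo) / span) for x in samples)
    return 8 * len(zlib.compress(quantized, 9))

n = 4096
# An ordered (deterministic, periodic) series versus an unstructured one.
periodic = [math.sin(2 * math.pi * i / 64) for i in range(n)]
rng = random.Random(0)
noise = [rng.random() for _ in range(n)]

periodic_bits = compressed_bits(periodic)
noise_bits = compressed_bits(noise)
print(periodic_bits, noise_bits)
```

The periodic series should compress to a small fraction of the size of the noise, which is the sense in which a compact deterministic description corresponds to low entropy.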

Odd cycles in Length-of-Day (LOD) variations

Two papers on the analysis of >1 year periods in the LOD time series measured since 1962.

The consistency of interdecadal changes in the Earth’s rotation variations

On the ~ 7 year periodic signal in length of day from a frequency domain stepwise regression method

These cycles may be related to aliased tidal periods with the annual cycle, as in modeling ENSO.


A paper describing new satellite measurements for precision LOD measurements.

“BeiDou satellite radiation force models for precise orbit determination and geodetic applications” from TechRxiv

Note the detail on the 13.6 day fortnightly tidal period

Nonlinear long-period tidal forcing with application to ENSO, QBO, and Chandler wobble

Model fitting process for ENSO

Back to EGU abstract and presentation


Addendum: After this presentation was submitted, a ground-breaking paper by a group at the University of Paris came on-line. Their paper, “On the Shoulders of Laplace” covers much the same ground as the EGU presentation linked above.

Their main thesis is that Pierre-Simon Laplace in 1799 correctly theorized that the wobble in the Earth’s rotation is due to the moon and sun, as described in his treatise “Traité de Mécanique Céleste” (Treatise of Celestial Mechanics).


Excerpts from the paper “On the shoulders of Laplace”

Moreover, Lopes et al. claim that this celestial gravitational forcing carries over to controlling cyclic climate indices, following Laplace’s mathematical formulation (now known as Laplace’s Tidal Equations) for describing oceanic tides.

Excerpt from the paper “On the shoulders of Laplace”

This view also aligns with the way we model climate indices such as ENSO and QBO via a solution to Laplace’s Tidal Equations, as described in the linked EGU presentation above.


ESD Ideas article for review

Get a Copernicus login and comment for peer-review

The simple idea is that tidal forces play a bigger role in geophysical behaviors than previously thought, which helps to explain phenomena that have frustrated scientists for decades.

The idea is simple but the non-linear math (see figure above for ENSO) requires cracking to discover the underlying patterns.

The rationale for the ESD Ideas section in the EGU Earth System Dynamics journal is to get discussion going on innovative and novel ideas. So even though this model is worked out comprehensively in Mathematical Geoenergy, it hasn’t gotten much publicity.

Gravitational Pull

In Chapter 12 of the book, we provide an empirical gravitational forcing term that can be applied to the Laplace’s Tidal Equations (LTE) solution for modeling ENSO. The inverse-square law is modified to an inverse-cube law to take into account the differential pull from opposite sides of the earth.
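The inverse-cube dependence can be checked numerically. The sketch below is my own illustration, not the F(t) term from the book: it takes the difference between the inverse-square pull on the near and far sides of the earth and compares it with the leading-order approximation 4GMR/d³. The constants are rounded textbook values and the function names are made up for the example.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_MOON = 7.35e22     # lunar mass, kg (rounded)
R_EARTH = 6.371e6    # earth radius, m
D_MOON = 3.844e8     # mean earth-moon distance, m

def differential_pull(distance, mass=M_MOON, radius=R_EARTH):
    """Inverse-square pull on the near side minus the pull on the far side."""
    near = G * mass / (distance - radius) ** 2
    far = G * mass / (distance + radius) ** 2
    return near - far

def cubic_approximation(distance, mass=M_MOON, radius=R_EARTH):
    """Leading-order term of the difference: 4*G*M*R / d^3."""
    return 4 * G * mass * radius / distance ** 3

exact = differential_pull(D_MOON)
approx = cubic_approximation(D_MOON)
# Doubling the distance should cut the differential pull by about 2^3 = 8.
ratio_when_doubled = differential_pull(2 * D_MOON) / exact
print(exact, approx, ratio_when_doubled)
```

The exact difference and the cubic approximation agree to well under one percent at the lunar distance, which is why the differential (tidal) pull is treated as a cubic law.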

excerpt from Mathematical Geoenergy (Wiley/2018)

The two main terms are the monthly anomalistic (Mm) cycle and the fortnightly tropical/draconic pair (Mf, Mf’ with an 18.6-year nodal modulation). Due to the inverse cube gravitational pull found in the denominator of F(t), faster harmonic periods are also created, with the 9-day (Mt) created from the monthly/fortnightly cross-term and the weekly (Mq) from the fortnightly crossed against itself. It’s amazing how few terms are needed to create a canonical fit to a tidally-forced ENSO model.
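The cross-term periods follow from the product-to-sum identity: multiplying two sinusoids produces a component at the sum of their frequencies. A minimal check, assuming rounded values of 27.5545 days for the anomalistic month and 27.3216 days for the tropical month (the helper `sum_period` is only for illustration):

```python
ANOMALISTIC_MONTH = 27.5545        # days (Mm), rounded
TROPICAL_FORTNIGHT = 27.3216 / 2   # days (Mf), half the tropical month

def sum_period(p1, p2):
    """Period of the sum-frequency term produced by multiplying two sinusoids."""
    return 1.0 / (1.0 / p1 + 1.0 / p2)

mt = sum_period(ANOMALISTIC_MONTH, TROPICAL_FORTNIGHT)   # monthly x fortnightly
mq = sum_period(TROPICAL_FORTNIGHT, TROPICAL_FORTNIGHT)  # fortnightly x itself
print(round(mt, 2), round(mq, 2))
```

This gives roughly 9.13 days for the monthly/fortnightly cross-term and 6.83 days for the fortnightly term crossed against itself.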

The recipe for the model is shown in the chart below, following steps (A) through (G) in sequence:

(A) Long-period fortnightly and anomalistic tidal terms as F(t) forcing
(B) The Fourier spectrum of F(t) revealing higher frequency cross terms
(C) An annual impulse modulates the forcing, reinforcing the amplitude
(D) The impulse is integrated producing a lagged quasi-periodic input
(E) Resulting Fourier spectrum is complex due to annual cycle aliasing
(F) Oceanic response is a Laplace’s Tidal Equation (LTE) modulation
(G) Final step is to fit the LTE modulation to match the ENSO time-series
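The steps above can be sketched in a few lines of code. This is only a toy under stated assumptions, not the fitted model from the book: the forcing amplitudes, the Gaussian shape of the annual impulse, and the sinusoidal form of the LTE modulation with its `wavenumber` and `phase` parameters are all placeholders chosen for illustration. The spectral steps (B) and (E) are diagnostics and the fitting step (G) is omitted.

```python
import math

DAYS_PER_YEAR = 365.2422
MM = 27.5545          # anomalistic month, days (rounded)
MF = 27.3216 / 2      # tropical fortnight, days (rounded)

def tidal_forcing(t_days, a_mm=0.1, a_mf=0.1):
    """Step (A): inverse-cube style forcing from two long-period terms (toy amplitudes)."""
    perturb = (a_mm * math.cos(2 * math.pi * t_days / MM)
               + a_mf * math.cos(2 * math.pi * t_days / MF))
    return 1.0 / (1.0 + perturb) ** 3

def annual_impulse(t_days, width_days=15.0):
    """Step (C): a narrow pulse once per year (the Gaussian shape is an assumption)."""
    phase = (t_days % DAYS_PER_YEAR) - DAYS_PER_YEAR / 2
    return math.exp(-0.5 * (phase / width_days) ** 2)

def lte_modulation(x, wavenumber=3.0, phase=0.0):
    """Step (F): sinusoidal modulation of the integrated forcing (assumed form)."""
    return math.sin(wavenumber * x + phase)

def run_model(n_years=20, dt_days=1.0):
    n = int(n_years * DAYS_PER_YEAR / dt_days)
    times = [i * dt_days for i in range(n)]
    forcing = [tidal_forcing(t) for t in times]              # (A)
    mean_f = sum(forcing) / n
    impulsed = [(f - mean_f) * annual_impulse(t)             # (C)
                for f, t in zip(forcing, times)]
    integrated, acc = [], 0.0
    for v in impulsed:                                       # (D) running integral
        acc += v * dt_days
        integrated.append(acc)
    return times, [lte_modulation(x) for x in integrated]    # (F)

times, response = run_model()
print(len(times), min(response), max(response))
```

In an actual fit, step (G) would adjust the tidal amplitudes and the LTE parameters against the ENSO time-series; here they are simply fixed.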

The tidal forcing is constrained by the known effects of the lunisolar gravitational torque on the earth’s length-of-day (LOD) variations. An essentially identical set of monthly, fortnightly, 9-day, and weekly terms is required for both a solid-body LOD model fit and a fluid-volume ENSO model fit.

Fitting tidal terms to the dLOD/dt data is only complicated by the aliasing of the annual cycle, making factors such as the weekly 7.095-day and 6.83-day cycles difficult to distinguish.
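To see why these two weekly cycles are hard to separate, one can fold each onto the annual cycle. The sketch below uses only the rounded periods quoted above and a tropical year of 365.2422 days; the folding function is a generic aliasing calculation and the names are illustrative.

```python
import math

DAYS_PER_YEAR = 365.2422

def aliased_cycles_per_year(period_days):
    """Frequency seen when a fast cycle is sampled once per year, folded to [0, 0.5]."""
    cycles = DAYS_PER_YEAR / period_days
    frac = cycles - math.floor(cycles)
    return min(frac, 1.0 - frac)

alias_a = aliased_cycles_per_year(7.095)
alias_b = aliased_cycles_per_year(6.83)
# Record length needed before the two aliased lines drift apart by one full cycle.
separation_years = 1.0 / abs(alias_a - alias_b)
print(alias_a, alias_b, separation_years)
```

With these rounded inputs both cycles alias to nearly the same frequency (a period a little over 2 years), so an annually modulated record would need to span centuries to resolve them; the exact figure is sensitive to the rounding of the periods.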

If we apply the same tidal terms as forcing for matching dLOD data, we can use the fit below as a perturbed ENSO tidal forcing. Not a lot of difference here — the weekly harmonics are higher in magnitude.

Modified initial calibration of lunar terms for fitting ENSO

So the only real unknown in this process is guessing the LTE modulation of steps (F) and (G). That’s what differentiates the inertial response of a spinning solid such as the earth’s core and mantle from the response of a rotating liquid volume such as the equatorial Pacific ocean. The former is essentially linear, but the latter is non-linear, making it an infinitely harder problem to solve — as there are infinitely many non-linear transformations one can choose to apply. The only reason that I stumbled across this particular LTE modulation is that it comes directly from a clever solution of Laplace’s tidal equations.

for full derivation see Mathematical Geoenergy (Wiley/2018)

Mathematical Geoenergy

Our book Mathematical Geoenergy presents a number of novel approaches that each deserve a research paper on their own. Here is the list, ordered roughly by importance (IMHO):

  1. Laplace’s Tidal Equation Analytic Solution.
    (Ch 11, 12) A solution of a Navier-Stokes variant along the equator. Laplace’s Tidal Equations are a simplified version of Navier-Stokes and the equatorial topology allows an exact closed-form analytic solution. This could qualify for the Clay Mathematics Institute Millennium Prize if the practical implications are considered, but it’s a lower-dimensional solution than a complete 3-D Navier-Stokes formulation requires.
  2. Model of El Nino/Southern Oscillation (ENSO).
    (Ch 12) A tidally forced model of the equatorial Pacific’s thermocline sloshing (the ENSO dipole) which assumes a strong annual interaction. Not surprisingly this uses the Laplace’s Tidal Equation solution described above, otherwise the tidal pattern connection would have been discovered long ago.
  3. Model of Quasi-Biennial Oscillation (QBO).
    (Ch 11) A model of the equatorial stratospheric winds, which cycle by reversing direction with a period of ~28 months. This incorporates the idea of amplified cycling of the sun and moon nodal declination pattern on the atmosphere’s tidal response.
  4. Origin of the Chandler Wobble.
    (Ch 13) An explanation for the ~433 day cycle of the Earth’s Chandler wobble. Finding this is a fairly obvious consequence of modeling the QBO.
  5. The Oil Shock Model.
    (Ch 5) A data flow model of oil extraction and production which allows for perturbations. We are seeing this in action with the recession caused by oil supply perturbations due to the Corona Virus pandemic.
  6. The Dispersive Discovery Model.
    (Ch 4) A probabilistic model of resource discovery which accounts for technological advancement and a finite search volume.
  7. Ornstein-Uhlenbeck Diffusion Model
    (Ch 6) Applying Ornstein-Uhlenbeck diffusion to describe the decline and asymptotic limiting flow from volumes such as occur in fracked shale oil reservoirs.
  8. The Reservoir Size Dispersive Aggregation Model.
    (Ch 4) A first-principles model that explains and describes the size distribution of oil reservoirs and fields around the world.
  9. Origin of Tropical Instability Waves (TIW).
    (Ch 12) As the ENSO model was developed, a higher harmonic component was found which matches TIW.
  10. Characterization of Battery Charging and Discharging.
    (Ch 18) Simplified expressions for modeling Li-ion battery charging and discharging profiles by applying dispersion on the diffusion equation, which reflects the disorder within the ion matrix.
  11. Anomalous Behavior in Dispersive Transport explained.
    (Ch 18) Photovoltaic (PV) material made from disordered and amorphous semiconductor material shows poor photoresponse characteristics. Solution to simple entropic dispersion relations or the more general Fokker-Planck leads to good agreement with the data over orders of magnitude in current and response times.
  12. Framework for understanding Breakthrough Curves and Solute Transport in Porous Materials.
    (Ch 20) The same disordered Fokker-Planck construction explains the dispersive transport of solute in groundwater or liquids flowing in porous materials.
  13. Wind Energy Analysis.
    (Ch 11) Universality of wind energy probability distribution by applying maximum entropy to the mean energy observed. Data from Canada and Germany. Found a universal BesselK distribution which improves on the conventional Rayleigh distribution.
  14. Terrain Slope Distribution Analysis.
    (Ch 16) Explanation and derivation of the topographic slope distribution across the USA. This uses mean energy and maximum entropy principle.
  15. Thermal Entropic Dispersion Analysis.
    (Ch 14) Solving the Fokker-Planck equation or Fourier’s Law for thermal diffusion in a disordered environment. A subtle effect but the result is a simplified expression not involving complex erf transcendental functions. Useful in ocean heat content (OHC) studies.
  16. The Maximum Entropy Principle and the Entropic Dispersion Framework.
    (Ch 10) The generalized math framework applied to many models of disorder, natural or man-made. Explains the origin of the entroplet.
  17. Solving the Reserve Growth “enigma”.
    (Ch 6) An application of dispersive discovery on a localized level which models the hyperbolic reserve growth characteristics observed.
  18. Shocklets.
    (Ch 7) A kernel approach to characterizing production from individual oil fields.
  19. Reserve Growth, Creaming Curve, and Size Distribution Linearization.
    (Ch 6) An obvious linearization of this family of curves, related to Hubbert Linearization but more useful since it stems from first principles.
  20. The Hubbert Peak Logistic Curve explained.
    (Ch 7) The Logistic curve is trivially explained by dispersive discovery with exponential technology advancement.
  21. Laplace Transform Analysis of Dispersive Discovery.
    (Ch 7) Dispersion curves are solved by looking up the Laplace transform of the spatial uncertainty profile.
  22. Gompertz Decline Model.
    (Ch 7) Exponentially increasing extraction rates lead to steep production decline.
  23. The Dynamics of Atmospheric CO2 buildup and Extrapolation.
    (Ch 9) Convolving a fat-tailed CO2 residence time impulse response function with a fossil-fuel emissions stimulus. This shows the long latency of CO2 buildup very straightforwardly.
  24. Reliability Analysis and Understanding the “Bathtub Curve”.
    (Ch 19) Using a dispersion in failure rates to generate the characteristic bathtub curves of failure occurrences in parts and components.
  25. The Overshoot Point (TOP) and the Oil Production Plateau.
    (Ch 8) How increases in extraction rate can maintain production levels.
  26. Lake Size Distribution.
    (Ch 15) Analogous to explaining reservoir size distribution, uses similar arguments to derive the distribution of freshwater lake sizes. This provides a good feel for how often super-giant reservoirs and Great Lakes occur (by comparison).
  27. The Quandary of Infinite Reserves due to Fat-Tail Statistics.
    (Ch 9) Demonstrated that even infinite reserves can lead to limited resource production in the face of maximum extraction constraints.
  28. Oil Recovery Factor Model.
    (Ch 6) A model of oil recovery which takes into account reservoir size.
  29. Network Transit Time Statistics.
    (Ch 21) Dispersion in TCP/IP transport rates leads to the measured fat-tails in round-trip time statistics on loaded networks.
  30. Particle and Crystal Growth Statistics.
    (Ch 20) Detailed model of ice crystal size distribution in high-altitude cirrus clouds.
  31. Rainfall Amount Dispersion.
    (Ch 15) Explanation of rainfall variation based on dispersion in rate of cloud build-up along with dispersion in critical size.
  32. Earthquake Magnitude Distribution.
    (Ch 13) Distribution of earthquake magnitudes based on dispersion of energy buildup and critical threshold.
  33. IceBox Earth Setpoint Calculation.
    (Ch 17) Simple model for determining the earth’s setpoint temperature extremes — current and low-CO2 icebox earth.
  34. Global Temperature Multiple Linear Regression Model
    (Ch 17) The global surface temperature records show variability that is largely due to the GHG rise along with fluctuating changes due to ocean dipoles such as ENSO (via the SOI measure and also AAM) and sporadic volcanic eruptions impacting the atmospheric aerosol concentrations.
  35. GPS Acquisition Time Analysis.
    (Ch 21) Engineering analysis of GPS cold-start acquisition times. Using Maximum Entropy in EMI clutter statistics.
  36. 1/f Noise Model
    (Ch 21) Deriving a random noise spectrum from maximum entropy statistics.
  37. Stochastic Aquatic Waves
    (Ch 12) Maximum Entropy Analysis of wave height distribution of surface gravity waves.
  38. The Stochastic Model of Popcorn Popping.
    (Appx C) The novel explanation of why popcorn popping follows the same bell-shaped curve of the Hubbert Peak in oil production. Can use this to model epidemics, etc.
  39. Dispersion Analysis of Human Transportation Statistics.
    (Appx C) Alternate take on the empirical distribution of travel times between geographical points. This uses a maximum entropy approximation to the mean speed and mean distance across all the data points.

 

Asymptotic QBO Period

The modeled QBO cycle is directly related to the nodal (draconic) lunar cycle physically aliased against the annual cycle. The empirical cycle period is best estimated by tracking the peak acceleration of the QBO velocity time-series, as this acceleration (1st derivative of the velocity) shows a sharp peak. This value should asymptotically approach a 2.368 year period over the long term. Since the recent data from the main QBO repository provides an additional acceleration peak from the past month, now is as good a time as any to analyze the cumulative data.



The new data-point provides a longer period which compensated for some recent shorter periods, such that the cumulative mean lies right on the asymptotic line. The jitter observed is explainable in terms of the model, as acceleration peaks are more prone to align close to an annual impulse. But the accumulated mean period is still aligned to the draconic aliasing with this annual impulse. As more data points come in over the coming decades, the mean should vary less and less from the asymptotic value.
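Both the asymptotic value and the jitter can be reproduced with a small calculation. This is a sketch under assumptions: a draconic month of 27.212221 days, a 365.25-day year (using the tropical year instead shifts the third decimal), and a crude rule that snaps each ideal peak to the nearest whole year to mimic alignment with the annual impulse.

```python
import math

DRACONIC_MONTH = 27.212221   # days
YEAR = 365.25                # days; assumption, see note above

def aliased_period_years(period_days, year_days=YEAR):
    """Period of a fast cycle after folding (aliasing) against the annual cycle."""
    cycles = year_days / period_days
    frac = cycles - math.floor(cycles)
    return 1.0 / min(frac, 1.0 - frac)

period = aliased_period_years(DRACONIC_MONTH)

# Crude jitter model: peaks land on the whole year nearest to their ideal time.
n_cycles = 40
peaks = [round(k * period) for k in range(n_cycles + 1)]
intervals = [b - a for a, b in zip(peaks, peaks[1:])]
cumulative_mean = [sum(intervals[:i]) / i for i in range(1, n_cycles + 1)]
print(round(period, 3), sorted(set(intervals)), cumulative_mean[-1])
```

Individual intervals jump between 2 and 3 years, yet the cumulative mean closes in on the aliased draconic period, with the deviation shrinking roughly as one over the number of cycles.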

The fit to QBO using all the data save for the last available data point is shown below.  Extrapolating beyond the green arrow, we should see an uptick according to the red waveform.



After adding the recent data-point, the blue waveform does follow the model.



There was a flurry of recent discussion on the QBO anomaly of 2016 (shown as a split peak above), which implied that perhaps the QBO would be permanently disrupted from its long-standing pattern. A more plausible explanation may be that the QBO pattern was not simply wandering from its assumed perfectly cyclic path, but is instead following a predictable but jittery track that combines the (physically-aliased) annual impulse-synchronized Draconic cycle with a sensitivity to variations in the draconic cycle itself. The latter calibration is shown below, based on the NASA ephemeris.



This is the QBO spectral decomposition, showing signal strength centered on the fundamental aliased Draconic value, both for the data and for the model.


The main scientist, Prof. Richard Lindzen, behind the consensus QBO model has been recently introduced here as being “considered the most distinguished living climate scientist on the planet”.  In his presentation criticizing AGW science [1], Lindzen claimed that the climate oscillates due to a steady uniform force, much like a violin oscillates when the steady force of a bow is drawn across its strings.  An analogy perhaps better suited to reality is that the violin is being played like a drum. Resonance is more of a decoration to the beat itself.
Keith 🌛 ?

[1] Professor Richard Lindzen slammed conventional global warming thinking as ‘nonsense’ in a lecture for the Global Warming Policy Foundation on Monday. ‘An implausible conjecture backed by false evidence and repeated incessantly … is used to promote the overturn of industrial civilization,’ he said in London. (GWPF)

NAO

The challenge of validating the models of climate oscillations such as ENSO and QBO rests primarily in our inability to perform controlled experiments. Because of this shortcoming, we can either (1) make predictions of future behavior and validate via the wait-and-see process, or (2) creatively apply techniques such as cross-validation on currently available data. The first is a non-starter because it’s obviously pointless to wait decades for validation results to confirm a model, when it’s entirely possible to do something today via the second approach.

There are a variety of ways to perform model cross-validation on measured data.

In its original and conventional formulation, cross-validation works by checking one interval of time-series against another, typically by training on one interval and then validating on an orthogonal interval.
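A minimal version of this train-then-validate procedure, on synthetic data rather than any climate index, might look like the following. The single fixed-period sinusoid, the noise level, and the half-and-half split are all assumptions chosen to keep the example short.

```python
import math
import random

def fit_sinusoid(times, values, period):
    """Least-squares amplitudes (a, b) of a*cos(wt) + b*sin(wt) at a fixed period."""
    w = 2 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    scc = sum(x * x for x in c)
    sss = sum(x * x for x in s)
    scs = sum(x * y for x, y in zip(c, s))
    scv = sum(x * v for x, v in zip(c, values))
    ssv = sum(x * v for x, v in zip(s, values))
    det = scc * sss - scs * scs
    return (scv * sss - scs * ssv) / det, (scc * ssv - scs * scv) / det

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

PERIOD = 2.37                                   # years (synthetic signal)
rng = random.Random(1)
times = [i / 12 for i in range(720)]            # 60 years, monthly samples
data = [math.cos(2 * math.pi * t / PERIOD + 0.7) + rng.gauss(0, 0.5) for t in times]

split = 360                                     # train on the first 30 years only
a, b = fit_sinusoid(times[:split], data[:split], PERIOD)
w = 2 * math.pi / PERIOD
model = [a * math.cos(w * t) + b * math.sin(w * t) for t in times]
cc_validation = pearson(model[split:], data[split:])
print(round(cc_validation, 2))
```

The correlation reported is computed only on the interval the fit never saw, which is the point of the exercise.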

Another way to cross-validate is to compare two sets of time-series data collected on behaviors that are potentially related. For example, in the case of ocean tidal data that can be collected and compared across spatially separated geographic regions, the sea-level-height (SLH) time-series data will not necessarily be correlated, but the underlying lunar and solar forcing factors will be closely aligned give or take a phase factor. This is intuitively understandable since the two locations share a common-mode signal forcing due to the gravitational pull of the moon and sun, with the differences in response due to the geographic location and local spatial topology and boundary conditions. For tides, this is a consensus understanding and tidal prediction algorithms have stood the test of time.

In the previous post, cross-validation on distinct data sets was evaluated assuming common-mode lunisolar forcing. One cross-validation was done between the ENSO time-series and the AMO time-series. Another cross-validation was performed for ENSO against PDO. The underlying common-mode lunisolar forcings were highly correlated as shown in the featured figure.  The LTE spatial wave-number weightings were the primary discriminator for the model fit. This model is described in detail in the book Mathematical GeoEnergy to be published at the end of the year by Wiley.

Another possible common-mode cross-validation is between ENSO and QBO, but in this case it is primarily in the Draconic nodal lunar factor, the cyclic forcing that appears to govern the regular oscillations of QBO. Below is the Draconic constituent comparison for QBO and ENSO.

The QBO and ENSO models only show a common-mode correlated response with respect to the Draconic forcing. The Draconic forcing drives the quasi-periodicity of the QBO cycles, as can be seen in the lower right panel, with a small training window.

This cross-correlation technique can be extended to what appears to be an extremely erratic measure, the North Atlantic Oscillation (NAO).

Like the SOI measure for ENSO, the NAO is originally derived from a pressure dipole measured at two separate locations — but in this case north of the equator.  From the high-frequency of the oscillations, a good assumption is that the spatial wavenumber factors are much higher than is required to fit ENSO. And that was the case as evidenced by the figure below.

ENSO vs NAO cross-validation

Both SOI and NAO are noisy time-series with the NAO appearing very noisy, yet the lunisolar constituent forcings are highly synchronized as shown by correlations in the lower pane. In particular, summing the Anomalistic and Solar constituent factors together improves the correlation markedly, which is because each of those has influence on the other via the lunar-solar mutual gravitational attraction. The iterative fitting process adjusts each of the factors independently, yet the net result compensates the counteracting amplitudes so the net common-mode factor is essentially the same for ENSO and NAO (see lower-right correlation labelled Anomalistic+Solar).

Since the NAO has high-frequency components, we can also perform a conventional cross-validation across orthogonal intervals. The validation interval below is for the years between 1960 and 1990, and even though the training intervals were aggressively over-fit, the correlation between the model and data is still visible in those 30 years.

NAO model fit with validation spanning 1960 to 1990

Over the course of time spent modeling ENSO, the effort that went into fitting to NAO was a fraction of the original time. This is largely due to the fact that the temporal lunisolar forcing only needed to be tweaked to match other climate indices, and the iteration over the topological spatial factors quickly converges.

Many more cross-validation techniques are available for NAO, since there are different flavors of NAO indices available corresponding to different Atlantic locations, and spanning back to the 1800’s.

ENSO, AMO, PDO and common-mode mechanisms

The basis of the ENSO model is the forcing derived from the long-period cyclic lunisolar gravitational pull of the moon and sun. There is some thought that ENSO shows teleconnections to other oceanic behaviors. The primary oceanic dipoles are ENSO and AMO for the Pacific and Atlantic. There is also the PDO for the mid-northern-latitude of the Pacific, which has a pattern distinct from ENSO. So the question is: Are these connected through interactions or do they possibly share a common-mode mechanism through the same lunisolar forcing mechanism?

Based on tidal behaviors, it is known that the gravitational pull varies geographically, so it would be understandable that ENSO, AMO, and PDO would demonstrate distinct time-series signatures. In checking this, you will find that the correlation coefficient between any two of these series is essentially zero, regardless of applied leads or lags. Yet the underlying component factors (the lunar Draconic, lunar Anomalistic, and solar modified terms) may potentially emerge with only slight variations in shape, with differences only in relative amplitude. This is straightforward to test by fitting the basic ENSO model to AMO and PDO by allowing the parameters to vary.
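Checking correlation across leads and lags is simple to do. The sketch below is generic and uses a synthetic pair of series in which one is a delayed copy of the other, so the expected answer is known; the lag convention (positive lag means the second series trails the first) and the function names are my own choices.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(x, y, lag):
    """Correlate x[i] with y[i + lag]; a positive lag means y trails x."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return pearson(a, b)

rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] * 5 + x[:-5]                      # y is x delayed by 5 samples

scan = {lag: lagged_correlation(x, y, lag) for lag in range(-20, 21)}
best_lag = max(scan, key=lambda lag: abs(scan[lag]))
best_cc = scan[best_lag]
print(best_lag, round(best_cc, 3))
```

For two index series, a scan like this that never rises much above zero at any lag is what "essentially zero, regardless of applied leads or lags" means in practice.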

The following figure is the result of fitting the model to ENSO, AMO, and PDO and then comparing the constituent factors.

First, note that the same parametric model fits each of the time series reasonably well. The Draconic factor underlying both the ENSO and AMO model is almost perfectly aligned, indicated by the red starred graph, with excursions showing a CC above 0.99. All of the rest of the CC’s in fact are above 0.6.

The upshot of this analysis is two-fold. First, consider how difficult it is to fit any one of these time series to a minimal set of periodically-forced signals. Second, the underlying signals are not that different in character; it is their combination, in terms of a Laplace’s Tidal Equations weighting, that couples them together via a common-mode mechanism. Thus, the teleconnection between these oceanic indices is likely an underlying common lunisolar tidal forcing, just as one would suspect from conventional tidal analysis.

ENSO model verification via Fourier analysis infill

Because the ENSO model generates precise temporal harmonics via a non-linear solution to Laplace’s Tidal Equations, it may in practice be trivially easy to verify. By only using higher-frequency harmonics (T<1.25y) during spectral training (with a small window of low-frequency signal to stabilize the solution, T>11y), the model essentially fills in the missing bulk of the signal frequency spectrum, 1.25y < T < 11y.  This is shown below in Figure 1.

Fig. 1: Bottom panel of the ENSO SOI amplitude spectra shows the training windows. A primarily low-amplitude spectral signal is used to fit the model (using least-squares on the error signal). The upper spectrum shows the expanded view of the out-of-band fit. This rich spectrum is all due to the non-linear harmonic solution of the ENSO Laplace’s Tidal Equation solution.

This agreement is statistically unlikely (nigh impossible) to occur unless the out-of-band signal had knowledge of the fundamental harmonics (i.e. the highest amplitude terms in the meat of the spectra) that are contributing to the higher harmonics.
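The band-limited training can be sketched as an error measure that only looks at Fourier bins with T < 1.25y or T > 11y. This is an illustration on synthetic data, not the actual fitting code: the plain DFT, the monthly sampling, and the function names are assumptions.

```python
import cmath
import math

def dft(values):
    """Plain discrete Fourier transform for non-negative frequency bins."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
            for k in range(n // 2 + 1)]

def training_mask(n, dt_years, t_short=1.25, t_long=11.0):
    """True for bins used in training: periods shorter than t_short or longer than t_long."""
    mask = [False]                                  # bin 0 (the mean) is skipped
    for k in range(1, n // 2 + 1):
        period = n * dt_years / k
        mask.append(period < t_short or period > t_long)
    return mask

def in_band_error(data, model, mask):
    """Squared spectral error summed over the training bins only."""
    return sum(abs(a - b) ** 2
               for a, b, keep in zip(dft(data), dft(model), mask) if keep)

n, dt = 480, 1.0 / 12                               # 40 years of monthly samples
t = [i * dt for i in range(n)]
low = [math.cos(2 * math.pi * x / 4.0) for x in t]          # 4-year term, out of band
high = [0.3 * math.cos(2 * math.pi * x / 0.5) for x in t]   # 0.5-year term, in band
data = [a + b for a, b in zip(low, high)]

mask = training_mask(n, dt)
err_without_low = in_band_error(data, high, mask)
print(err_without_low)
```

Because the training error is blind to the 1.25y to 11y band, a model that matches the data there has done so without being fitted to it, and that is what makes the infill a verification rather than a fit.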

Figure 2 is the underlying temporal fit. Although not as good a fit as what we can achieve using more of the primary Fourier terms, it is still striking.

Fig. 2: Temporal model fit using only Fourier frequency terms shorter than 1.25 years and longer than 11 years. The correlation coefficient is 0.7 here.

The consensus claim is that ENSO is a chaotic process with no long-term coherence. Yet, this shows excellent agreement with a forced lunisolar model showing very long-term coherence. An issue to raise is: why was the obvious deterministic forcing model abandoned as a plausible physical mechanism so long ago?