Machine Learning and the Climate Sciences

I’ve been applying equal doses of machine learning (and knowledge-based artificial intelligence in general) and physics in my climate research since day one. Next month, on December 12, I will be presenting Knowledge-Based Environmental Context Modeling at the AGU meeting, which will cover these topics within the earth sciences realm:

Table 1: Technical approach to knowledge-based model building for the earth sciences

In my opinion, machine learning will likely find all the patterns that appear in climate time-series eventually, but with varying degrees of human assistance.

“Vipin Kumar, a computer scientist at the University of Minnesota in Minneapolis, has used machine learning to create algorithms for monitoring forest fires and assessing deforestation. When his team tasked a computer with learning to identify air-pressure patterns called teleconnections, such as the El Niño weather pattern, the algorithm found a previously unrecognized example over the Tasman Sea.”

In terms of the ENSO pattern, I believe that machine learning through tools such as Eureqa could have found the underlying lunisolar forcing pattern, but would have struggled mightily to break through the complexity barrier. In this case, the complexity barrier is in (1) discovering a biennial modulation which splits all the spectral components and (2) discovering the modifications to the lunar cycles from a strictly sinusoidal pattern.

Eureqa would have found this pattern through its symbolic regression algorithm (which falls under the first row in Table 1 shown above). It would essentially start its machine learning search by testing various combinations of sines and cosines and retaining the most highly correlated combinations for further expansion. As it expands the combinations, the algorithm would try to reduce complexity by applying trigonometric identities such as this one:

\sin(\alpha \pm \beta) = \sin\alpha \cos\beta \pm \cos\alpha \sin\beta

After a while, the algorithm will slow down under the weight of the combinatorial complexity of the search, and the analyst will then need to choose promising candidates from the complexity-versus-best-fit Pareto front. At this point, one would need to apply knowledge of physical laws or mathematical heuristics to arrive at a potentially valid model.
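
To make the complexity-versus-fit trade-off concrete, here is a minimal sketch. It is not Eureqa's algorithm (which evolves general expression trees); it only enumerates sums of sinusoids drawn from a small, hypothetical frequency dictionary, fits each by linear least squares, and keeps the Pareto front of complexity (number of frequencies) versus RMS error. The signal is synthetic, chosen so the "right" answer is known in advance.

```python
from itertools import combinations

import numpy as np


def rms_residual(t, y, freqs):
    """Least-squares fit of a sum of sinusoids (one sin and one cos column per frequency)."""
    cols = [fn(2 * np.pi * f * t) for f in freqs for fn in (np.sin, np.cos)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Round so floating-point noise does not create spurious trade-offs.
    return round(float(np.sqrt(np.mean(resid ** 2))), 9)


def pareto_front(candidates):
    """Keep candidates not dominated in (complexity, error); lower is better for both."""
    front = []
    for name, cx, err in candidates:
        dominated = any(
            c2 <= cx and e2 <= err and (c2 < cx or e2 < err)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append((name, cx, err))
    return sorted(front, key=lambda item: item[1])


# Synthetic test signal (not climate data): sinusoids at 1 and 3 cycles per year.
t = np.linspace(0.0, 10.0, 1000, endpoint=False)
y = np.sin(2 * np.pi * 1 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)

# Hypothetical dictionary of candidate frequencies; try every subset of size 1-3.
dictionary = [1, 2, 3, 5]
candidates = [
    (combo, len(combo), rms_residual(t, y, combo))
    for k in range(1, 4)
    for combo in combinations(dictionary, k)
]
front = pareto_front(candidates)
for name, cx, err in front:
    print(name, cx, err)
```

The front contains the best one-term model and the exact two-term model; every three-term model is dominated because it adds complexity without reducing the error. With real, noisy data the front is much flatter, which is exactly where physical knowledge has to pick the candidate.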

So, in the case of the ENSO model, Eureqa could have discovered (1) the biennial modulation by reducing sets of trigonometric identities, and (2) the second-order modifications to the sinusoidal functions, perhaps by applying a sin(A sin()) frequency modulation (which it is capable of); or (3) it could have been fed a differential equation structure as a hint toward a solution. But a human got there first by applying prior knowledge of signal processing and of the details of the orbital lunar cycles.
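
To illustrate why a biennial modulation "splits all the spectral components", the sketch below multiplies a single spectral line by a strictly biennial (+1/-1 in alternating years) modulation and reads off the FFT peaks. The 8-year carrier and the 64-year monthly sampling are hypothetical choices for illustration. By the product-to-sum identities, the line at f0 moves to sidebands at f0 ± 0.5 cycles/year (with weaker ones at f0 ± 1.5, and so on, from the square wave's odd harmonics), so the original frequency no longer appears at all.

```python
import numpy as np

FS = 12  # samples per year (monthly data)
N = 64 * FS  # 64 years of samples
t = np.arange(N) / FS  # time in years


def biennial(t):
    """Strictly biennial modulation: +1 in even years, -1 in odd years."""
    return np.where(np.floor(t) % 2 == 0, 1.0, -1.0)


def top_frequencies(x, n_peaks):
    """Frequencies (cycles/year) of the n largest FFT magnitudes, sorted ascending."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / FS)
    idx = np.argsort(spectrum)[-n_peaks:]
    return sorted(round(float(f), 6) for f in freqs[idx])


carrier = np.cos(2 * np.pi * 0.125 * t)  # a single 8-year spectral line
modulated = biennial(t) * carrier

print(top_frequencies(carrier, 1))    # [0.125]
print(top_frequencies(modulated, 2))  # [0.375, 0.625]
```

A symbolic regression search that only sees the modulated series has to rediscover this pairing of sidebands, which is one way the complexity barrier shows up.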

Yet, as the Scientific American article suggests, that will likely not be the case in the future, as the algorithms continue to improve and update their knowledge base with the laws of physics.

This more sophisticated kind of reasoning involves the refined use of the other elements of Table 1.  For example, a more elaborate algorithm could have lifted an entire abstraction level out of a symbolic grouping and thus reduced its complexity. Or it could try to determine whether a behavior was stochastic or deterministic.  The next generation of these tools will be linked to knowledge-bases filled with physics patterns that are organized for searching and reasoning tasks. These will relate the problem under study to potential solutions automatically.

High Resolution ENSO Modeling

An intriguing discovery is that the higher-resolution aspects of the SOI time-series (as illustrated by the Australian BOM 30-day SOI moving average) may also have a tidal influence.  Note the fast noisy envelope that rides on top of the deep El Nino of 2015-2016 shown below:

For the standard monthly SOI as reported by NCAR and NOAA, this finer detail disappears. BOM provides daily SOI values covering roughly the past 3 years.

Yet if we retain this detail in the 1880-present monthly ENSO model by simultaneously isolating [1] the higher-frequency fine structure from 2015-2017, the fine structure also emerges in the model, as shown in the lower panel below.

This suggests that the differential equation currently in use could be modified to include faster-responding derivative terms that would simultaneously reproduce the multi-year fluctuations and what was thought to be a weekly-to-monthly-scale noise envelope. In fact, I had been convinced that this term was due to localized weather, but a recent post suggested that it may indeed be a deterministic signal.

Lunisolar tidal effects likely impact ocean behavior at every known time-scale, from the well-characterized diurnal and semi-diurnal SLH tides to the long-term deep-ocean mixing proposed by Munk and Wunsch. It is not surprising that tidal forces would also impact the intermediate time-scale ENSO dynamics, both at the conventional low resolution (used for El Nino predictions) and at the higher resolution that emerges from the SOI measurements (the 30-day moving average shown above). The monthly and fortnightly oscillations observed in the SOI are commensurate with the standard lunar tidal periods of 13-14 days and 27-28 days, and non-linear interactions may produce the 40-60 day oscillations observed in the length of day (LOD).

from Earth Rotational Variations Excited by Geophysical Fluids, B.F. Chao, http://ivs.nict.go.jp/mirror/publications/gm2004/chao/

It’s entirely possible that removing the 30-day moving average from the SOI measurements would reveal even more detail.

Footnote

[1] Isolation is accomplished by subtracting a 24-day average about the moving average value, which suppresses the longer-term SOI variation.
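
A minimal sketch of the isolation step in the footnote, assuming a daily series of 30-day moving-average SOI values and interpreting "a 24-day average about the moving average value" as a centered 24-day running mean (both the alignment and the function names are assumptions, not the exact procedure used). The data here are synthetic: a constant offset plus a 24-day ripple, so the expected output is known.

```python
import numpy as np


def centered_moving_average(x, window):
    """Centered moving average; samples without a full window are set to NaN."""
    x = np.asarray(x, dtype=float)
    valid = np.convolve(x, np.ones(window) / window, mode="valid")
    out = np.full(x.shape, np.nan)
    start = (window - 1) // 2  # for an even window the center sits half a sample early
    out[start:start + len(valid)] = valid
    return out


def isolate_fine_structure(soi_30d, window=24):
    """Subtract a 24-day running average so only the faster fluctuations remain."""
    return np.asarray(soi_30d, dtype=float) - centered_moving_average(soi_30d, window)


# Synthetic daily series (illustrative only): a slow offset plus a 24-day ripple.
days = np.arange(400)
slow = np.full(days.shape, 5.0)
ripple = np.sin(2 * np.pi * days / 24)
fine = isolate_fine_structure(slow + ripple)
```

Because a 24-day boxcar averages a 24-day ripple to zero, the ripple passes through untouched while the offset is removed; fluctuations at other periods are only partially suppressed, since a boxcar is not a sharp filter.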

The ENSO Forcing Potential – Cheaper, Faster, and Better

Following up on the last post on the ENSO forcing, this note elaborates on the math. The tidal gravitational forcing function follows an inverse power-law dependence, where R_0 is the mean Earth-Moon distance, a(t) is the anomalistic variation in the lunar distance, and d(t) is the draconic or nodal perturbation to the distance.

F(t) \propto \left[ \frac{1}{(R_0 + a(t) + d(t))^2} \right]'

Note the prime indicating that the forcing applied is the derivative of the conventional inverse squared Newtonian attraction. This generates an inverse cubic formulation corresponding to the consensus analysis describing a differential tidal force:

F(t) \propto -\frac{a'(t) + d'(t)}{(R_0 + a(t) + d(t))^3}
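
The step from the inverse square to the inverse cube is just the chain rule, with R_0 constant; the factor of 2 is absorbed into the proportionality:

```latex
\frac{d}{dt}\,\frac{1}{\left(R_0 + a(t) + d(t)\right)^{2}}
  = -2\left(R_0 + a(t) + d(t)\right)^{-3}\,\frac{d}{dt}\left(R_0 + a(t) + d(t)\right)
  = -\frac{2\left(a'(t) + d'(t)\right)}{\left(R_0 + a(t) + d(t)\right)^{3}}
```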

For a combination of monthly and fortnightly sinusoidal terms for a(t) and d(t) (suitably modified for nonlinear nodal and perigean corrections due to the synodic/tropical cycle), the search routine rapidly converges to an optimal ENSO fit. It does this more quickly than the harmonic analysis, which requires at least twice as many unknowns for the additional higher-order factors needed to capture the tidally forced response waveform. One of the keys is to collect the chain-rule terms a'(t) and d'(t) in the numerator; without these, the necessary mixed terms that multiply the anomalistic and draconic signals do not emerge strongly.
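
Here is a minimal sketch of the forcing term, assuming plain monthly-plus-fortnightly cosines for a(t) and d(t) at the standard anomalistic (about 27.5545 days) and draconic (about 27.2122 days) month lengths. The amplitudes are hypothetical, R_0 is normalized to 1, and the nonlinear nodal and perigean corrections mentioned above are omitted. The last lines check the chain-rule form against a numerical derivative of the inverse square.

```python
import numpy as np

# Lunar month lengths in days (standard values); amplitudes below are illustrative only.
ANOMALISTIC_DAYS = 27.55455
DRACONIC_DAYS = 27.21222


def monthly_plus_fortnightly(t, period, amp_month, amp_fortnight):
    """Return x(t) and its analytic derivative x'(t) for a monthly + fortnightly cosine pair."""
    w = 2 * np.pi / period
    x = amp_month * np.cos(w * t) + amp_fortnight * np.cos(2 * w * t)
    dx = -amp_month * w * np.sin(w * t) - 2 * amp_fortnight * w * np.sin(2 * w * t)
    return x, dx


def tidal_forcing(t, r0=1.0, a_amps=(0.05, 0.01), d_amps=(0.02, 0.005)):
    """F(t) = -(a' + d') / (R0 + a + d)^3, i.e. half the derivative of 1/(R0 + a + d)^2."""
    a, da = monthly_plus_fortnightly(t, ANOMALISTIC_DAYS, *a_amps)
    d, dd = monthly_plus_fortnightly(t, DRACONIC_DAYS, *d_amps)
    return -(da + dd) / (r0 + a + d) ** 3


def inverse_square(t, r0=1.0, a_amps=(0.05, 0.01), d_amps=(0.02, 0.005)):
    """The undifferentiated 1/(R0 + a + d)^2 term, used only for the check below."""
    a, _ = monthly_plus_fortnightly(t, ANOMALISTIC_DAYS, *a_amps)
    d, _ = monthly_plus_fortnightly(t, DRACONIC_DAYS, *d_amps)
    return 1.0 / (r0 + a + d) ** 2


t = np.arange(0.0, 365.0, 0.01)  # one year at a 0.01-day step
F = tidal_forcing(t)
# Central differences of the inverse square should equal 2*F away from the endpoints.
numeric = np.gradient(inverse_square(t), t)
print(np.allclose(numeric[1:-1], 2 * F[1:-1], atol=1e-6))  # True
```

Keeping a'(t) and d'(t) explicit in the numerator means cross products such as a'(t)·d(t) appear automatically when the denominator is expanded, instead of having to be introduced as separate fitted harmonics.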

As before, a strictly biennial modulation needs to be applied to this forcing to capture the measured ENSO dynamics. This is a period-doubling pattern observed in hydrodynamic systems with a strong fundamental (in this case annual), and it is climatologically explained by a persistent year-to-year regenerative feedback in the SLP and SST anomalies.

Here is the model fit for training from 1880-1980, with the extrapolated test region post-1980 showing a good correlation.

The geophysics is now canonically formulated, providing (1) a simpler and more concise expression, leading to (2) a more efficient computational solution, (3) less possibility of over-fitting, and (4) ultimately a much better correlation. Stated in modeling terms, the resultant information metric is improved by reducing the complexity and improving the correlation: the vaunted cheaper, faster, and better solution. Or, in other words: get the physics right, and all else follows.

Is the SOI noisy or is it signal?

Applying the analytical solution to Laplace’s tidal equations, we can isolate the parts of the Southern Oscillation Index signal that appear quite noisy (e.g. 1880-1885, 1900-1905, etc.).

For this 3-month averaged SOI fit, it’s a sin(sin(f(t))) function in the ENSO model that generates the folded signal, which appears as a rapidly fluctuating, noisy signal. Although my simplification of Laplace’s tidal equations was originally applied to the QBO, it is applicable to other equatorial standing-wave phenomena such as ENSO, of which the SOI is a measure. The SOI signal has always been considered noisy, especially in contrast to other ENSO measures such as NINO34, but perhaps this needs to be rethought, as the higher-frequency components may be real signal.
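
A small sketch of why a sin(sin(f(t))) term can look like noise. Here f(t) is a single hypothetical sinusoid scaled by k, not the actual ENSO forcing. When k exceeds π, the inner sine sweeps through several multiples of π and the outer sine "folds" it, flipping sign many times per cycle. By the Jacobi-Anger expansion, sin(k sin θ) = 2 Σ J_{2n+1}(k) sin((2n+1)θ), so the energy spreads into odd harmonics up to roughly order k.

```python
import numpy as np


def folded(t, k, period=1.0):
    """sin(k*sin(2*pi*t/period)): the inner sine is folded by the outer one when k > pi."""
    return np.sin(k * np.sin(2 * np.pi * t / period))


def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    s = np.signbit(x)
    return int(np.count_nonzero(s[1:] != s[:-1]))


t = np.linspace(0.0, 10.0, 20001)  # ten periods of the inner sinusoid
gentle = folded(t, k=1.0)   # no folding: same sign pattern as the inner sine
strong = folded(t, k=10.0)  # heavy folding: many rapid sign changes
print(zero_crossings(gentle), zero_crossings(strong))
```

Both series are fully deterministic; the strongly folded one simply has far more sign changes per cycle, which in a short, averaged record is easy to mistake for noise.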

These results will be presented at next month’s AGU meeting:

Approximating the ENSO Forcing Potential

In the last post, we tried to estimate the lunar tidal forcing potential from the fitted harmonics of the ENSO model. Two observations resulted from that exercise: (1) the possibility of over-fitting to the expanded Taylor series, and (2) the potential of fitting the ENSO data directly with the inverse power law.

The Taylor series of the forcing potential is a power-law polynomial corresponding to the lunar harmonic terms. The chief characteristic of the polynomial is the alternating sign of each successive power, which has implications for convergence under certain regimes. Because the signs alternate, each added harmonic strongly compensates the preceding ones, giving the impression that pulling one signal out will scramble the fit. This is conceptually no different from eliminating any one term from the Taylor series of a sine or cosine, whose terms also compensate one another with alternating sign.

The specific condition we need to be concerned about with respect to series convergence is when r (the perturbation to the lunar orbit) is a substantial fraction of R (the distance from the Earth to the Moon):

F(r) = \frac{1}{(R+r)^3}
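
For |r| < R, the expansion is 1/(R+r)^3 = R^{-3} Σ (-1)^n C(n+2, 2) (r/R)^n, with coefficients 1, -3, 6, -10, 15, ... The sketch below (R normalized to 1; the two ratios are arbitrary illustrations) compares truncated sums with the exact value: for a small ratio a few terms suffice, while for a large ratio the partial sums swing above and below the exact value and converge slowly.

```python
from math import comb


def inverse_cube_exact(r, R=1.0):
    return 1.0 / (R + r) ** 3


def inverse_cube_taylor(r, R=1.0, order=3):
    """Partial sum of 1/(R+r)^3 = R^-3 * sum_n (-1)^n C(n+2, 2) (r/R)^n, valid for |r| < R."""
    x = r / R
    return sum((-1) ** n * comb(n + 2, 2) * x ** n for n in range(order + 1)) / R ** 3


def relative_error(r, order):
    exact = inverse_cube_exact(r)
    return abs(inverse_cube_taylor(r, order=order) - exact) / exact


coefficients = [(-1) ** n * comb(n + 2, 2) for n in range(6)]
print(coefficients)  # [1, -3, 6, -10, 15, -21]
for ratio in (0.05, 0.5):
    print(ratio, [f"{relative_error(ratio, k):.1e}" for k in (1, 3, 5)])
```

This is the same compensation effect described above: each new term largely cancels the previous one, so a solver fitting the terms as independent harmonics can trade amplitude between them freely.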

Because we need to keep those terms for high-precision modeling, we also need to be wary of over-fitting them, as the solver does not know that their values are constrained by the original Taylor series. This is not really a problem for conventional tidal analysis, where the signals are so clean, but for the noisy ENSO time-series it is an issue.

Of course, the solution to this predicament is not to do the Taylor series harmonic fitting at all, but to leave the forcing in the form of the inverse power law. That makes a lot of sense, and the only reason for not doing this until now is probably the inertia of conventional wisdom: it wasn’t necessary for tidal analysis, where harmonics work adequately.

So this alternate and more fundamental formulation is what we show here.