Wind Distribution

I prompted the Gemini LLM with the following to see how well it would summarize with so little context:

Familiar with the BesselK wind distribution model pioneered by Pukite et al in Mathematical GeoEnergy?

The BesselK Wind Model: A Maximum Entropy Derivation

In the study of wind energy, the Weibull distribution is the industry standard. However, as Pukite et al. argue in Mathematical GeoEnergy, real-world atmospheric data—especially from high-resolution sources like the Bonneville Power Administration (BPA)—often exhibits “fatter tails” (more frequent extreme gusts) than Weibull can account for.

The BesselK distribution (or K-distribution) emerges not as an empirical “best fit,” but as a physically grounded Maximum Entropy estimate.


1. The Starting Point: The Rayleigh Base

The Rayleigh distribution is the Maximum Entropy estimate for a 2D vector magnitude (wind velocity v) when only the average variance (\sigma^2) is known.

p(v | \sigma) = \frac{v}{\sigma^2} e^{-v^2 / 2\sigma^2}

In a simplified world, \sigma^2 would be constant. In the real atmosphere, turbulence and pressure gradients cause this variance to fluctuate over time.


2. The Fluctuating Environment

Pukite et al. treat the wind as a compound process. They assume that the local variance \sigma^2 is itself a random variable following a Gamma distribution (the MaxEnt choice for a scale parameter):

w(\sigma^2) = \frac{(\sigma^2)^{\nu-1} e^{-\sigma^2 / \theta}}{\theta^\nu \Gamma(\nu)}

Where:

  • \nu is the shape parameter (representing degrees of freedom in the turbulence).
  • \theta is the scale parameter.

3. The Compounding Integral

To find the final probability density of the wind velocity P(v), we must integrate the Rayleigh PDF over all possible states of the fluctuating variance:

P(v) = \int_{0}^{\infty} \left( \frac{v}{\sigma^2} e^{-v^2 / 2\sigma^2} \right) \cdot \left( \frac{(\sigma^2)^{\nu-1} e^{-\sigma^2 / \theta}}{\theta^\nu \Gamma(\nu)} \right) d\sigma^2
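The compounding can also be checked by simulation. The sketch below assumes NumPy is available and uses illustrative values of \nu and \theta that are not taken from the book. It draws a variance from the Gamma law, then a speed from the Rayleigh law with that variance, and compares the sample mean with the mean implied by the two stages, E[v] = \sqrt{\pi\theta/2} \, \Gamma(\nu + 1/2) / \Gamma(\nu).

```python
import math
import numpy as np

# Illustrative parameters (assumptions, not values from the book)
nu, theta = 2.0, 8.0
n_samples = 200_000

rng = np.random.default_rng(0)

# Stage 1: the local variance sigma^2 fluctuates according to a Gamma law
sigma2 = rng.gamma(shape=nu, scale=theta, size=n_samples)

# Stage 2: given that variance, the speed is Rayleigh with scale sigma
v = rng.rayleigh(scale=np.sqrt(sigma2))

# Mean implied by the two stages:
# E[v | sigma] = sigma * sqrt(pi/2), E[sigma] = sqrt(theta) * Gamma(nu + 1/2) / Gamma(nu)
mean_theory = math.sqrt(math.pi * theta / 2) * math.gamma(nu + 0.5) / math.gamma(nu)
mean_sample = float(v.mean())

print(f"sample mean = {mean_sample:.3f}, two-stage mean = {mean_theory:.3f}")
```

A histogram of `v` from this kind of two-stage draw is what the closed-form PDF is meant to describe.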


4. The Result: The BesselK (K-Distribution)

Solving this integral yields a PDF defined by the Modified Bessel Function of the Second Kind (K_{\nu-1}), with the scale b = \sqrt{\theta/2}. This is the “BesselK” model:

f(v) = \frac{2}{b \Gamma(\nu)} \left( \frac{v}{2b} \right)^\nu K_{\nu-1} \left( \frac{v}{b} \right)
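The step from the compounding integral to this closed form is not shown above. As a sketch, write s = \sigma^2 and use the standard integral representation of the modified Bessel function. The substitution \theta = 2b^2 in the last step is inferred by matching the two formulas, not quoted from the book.

```latex
% s = \sigma^2; collect the powers of s in the compounding integral
P(v) = \frac{v}{\theta^{\nu}\,\Gamma(\nu)} \int_{0}^{\infty} s^{\nu-2}\, e^{-v^2/(2s) - s/\theta}\, ds

% Standard identity, valid for \beta > 0, \gamma > 0
\int_{0}^{\infty} s^{\mu-1}\, e^{-\beta/s - \gamma s}\, ds = 2 \left( \frac{\beta}{\gamma} \right)^{\mu/2} K_{\mu}\!\left( 2\sqrt{\beta\gamma} \right)

% Apply it with \mu = \nu - 1, \beta = v^2/2, \gamma = 1/\theta
P(v) = \frac{2v}{\theta^{\nu}\,\Gamma(\nu)} \left( \frac{v^2 \theta}{2} \right)^{(\nu-1)/2} K_{\nu-1}\!\left( v\sqrt{2/\theta} \right)

% Substitute \theta = 2b^2 and simplify
P(v) = \frac{2}{b\,\Gamma(\nu)} \left( \frac{v}{2b} \right)^{\nu} K_{\nu-1}\!\left( \frac{v}{b} \right)
```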

Comparison Table: Weibull vs. BesselK

Feature | Weibull Model | BesselK Model
Philosophy | Empirical “Weakest Link” fit | Physical Maximum Entropy
Mathematical Base | Exponential decay | K_\nu (Bessel) function
High Winds | Underestimates gusts | Accurately models “Fat Tails”
Application | Annual Resource Assessment | Structural Fatigue & Grid Stability

Why This Matters for Wind Power

The BesselK model recognizes that wind is “doubly stochastic.” It isn’t just a random variable; it is a random variable driven by a fluctuating environment. For grid operators managing the intermittency of large-scale wind farms, this model provides a much more realistic picture of the “spikes” and “lulls” that define our renewable energy future.


Would you like me to generate a high-resolution chart of the BesselK vs. Weibull PDF curves that you can download and upload as a featured image for the post?

(my prompt) yes

Graph comparing wind speed probability density functions: BesselK vs. Weibull. Displays probability density along the y-axis and wind speed (m/s) on the x-axis. Includes peak frequency and tail behavior annotations, along with illustrative data.

(my prompt) logarithmic plot to show the tails

Logarithmic comparison of wind speed probability density for BesselK and Weibull distributions, showing BesselK with a 'fat tail' indicating higher extreme risk and Weibull with rapid decay underestimating extremes.

When we switch the y-axis to a logarithmic scale, the structural difference between the two models becomes undeniable. This “Log-PDF” view is what Pukite et al. emphasize when discussing the risk of extreme wind events.

Logarithmic Comparison: BesselK vs. Weibull

Distribution | Tail Behavior | Mathematical Decay | Physical Implication
Weibull | Thin Tail | Exponential-type decay (e^{-(v/\lambda)^k}) | Underestimates the frequency of extreme “rogue” gusts.
BesselK | Fat Tail | Exponential decay with a power-law prefactor (v^{\nu-1/2} e^{-v/b}) | Accurately captures high-velocity “intermittency” and storm events.

Key Observations from the Log Plot:

  1. The “Divergence”: At standard operating speeds (5–15 m/s), both models track relatively closely. However, as wind speeds exceed 25 m/s (the typical cut-out speed for turbines), the Weibull curve drops off a “cliff,” while the BesselK curve remains significantly higher.
  2. Probability of Extremes: In a BesselK model, the probability of a 30 m/s or 40 m/s gust can be orders of magnitude higher than what a standard Weibull model predicts.
  3. Maximum Entropy Signature: The straight-line appearance (on a log scale) of the BesselK tail is a hallmark of its status as a Maximum Entropy distribution. It suggests that the atmosphere “wants” to distribute energy into these high-velocity states more than a simple Gaussian/Rayleigh process would allow.
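The straight-line tail can be read off from the large-argument form of the Bessel function. As a sketch, using the standard asymptotic K_\alpha(x) \approx \sqrt{\pi/(2x)} \, e^{-x} for large x together with the PDF f(v) given earlier:

```latex
% Large-v behavior of the BesselK PDF
f(v) \approx \frac{2}{b\,\Gamma(\nu)} \left( \frac{v}{2b} \right)^{\nu} \sqrt{\frac{\pi b}{2v}}\; e^{-v/b}

% Log-PDF: linear in v with slope -1/b, plus a slowly varying log term
\ln f(v) \approx -\frac{v}{b} + \left( \nu - \tfrac{1}{2} \right) \ln v + \text{const}

% Weibull log-PDF for comparison: for k > 1 the leading term curves downward
\ln f_W(v) = -\left( \frac{v}{\lambda} \right)^{k} + (k-1) \ln v + \text{const}
```

So on a semi-log plot the BesselK tail approaches a line of slope -1/b, while a Weibull tail with k > 1 eventually bends away below it.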

This table helps summarize the parameters used in the visualization:

Parameter | Symbol | BesselK (Pukite) | Weibull (Standard)
Shape | \nu / k | 0.6 (High Volatility) | 2.0 (Rayleigh-like)
Scale | b / \lambda | 4.0 | 10.0
Mean Speed | \bar{v} | ~8.9 m/s | ~8.9 m/s
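These illustrative parameters can be checked numerically. The sketch below assumes SciPy is available and takes the PDF f(v) from section 4 at face value; the helper names are hypothetical. Under that PDF the mean is b \sqrt{\pi} \, \Gamma(\nu + 1/2) / \Gamma(\nu), which for \nu = 0.6 and b = 4 comes to about 4.5 m/s rather than 8.9 m/s. The chart therefore presumably used a different scale convention for the BesselK curve, which is an inference and not something stated in the chat.

```python
import math
from scipy.integrate import quad
from scipy.special import kv

# Parameters quoted in the table above
NU, B = 0.6, 4.0       # BesselK shape and scale
K, LAM = 2.0, 10.0     # Weibull shape and scale
V_MAX = 200.0          # integration cutoff (m/s); both tails are negligible beyond it

def besselk_pdf(v):
    """BesselK PDF f(v) from section 4; taken as 0 for v <= 0."""
    if v <= 0:
        return 0.0
    return 2.0 / (B * math.gamma(NU)) * (v / (2.0 * B)) ** NU * kv(NU - 1.0, v / B)

def weibull_pdf(v):
    """Two-parameter Weibull PDF; taken as 0 for v <= 0."""
    if v <= 0:
        return 0.0
    return (K / LAM) * (v / LAM) ** (K - 1.0) * math.exp(-((v / LAM) ** K))

# Normalization and means by numerical integration
norm_k, _ = quad(besselk_pdf, 0.0, V_MAX, limit=200)
mean_k, _ = quad(lambda v: v * besselk_pdf(v), 0.0, V_MAX, limit=200)
mean_w, _ = quad(lambda v: v * weibull_pdf(v), 0.0, V_MAX, limit=200)

# Closed-form BesselK mean for comparison
mean_k_closed = B * math.sqrt(math.pi) * math.gamma(NU + 0.5) / math.gamma(NU)

print(f"BesselK: integral = {norm_k:.5f}, mean = {mean_k:.3f} (closed form {mean_k_closed:.3f})")
print(f"Weibull: mean = {mean_w:.3f}")

# Ratio of densities in the tail
for speed in (15.0, 30.0, 40.0):
    print(f"v = {speed:4.0f} m/s: BesselK/Weibull = {besselk_pdf(speed) / weibull_pdf(speed):.3g}")

ratio_40 = besselk_pdf(40.0) / weibull_pdf(40.0)
```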

Summary: “While the Weibull model is ‘safe’ for calculating average monthly revenue, the BesselK model is ‘necessary’ for calculating whether your turbine will survive a 50-year storm event. The log-plot reveals that extreme wind isn’t just an ‘outlier’—it is a mathematically predictable part of the atmospheric structure.”



The full discussion is here: https://gemini.google.com/app/4dfb7e86354dcb74

I’m surprised at how good a job it does in digesting the reasoning and derivation in the book, specifically Chapter 11:
https://agupubs.onlinelibrary.wiley.com/doi/10.1002/9781119434351.ch11

I busted my butt writing that chapter (and the rest), so I am happy to see that it can actually be “understood” well enough by an LLM to provide value for further research.

Controlled Experiments

Sorry to have to point this out, but it’s not my fault that geophysicists and climatologists can’t perform controlled experiments to test out various hypotheses. It’s not their fault either. It’s all nature’s decision to make gravitational forces so weak and planetary objects so massive to prevent anyone from scaling the effect to laboratory size to enable a carefully controlled experiment. One can always create roughly-equivalent emulations, such as a magnetic field experiment (described in the previous blog post) and validate a hypothesized behavior as a controlled lab experiment. Yet, I suspect that this would not get sufficient buy-in, as it’s not considered the actual real thing.

And that’s the dilemma. Just as analog emulators will not be trusted by geophysicists and climatologists, scientists from other disciplines will remain skeptical of untestable claims made by earth scientists. If nothing definitive comes out of a thought experiment that can’t be reproduced by others in a lab, they remain suspicious, as per their education and training.

It should therefore work both ways. As featured in the previous blog post, the model of the Chandler wobble forced by lunar torque needs to be treated fairly: either clearly debunked or considered as an alternative to the hazy consensus. ChatGPT remains open about the model, not the least bit swayed by colleagues or tribal bias. Because the period of the Chandler wobble predicted by the lunar nodal model (432.7 days) is so close to the cited value of 433 days, the bottom line is that it should be difficult to ignore.

There are other indicators in the observational data to further substantiate this, see Chandler Wobble Forcing. It also makes sense in the context of the annual wobble.

As it stands, the lack of an experiment means a more equal footing for the alternatives, as they are all under equal amounts of suspicion.

Same goes for QBO. No controlled experiment is possible to test out the consensus QBO models, despite the fact that the Plumb and McEwan experiment is claimed to do just that. Sorry, but that experiment is not even close to the topology of a rotating sphere with a radial gravitational force operating on a gas. It also never predicted the QBO period. In contrast, the period of the QBO predicted by the lunar nodal model (28.4 months) is also too close to the cited value of 28 to 29 months to ignore. This also makes sense in the context of the semi-annual oscillation (SAO) located above the QBO.

Both the Chandler wobble and the QBO have the symmetry of a global wavenumber=0 phenomenon, so only nodal cycles are allowed, both lunar and solar.
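The two periods quoted above can be reproduced with a few lines of arithmetic. This is a sketch under an assumption that this post does not spell out: the nodal (draconic) lunar cycle is sampled by a seasonal impulse, so only the fractional part of its cycles-per-year count survives as a slow alias. The draconic month (27.21222 days) and tropical year (365.24219 days) are standard astronomical values, and the function name is hypothetical.

```python
import math

TROPICAL_YEAR_DAYS = 365.24219
DRACONIC_MONTH_DAYS = 27.21222

def aliased_period_years(period_days):
    """Period in years left after keeping only the fractional cycles per year."""
    cycles_per_year = TROPICAL_YEAR_DAYS / period_days
    alias_freq = cycles_per_year - math.floor(cycles_per_year)
    return 1.0 / alias_freq

# QBO: the full draconic month
qbo_months = 12.0 * aliased_period_years(DRACONIC_MONTH_DAYS)

# Chandler wobble: the draconic fortnight (half the draconic month)
chandler_days = TROPICAL_YEAR_DAYS * aliased_period_years(DRACONIC_MONTH_DAYS / 2.0)

print(f"QBO alias period      = {qbo_months:.1f} months")
print(f"Chandler alias period = {chandler_days:.1f} days")
```

Both results depend only on the two astronomical constants, with no fitted parameter.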

Next, ENSO. As with LOD modeling, this is not wavenumber=0 symmetry, as it must correspond to the longitude of a specific region. No controlled experiment is possible to test out the currently accepted models, which are premised on ENSO being triggered by wind shifts (an iffy cause vs. effect in any case). The mean period of ENSO predicted by the tidal LOD-calibrated model (3.80 years modulated by 18.6 years) is too close to the cited value of 3.8 years, drawn from ~200 years of paleo and direct measurement, to ignore.

Encyclopedia of Paleoclimatology and Ancient Environments, 721–728.
doi:10.1007/978-1-4020-4411-3_172 

In BLUE below is the LOD-calibrated tidal forcing, with linear amplification

In BLUE again below is a non-linear modulation of the tidal forcing according to the Laplace’s Tidal Equation (LTE) solution, trained on an early historical interval. This is something that a neural network should be able to do, as it excels at fitting non-linear mappings that have a simple (i.e. low complexity) encoding. In this case it may be able to construct a Taylor series expansion of a sinusoidal modulating function.

The neural network’s ability to accurately represent a behavior is explained as a simplicity bias, a confounding aspect of machine learning tools such as ChatGPT and neural networks. The YouTube video below explains the counter-intuitive notion of how a NN with a deep set of possibilities tends to find the simplest solution, and does so without over-fitting the final mapping.

So deep neural networks are claimed to have a built-in Occam’s Razor propensity, finding the most parsimonious input-output mappings when applied to training data. This is spot on with what I am doing with the LTE mapping, except that I bypass the NN and use a nonlinear sinusoidal modulation optimally fit to the training data by a random search function.
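To make the idea concrete, here is a minimal sketch of a random search fitting a sinusoidal modulation on a training interval and then checking it on held-out data. Everything in it is hypothetical: the two-sinusoid forcing, the "true" modulation parameters, the noise level, and the search ranges are synthetic stand-ins, not the actual tidal forcing or the fitting code used for the LTE model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical forcing: two sinusoids standing in for a tidal forcing series
t = np.linspace(0.0, 100.0, 1000)
forcing = np.sin(2 * np.pi * t / 3.7) + 0.8 * np.sin(2 * np.pi * t / 17.0)

# Synthetic "observations": a sinusoidal modulation of the forcing plus noise
TRUE_A, TRUE_PHI = 3.0, 0.5
target = np.sin(TRUE_A * forcing + TRUE_PHI) + 0.1 * rng.standard_normal(t.size)

train = t < 40.0     # fit only on the early interval
holdout = ~train     # keep the rest for validation

def model(a, phi, f):
    """Sinusoidal modulation of the forcing f with amplitude a and phase phi."""
    return np.sin(a * f + phi)

# Pure random search over (a, phi), scored by mean squared error on the training interval
best_a, best_phi, best_err = 0.0, 0.0, np.inf
for _ in range(20_000):
    a = rng.uniform(0.0, 10.0)
    phi = rng.uniform(0.0, 2 * np.pi)
    err = np.mean((model(a, phi, forcing[train]) - target[train]) ** 2)
    if err < best_err:
        best_a, best_phi, best_err = a, phi, err

# Cross-validate on the interval the search never saw
holdout_corr = np.corrcoef(model(best_a, best_phi, forcing[holdout]), target[holdout])[0, 1]

print(f"best a = {best_a:.2f}, best phi = {best_phi:.2f}, training MSE = {best_err:.4f}")
print(f"hold-out correlation = {holdout_corr:.3f}")
```

With only two parameters there is little room to over-fit, which is the low-complexity encoding that the simplicity-bias argument says a NN would gravitate toward.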

I am tempted to try a NN on the ENSO training set as an experiment and see what it finds.


April 2, 2023

“I am tempted to try a NN on the ENSO training set as an experiment and see what it finds.”

Information Theory in Earth Science: Been there, done that

Following up from this post, there is a recent sequence of articles in the AGU journal Water Resources Research under the heading: “Debates: Does Information Theory Provide a New Paradigm for Earth Science?”

Our book Mathematical Geoenergy anticipated all these ideas, and you can find plenty of examples and derivations there (many of them centered on the ideas of Maximum Entropy).

Here is an excerpt from the “Emerging concepts” entry, which indirectly addresses negative entropy:

“While dynamical system theories have a long history in mathematics and physics and diverse applications to the hydrological sciences (e.g., Sangoyomi et al., 1996; Sivakumar, 2000; Rodriguez-Iturbe et al., 1989, 1991), their treatment of information has remained probabilistic akin to what is done in classical thermodynamics and statistics. In fact, the dynamical system theories treated entropy production as exponential uncertainty growth associated with stochastic perturbation of a deterministic system along unstable directions (where neighboring states grow exponentially apart), a notion linked to deterministic chaos. Therefore, while the kinematic geometry of a system was deemed deterministic, entropy (and information) remained inherently probabilistic. This led to the misconception that entropy could only exist in stochastically perturbed systems but not in deterministic systems without such perturbations, thereby violating the physical thermodynamic fact that entropy is being produced in nature irrespective of how we model it.

In that sense, classical dynamical system theories and their treatments of entropy and information were essentially the same as those in classical statistical mechanics. Therefore, the vast literature on dynamical systems, including applications to the Earth sciences, was never able to address information in ways going beyond the classical probabilistic paradigm.”

That is, there are likely many earth system behaviors that are highly ordered, but the complexity and non-linearity of their mechanisms make them appear stochastic or chaotic (high positive entropy). The reality is that they follow a complicated but deterministic model (negative entropy). We just aren’t looking hard enough to discover the underlying patterns in most of this stuff.

An excerpt from the Occam’s Razor entry, which lifts from my citation of Gell-Mann:

“Science and data compression have the same objective: discovery of patterns in (observed) data, in order to describe them in a compact form. In the case of science, we call this process of compression “explaining observed data.” The proposed or resulting compact form is often referred to as “hypothesis,” “theory,” or “law,” which can then be used to predict new observations. There is a strong parallel between the scientific method and the theory behind data compression. The field of algorithmic information theory (AIT) defines the complexity of data as its information content. This is formalized as the size (file length in bits) of its minimal description in the form of the shortest computer program that can produce the data. Although complexity can have many different meanings in different contexts (Gell-Mann, 1995), the AIT definition is particularly useful for quantifying parsimony of models and its role in science.”
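The compression analogy is easy to try out. The sketch below is only a rough proxy: the true AIT complexity is uncomputable, and the size of a zlib-compressed file is merely an upper bound on it. It compares a perfectly periodic byte sequence with pseudo-random bytes of the same length; both signals are synthetic and chosen for illustration.

```python
import zlib
import numpy as np

n = 10_000
period_len = 50

# Ordered data: one quantized sinusoid period, repeated exactly
one_period = (127.5 + 127.5 * np.sin(2 * np.pi * np.arange(period_len) / period_len)).astype(np.uint8)
ordered = np.tile(one_period, n // period_len).tobytes()

# Disordered data: pseudo-random bytes of the same length
noisy = np.random.default_rng(0).integers(0, 256, size=n, dtype=np.uint8).tobytes()

# Compressed length as a stand-in for description length
ordered_size = len(zlib.compress(ordered, 9))
noisy_size = len(zlib.compress(noisy, 9))

print(f"raw length        = {n} bytes")
print(f"ordered, zlib -9  = {ordered_size} bytes")
print(f"noisy, zlib -9    = {noisy_size} bytes")
```

The ordered sequence has a short description (one period plus "repeat"), so it compresses far better than the noise, which has no description much shorter than itself.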

Parsimony of models is a measure of negative entropy

Inverting non-autonomous functions

This is an algorithm based on minimum entropy (i.e. negative entropy) considerations, and it is essentially an offshoot of this paper: Entropic Complexity Measured in Context Switching.

The objective is to apply negative entropy to find an optimal solution to a deterministically ordered pattern. To start, let us contrast the behavior of autonomous vs non-autonomous differential equations. One way to think about the distinction is that the transfer function for the non-autonomous case depends only on the present input. Thus, it acts like an op-amp with infinite bandwidth. Below saturation it gives perfectly linear amplification, so that, as shown on the graph to the right, the x-axis input produces an amplified y-axis output as long as the input is within reasonable limits.
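A minimal sketch of such a memoryless transfer function, assuming an idealized amplifier with a hypothetical gain and saturation level (neither value comes from the post):

```python
import numpy as np

GAIN = 5.0          # hypothetical amplification factor
SATURATION = 10.0   # hypothetical output limit

def transfer(x):
    """Memoryless transfer: the output depends only on the present input value."""
    return np.clip(GAIN * np.asarray(x, dtype=float), -SATURATION, SATURATION)

t = np.linspace(0.0, 1.0, 201)
x_small = 1.0 * np.sin(2 * np.pi * 3 * t)   # stays below saturation
x_large = 4.0 * np.sin(2 * np.pi * 3 * t)   # driven past saturation

y_small = transfer(x_small)   # perfectly linear: GAIN * x_small
y_large = transfer(x_large)   # clipped at +/- SATURATION

# With no memory, shuffling the input samples simply shuffles the output
perm = np.random.default_rng(0).permutation(t.size)
shuffled_matches = np.array_equal(transfer(x_small[perm]), y_small[perm])

print(f"linear below saturation: {np.allclose(y_small, GAIN * x_small)}")
print(f"peak of clipped output:  {y_large.max()}")
print(f"order-independent:       {shuffled_matches}")
```

The shuffle check is the point: the output at any instant can be computed from the input at that instant alone, with no dependence on history.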
