Wind Distribution

I prompted the Gemini LLM with the following to see how well it would summarize with so little context:

Familiar with the BesselK wind distribution model pioneered by Pukite et al in Mathematical GeoEnergy?

The BesselK Wind Model: A Maximum Entropy Derivation

In the study of wind energy, the Weibull distribution is the industry standard. However, as Pukite et al. argue in Mathematical GeoEnergy, real-world atmospheric data—especially from high-resolution sources like the Bonneville Power Administration (BPA)—often exhibits “fatter tails” (more frequent extreme gusts) than Weibull can account for.

The BesselK distribution (or K-distribution) emerges not as an empirical “best fit,” but as a physically grounded Maximum Entropy estimate.


1. The Starting Point: The Rayleigh Base

The Rayleigh distribution is the Maximum Entropy estimate for a 2D vector magnitude (wind velocity v) when only the average variance (\sigma^2) is known.

p(v | \sigma) = \frac{v}{\sigma^2} e^{-v^2 / 2\sigma^2}

In a simplified world, \sigma^2 would be constant. In the real atmosphere, turbulence and pressure gradients cause this variance to fluctuate over time.


2. The Fluctuating Environment

Pukite et al. treat the wind as a compound process. They assume that the local variance \sigma^2 is itself a random variable following a Gamma distribution (the MaxEnt choice for a scale parameter):

w(\sigma^2) = \frac{(\sigma^2)^{\nu-1} e^{-\sigma^2 / \theta}}{\theta^\nu \Gamma(\nu)}

Where:

  • \nu is the shape parameter (representing degrees of freedom in the turbulence).
  • \theta is the scale parameter.

3. The Compounding Integral

To find the final probability density of the wind velocity P(v), we must integrate the Rayleigh PDF over all possible states of the fluctuating variance:

P(v) = \int_{0}^{\infty} \left( \frac{v}{\sigma^2} e^{-v^2 / 2\sigma^2} \right) \cdot \left( \frac{(\sigma^2)^{\nu-1} e^{-\sigma^2 / \theta}}{\theta^\nu \Gamma(\nu)} \right) d\sigma^2


4. The Result: The BesselK (K-Distribution)

Solving this integral yields a PDF defined by the Modified Bessel Function of the Second Kind (K_{\nu-1}), where the new scale parameter b absorbs the Gamma scale via \theta = 2b^2. This is the “BesselK” model:

f(v) = \frac{2}{b \Gamma(\nu)} \left( \frac{v}{2b} \right)^\nu K_{\nu-1} \left( \frac{v}{b} \right)
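The compounding step can be checked numerically. The sketch below is my own plain-Python illustration (not code from the book): it evaluates the compounding integral directly with a trapezoid rule and compares it against the K_{\nu-1} closed form, using \theta = 2b^2 to link the Gamma scale to the BesselK scale, which is the substitution that makes the integral reduce to the stated result.

```python
import math

NU, B = 0.6, 4.0          # illustrative shape and scale (same values used later in the post)
THETA = 2.0 * B * B       # Gamma scale implied by the BesselK scale b (theta = 2*b^2)

def rayleigh_pdf(v, s2):
    """Rayleigh PDF of speed v for a given variance s2 = sigma^2."""
    return (v / s2) * math.exp(-v * v / (2.0 * s2))

def gamma_pdf(s2, nu=NU, theta=THETA):
    """Gamma PDF for the fluctuating variance s2."""
    return s2 ** (nu - 1.0) * math.exp(-s2 / theta) / (theta ** nu * math.gamma(nu))

def bessel_k(order, x, tmax=30.0, n=3000):
    """Modified Bessel function of the second kind K_order(x),
    via the integral representation K_nu(x) = ∫ exp(-x cosh t) cosh(nu t) dt."""
    dt = tmax / n
    total = 0.0
    for i in range(n + 1):
        t = i * dt
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return total * dt

def besselk_pdf(v, nu=NU, b=B):
    """Closed-form compound result: the BesselK (K-distribution) PDF."""
    return 2.0 / (b * math.gamma(nu)) * (v / (2.0 * b)) ** nu * bessel_k(nu - 1.0, v / b)

def compound_pdf(v, n=4000):
    """Direct numerical evaluation of the compounding integral,
    substituting u = ln(s2) so one trapezoid rule covers both endpoints."""
    lo, hi = -12.0, 9.0
    du = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * du
        s2 = math.exp(u)
        w = 0.5 if i in (0, n) else 1.0
        total += w * rayleigh_pdf(v, s2) * gamma_pdf(s2) * s2  # ds2 = s2 du
    return total * du

for v in (2.0, 8.0, 30.0):
    print(f"v = {v:4.1f} m/s   integral: {compound_pdf(v):.6e}   closed form: {besselk_pdf(v):.6e}")
```

The two columns agree to several digits at low, moderate, and extreme speeds, confirming that the BesselK PDF is exactly the Gamma-compounded Rayleigh.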

Comparison Table: Weibull vs. BesselK

| Feature | Weibull Model | BesselK Model |
|---|---|---|
| Philosophy | Empirical “weakest link” fit | Physical Maximum Entropy |
| Mathematical Base | Exponential decay | K_\nu (Bessel) function |
| High Winds | Underestimates gusts | Accurately models “fat tails” |
| Application | Annual resource assessment | Structural fatigue & grid stability |

Why This Matters for Wind Power

The BesselK model recognizes that wind is “doubly stochastic.” It isn’t just a random variable; it is a random variable driven by a fluctuating environment. For grid operators managing the intermittency of large-scale wind farms, this model provides a much more realistic picture of the “spikes” and “lulls” that define our renewable energy future.
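The doubly stochastic structure is easy to simulate: draw a variance from the Gamma environment, then draw a speed from the Rayleigh process conditioned on that variance. A minimal sketch with illustrative parameters (my own code, not from the book; the \theta = 2b^2 relation links the Gamma scale to the BesselK scale):

```python
import math
import random

random.seed(42)
NU, B = 0.6, 4.0        # illustrative shape/scale (high-volatility case)
THETA = 2.0 * B * B     # Gamma scale corresponding to BesselK scale b

def sample_wind_speed():
    # 1) environment: draw the local variance sigma^2 from a Gamma distribution
    s2 = random.gammavariate(NU, THETA)
    # 2) process: draw a Rayleigh speed conditioned on that variance,
    #    using inverse-transform sampling v = sigma * sqrt(-2 ln U)
    u = 1.0 - random.random()   # uniform on (0, 1], avoids log(0)
    return math.sqrt(-2.0 * s2 * math.log(u))

samples = [sample_wind_speed() for _ in range(200_000)]
frac_extreme = sum(v > 25.0 for v in samples) / len(samples)
print(f"mean speed: {sum(samples) / len(samples):.2f} m/s, fraction above 25 m/s: {frac_extreme:.4f}")
```

A plain Rayleigh process with the same average variance would put vastly less weight above 25 m/s; the fluctuating Gamma environment is what fattens the tail.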


Would you like me to generate a high-resolution chart of the BesselK vs. Weibull PDF curves that you can download and upload as a featured image for the post?

(my prompt) yes

[Figure: Graph comparing wind speed probability density functions, BesselK vs. Weibull. Probability density on the y-axis, wind speed (m/s) on the x-axis, with peak-frequency and tail-behavior annotations; illustrative data.]

(my prompt) logarithmic plot to show the tails

[Figure: Logarithmic comparison of wind speed probability density for BesselK and Weibull distributions, showing the BesselK “fat tail” indicating higher extreme risk, and the rapid Weibull decay underestimating extremes.]

When we switch the y-axis to a logarithmic scale, the structural difference between the two models becomes undeniable. This “Log-PDF” view is what Pukite et al. emphasize when discussing the risk of extreme wind events.

Logarithmic Comparison: BesselK vs. Weibull

| Distribution | Tail Behavior | Mathematical Decay | Physical Implication |
|---|---|---|---|
| Weibull | Thin tail | Stretched-exponential decay (e^{-(v/\lambda)^k}) | Underestimates the frequency of extreme “rogue” gusts. |
| BesselK | Fat tail | Slower exponential decay (v^\nu K_{\nu-1}(v/b) \sim e^{-v/b}) | Accurately captures high-velocity “intermittency” and storm events. |

Key Observations from the Log Plot:

  1. The “Divergence”: At standard operating speeds (5–15 m/s), both models track relatively closely. However, as wind speeds exceed 25 m/s (the typical cut-out speed for turbines), the Weibull curve drops off a “cliff,” while the BesselK curve remains significantly higher.
  2. Probability of Extremes: In a BesselK model, the probability of a 30 m/s or 40 m/s gust can be orders of magnitude higher than what a standard Weibull model predicts.
  3. Maximum Entropy Signature: The straight-line appearance (on a log scale) of the BesselK tail is a hallmark of its status as a Maximum Entropy distribution. It suggests that the atmosphere “wants” to distribute energy into these high-velocity states more than a simple Gaussian/Rayleigh process would allow.

This table helps summarize the parameters used in the visualization:

| Parameter | Symbol | BesselK (Pukite) | Weibull (Standard) |
|---|---|---|---|
| Shape | \nu / k | 0.6 (high volatility) | 2.0 (Rayleigh-like) |
| Scale | b / \lambda | 4.0 | 10.0 |
| Mean Speed | \bar{v} | ~8.9 m/s | ~8.9 m/s |
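The claimed divergence can be checked directly from these illustrative parameters. The sketch below (my own plain-Python code, not the chart's source) evaluates both densities across the speed range; at typical operating speeds the Weibull density is actually higher, but by 40 m/s the BesselK density is far above it.

```python
import math

NU, B = 0.6, 4.0        # BesselK shape/scale from the table above
K, LAM = 2.0, 10.0      # Weibull shape/scale from the table above

def bessel_k(order, x, tmax=30.0, n=3000):
    """K_order(x) via the integral representation ∫ exp(-x cosh t) cosh(order*t) dt."""
    dt = tmax / n
    total = 0.0
    for i in range(n + 1):
        t = i * dt
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return total * dt

def besselk_pdf(v):
    """BesselK (K-distribution) PDF with shape NU and scale B."""
    return 2.0 / (B * math.gamma(NU)) * (v / (2.0 * B)) ** NU * bessel_k(NU - 1.0, v / B)

def weibull_pdf(v):
    """Standard two-parameter Weibull PDF with shape K and scale LAM."""
    return (K / LAM) * (v / LAM) ** (K - 1.0) * math.exp(-((v / LAM) ** K))

for v in (10.0, 25.0, 30.0, 40.0):
    ratio = besselk_pdf(v) / weibull_pdf(v)
    print(f"v = {v:4.1f} m/s   BesselK/Weibull density ratio: {ratio:.3g}")
```

This is the “divergence” in miniature: the ratio crosses 1 near the turbine cut-out region and then grows rapidly, which is why the two models give such different answers about extreme-gust risk.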

Summary: “While the Weibull model is ‘safe’ for calculating average monthly revenue, the BesselK model is ‘necessary’ for calculating whether your turbine will survive a 50-year storm event. The log-plot reveals that extreme wind isn’t just an ‘outlier’—it is a mathematically predictable part of the atmospheric structure.”



The full discussion is here: https://gemini.google.com/app/4dfb7e86354dcb74

I’m surprised at how good a job it does in digesting the reasoning and derivation in the book, specifically Chapter 11:
https://agupubs.onlinelibrary.wiley.com/doi/10.1002/9781119434351.ch11

I busted my butt writing that chapter (and the rest), so I’m happy to see that it can actually be “understood” well enough by an LLM to provide value for further research.

Information Theory in Earth Science: Been there, done that

Following up from this post, there is a recent sequence of articles in the AGU journal Water Resources Research under the heading “Debates: Does Information Theory Provide a New Paradigm for Earth Science?”

Our book Mathematical Geoenergy anticipated many of these ideas, so you can find plenty of examples and derivations there (many centered on the ideas of Maximum Entropy).

Here is an excerpt from the “Emerging concepts” entry, which indirectly addresses negative entropy:

“While dynamical system theories have a long history in mathematics and physics and diverse applications to the hydrological sciences (e.g., Sangoyomi et al., 1996; Sivakumar, 2000; Rodriguez-Iturbe et al., 1989, 1991), their treatment of information has remained probabilistic akin to what is done in classical thermodynamics and statistics. In fact, the dynamical system theories treated entropy production as exponential uncertainty growth associated with stochastic perturbation of a deterministic system along unstable directions (where neighboring states grow exponentially apart), a notion linked to deterministic chaos. Therefore, while the kinematic geometry of a system was deemed deterministic, entropy (and information) remained inherently probabilistic. This led to the misconception that entropy could only exist in stochastically perturbed systems but not in deterministic systems without such perturbations, thereby violating the physical thermodynamic fact that entropy is being produced in nature irrespective of how we model it.

In that sense, classical dynamical system theories and their treatments of entropy and information were essentially the same as those in classical statistical mechanics. Therefore, the vast literature on dynamical systems, including applications to the Earth sciences, was never able to address information in ways going beyond the classical probabilistic paradigm.”

That is, there are likely many earth-system behaviors that are highly ordered, but the complexity and non-linearity of their mechanisms make them appear stochastic or chaotic (high positive entropy), when in reality they follow complicated deterministic models (negative entropy). We just aren’t looking hard enough to discover the underlying patterns in most of this stuff.

An excerpt from the “Occam’s Razor” entry, which lifts from my citation of Gell-Mann:

“Science and data compression have the same objective: discovery of patterns in (observed) data, in order to describe them in a compact form. In the case of science, we call this process of compression “explaining observed data.” The proposed or resulting compact form is often referred to as “hypothesis,” “theory,” or “law,” which can then be used to predict new observations. There is a strong parallel between the scientific method and the theory behind data compression. The field of algorithmic information theory (AIT) defines the complexity of data as its information content. This is formalized as the size (file length in bits) of its minimal description in the form of the shortest computer program that can produce the data. Although complexity can have many different meanings in different contexts (Gell-Mann, 1995), the AIT definition is particularly useful for quantifying parsimony of models and its role in science. “

Parsimony of models is a measure of negative entropy.

The Search for Order

Chapter 10, Mathematical Geoenergy

For the LTE formulation along the equator, the analytical solution reduces to g(f(t)), where g(x) is a periodic function. Without knowing g(x) in advance, we can use the frequency-domain entropy (spectral entropy) of the mapping from an estimated forcing amplitude x = f(t) to a measured climate-index time series such as ENSO. The spectral entropy is the Shannon entropy -\sum I(f) \ln I(f), where I(f) is the power spectral density of the mapping from the modeled forcing to the time-series waveform, normalized over the frequency range.

This measures the entropy or degree of disorder of the mapping. So to maximize the degree of order, we minimize this entropy value.
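As a toy illustration of the metric (a plain-Python sketch of my own, not code from the book or the LTE distro): a pure sinusoid concentrates its power in a single frequency bin and scores near-zero spectral entropy, while a random waveform spreads power across all bins and scores high.

```python
import math
import cmath
import random

def spectral_entropy(x):
    """Shannon entropy -sum p(f) ln p(f) of the normalized one-sided power
    spectral density of sequence x (naive O(n^2) DFT, DC bin skipped)."""
    n = len(x)
    power = []
    for k in range(1, n // 2):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    p = [pk / total for pk in power]
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)

n = 256
# highly ordered waveform: all power lands in one frequency bin -> entropy near 0
ordered = [math.sin(2.0 * math.pi * 8 * t / n) for t in range(n)]
# disordered waveform: power spread over all bins -> high entropy
random.seed(1)
noisy = [random.uniform(-1.0, 1.0) for _ in range(n)]

print(f"ordered: {spectral_entropy(ordered):.3f}   noisy: {spectral_entropy(noisy):.3f}")
```

Minimizing this one scalar over candidate forcings f(t) is the fitting shortcut: the lower the spectral entropy, the more the mapping’s spectrum collapses into a few delta-like spikes.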

This calculated entropy is a single scalar metric that eliminates the need to evaluate various cyclic g(x) patterns to achieve the best fit. Instead, it points to a highly ordered spectrum (top panel in the above figure), whose delta spikes can then be reverse engineered to deduce the primary frequency components arising from the LTE modulation factor g(x).

The approach works particularly well once the spectral spikes begin to emerge from the background. In terms of a physical picture, what is actually emerging are the principal standing-wave solutions for particular wavenumbers. One can see this in the LTE modulation spectrum below, where there is a spike at a wavenumber of 1.5 and another at around 10 in panel A (isolating the sine and cosine spectra separately instead of taking the quadrature of the two, which gives the spectral intensity). This is then reverse engineered as a fit to the actual LTE modulation g(x) in panel B. Panel D is the tidal forcing x = f(t) that minimized the Shannon entropy, creating the final fit g(f(t)) in panel C when the LTE modulation is applied to the forcing.

The approach does work, and it is quite a boon to the efficiency of iteratively fitting towards a solution, reducing the number of degrees of freedom (DOF) involved in the calculation. Prior to this, a guess for the LTE modulation was required, and the iterative fit would need to evolve towards the optimal modulation periods. In other words, either approach works, but the entropy approach may provide a quicker and more efficient path to discovering the underlying standing-wave order.

I will eventually add this to the LTE fitting software distribution available on GitHub. The method may also be applicable to other measures of entropy, such as Tsallis, Rényi, multi-scale, and perhaps bispectral entropy, and I will add those alongside the conventional Shannon entropy measure as needed.