Using MaxEnt for wind speed probability estimation

Use the Bonneville Power Administration (BPA) site for data. It provides wind speed data for various sites in Oregon.

Pick the first site, Augspurger (near Hood River along the Columbia River Gorge), and the latest March data: one data point every 5 minutes, or about N=8000 rows. The data is supplied in CSV format, which must be parsed to extract the wind speed values.
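As a minimal sketch of the parsing step, the following pulls the speed out of one CSV row. The row layout (a time stamp, a comma, then the speed in MPH) and the sample row itself are assumptions made up for illustration; the actual BPA file may order or name its columns differently.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Parse_Row is
   --  Made-up sample row; the layout "time stamp, speed" is an assumption.
   Row   : constant String := "03/01 00:05,12.4";
   --  Position of the first comma, or 0 if there is none.
   Comma : constant Natural := Ada.Strings.Fixed.Index (Row, ",");
   Speed : Float;
begin
   if Comma = 0 then
      Ada.Text_IO.Put_Line ("no comma found, row skipped");
   else
      --  Everything after the comma is taken as the wind speed field.
      Speed := Float'Value (Row (Comma + 1 .. Row'Last));
      Ada.Text_IO.Put_Line ("speed =" & Float'Image (Speed));
   end if;
end Parse_Row;
```

In a real run the same logic would sit inside a loop over the lines of the file, appending each Speed to the Data vector and discarding the time stamp.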

There are 4 stages to setting up a Maximum Entropy model from the data:

  1. Set up and initialize data structures, such as the histogram binning intervals
  2. Create a histogram from the data
  3. Normalize the probability distribution function
  4. Do an ordinary least-squares (OLS) regression to fit the histogram to the MaxEnt function

First assume that the wind speed data has been read into a vector-like data structure, call it Data, that consists of the N rows of data (the time stamp is not needed).

Step 1

Pick a data structure that defines the bin interval (stored as the bin's lower edge), holds the raw count, and holds the processed cumulative probability:

type Bin is record
   Interval : Float;
   Count : Integer;
   Cumulative : Float;
end record;

Declare a histogram array with enough binning intervals to span the data. (The required range can be found in a separate pre-processing step, or estimated roughly by scanning the data visually for the minimum and maximum values.)
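The pre-processing step could look something like the sketch below, which finds the largest squared speed and derives the last bin index needed. The sample values and the names Max_Sq and Last_Bin are illustrative assumptions; the 100 MPH^2 bin width is the one used in the text.

```ada
with Ada.Text_IO;

procedure Size_Histogram is
   --  Made-up speeds in MPH, standing in for the parsed Data vector.
   Data      : constant array (1 .. 5) of Float :=
     (3.0, 12.5, 7.2, 28.0, 0.0);
   Bin_Width : constant Float := 100.0;  --  MPH^2, as in the text
   Max_Sq    : Float := 0.0;
   Last_Bin  : Integer;
begin
   --  The histogram is built on squared speeds, so track the largest square.
   for I in Data'Range loop
      Max_Sq := Float'Max (Max_Sq, Data (I) ** 2);
   end loop;
   --  One extra bin so the largest value has an upper edge above it.
   Last_Bin := Integer (Float'Floor (Max_Sq / Bin_Width)) + 1;
   Ada.Text_IO.Put_Line
     ("Last bin index needed:" & Integer'Image (Last_Bin));
end Size_Histogram;
```

For these sample values the largest square is 784 MPH^2, so bins 0 through 8 are enough; the fixed 0..200 range in the declaration that follows simply leaves generous headroom.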

Histogram : array (0..200) of Bin;
Bin_Width : Float := 100.0;  -- in MPH^2, since the bins hold squared speeds

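The histogram is then initialized:

   -- Initialize the bins with the interval spacing
   for I in Histogram'Range loop
      Histogram(I).Interval := Float(I) * Bin_Width;  -- lower edge of bin I
      Histogram(I).Count := 0;
      Histogram(I).Cumulative := 0.0;  -- so later steps never read garbage
   end loop;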

Step 2

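Build the histogram by iterating through the elements of the input data vector. First square each speed to make it amenable to a MaxEnt analysis: the square is proportional to the wind's kinetic energy, and the maximum entropy distribution for a non-negative quantity with a known mean is a simple exponential. Then determine which bin the squared value falls into, and increment that bin's count if the value lies within its bounding interval.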

   for I in Data'Range loop
      for J in Histogram'First .. Histogram'Last - 1 loop
         -- Bin J covers [Interval(J), Interval(J+1)) in squared speed
         if Data(I)**2 >= Histogram(J).Interval and
            Data(I)**2 < Histogram(J+1).Interval then
            Histogram(J).Count := Histogram(J).Count + 1;
            exit;  -- each value falls in exactly one bin
         end if;
      end loop;
   end loop;
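Because the bins are equally spaced, the inner search loop can be replaced by computing the bin index directly. This is an alternative sketch rather than the method used above; the Counts array and the sample speeds are assumptions for illustration.

```ada
with Ada.Text_IO;

procedure Direct_Binning is
   Bin_Width : constant Float := 100.0;  --  MPH^2
   Counts    : array (0 .. 200) of Integer := (others => 0);
   --  Made-up speeds in MPH.
   Data      : constant array (1 .. 4) of Float := (3.0, 12.5, 7.2, 28.0);
   J         : Integer;
begin
   for I in Data'Range loop
      --  Index of the bin whose interval contains the squared speed.
      J := Integer (Float'Floor (Data (I) ** 2 / Bin_Width));
      --  Ignore anything beyond the last bin that has an upper edge.
      if J in Counts'First .. Counts'Last - 1 then
         Counts (J) := Counts (J) + 1;
      end if;
   end loop;

   for B in 0 .. 8 loop
      Ada.Text_IO.Put_Line
        ("bin" & Integer'Image (B) & ":" & Integer'Image (Counts (B)));
   end loop;
end Direct_Binning;
```

The squares of the sample speeds are 9, 156.25, 51.84 and 784, so they land in bins 0, 1, 0 and 7 respectively. The work per data point no longer depends on the number of bins.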

Step 3

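Normalize the counts within the bins to probabilities, using the total count as a conversion factor. The value stored in each bin is the fraction of the data lying above that bin's upper edge (a complementary cumulative probability that starts near 1 and falls toward 0). Probabilities can never be negative, so stop the conversion if rounding makes the running value dip below zero.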

-- Total is the number of data points as a Float, e.g. Float(Data'Length)
Cumulative_Prob := 1.0;
for J in Histogram'Range loop
   Cumulative_Prob := Cumulative_Prob - Float(Histogram(J).Count) / Total;
   exit when Cumulative_Prob < 0.0;
   Histogram(J).Cumulative := Cumulative_Prob;
end loop;

Step 4

The least-squares fit is calculated simply as the sum of the negative logarithms of the cumulative probabilities (calculated in Step 3) divided by the sum of the corresponding upper bin edges. Truncate the range to exclude the sparsest data in the tail, which is most sensitive to deviation from the MaxEnt distribution. Here the cutoff is at the 0.01 (1%) level; it could also be set to zero.
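As a sketch of where that ratio comes from, assume the MaxEnt model says the probability that the squared speed exceeds x is exp(-K x). The symbols C_j (the cumulative value stored in bin j), x_{j+1} (the upper edge of bin j) and M (the last bin kept before the cutoff) are notation introduced here for illustration.

```latex
\begin{align}
C_j &\approx e^{-K x_{j+1}}, \qquad x = v^2 \\
-\ln C_j &\approx K \, x_{j+1} \\
K &= \frac{\sum_{j=0}^{M} \left( -\ln C_j \right)}{\sum_{j=0}^{M} x_{j+1}}
\end{align}
```

Summing the second line over the retained bins and solving for K gives the third. This ratio is the least-squares slope through the origin when each bin is weighted by 1/x_{j+1}; an unweighted fit through the origin would instead use the sum of x times y divided by the sum of x squared.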

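Num := 0.0; Den := 0.0;
-- Stop one short of the last bin because the upper edge is read from J+1
for J in Histogram'First .. Histogram'Last - 1 loop
   exit when Histogram(J).Cumulative <= 0.01;  -- drop the sparse tail
   Num := Num - Log(Histogram(J).Cumulative);
   Den := Den + Histogram(J+1).Interval;
end loop;
K := Num/Den;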
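One way to sanity-check the four steps is to run them on synthetic data whose K is known in advance. The sketch below is an assumption-laden test harness, not part of the original analysis: K_True is a hypothetical value, the squared speeds are placed at evenly spaced quantiles of an exponential distribution instead of being read from the BPA file, and plain arrays stand in for the Bin record.

```ada
with Ada.Text_IO;
with Ada.Numerics.Elementary_Functions;

procedure MaxEnt_Check is
   use Ada.Numerics.Elementary_Functions;

   N         : constant := 8000;          --  about a month of 5-minute rows
   K_True    : constant Float := 0.004;   --  hypothetical, in 1/MPH^2
   Bin_Width : constant Float := 100.0;   --  MPH^2

   Edges  : array (0 .. 200) of Float;
   Counts : array (0 .. 200) of Integer := (others => 0);
   Cumul  : array (0 .. 200) of Float := (others => 0.0);

   E, C, Num, Den : Float := 0.0;
   J              : Integer;
begin
   --  Step 1: lower edge of each bin.
   for I in Edges'Range loop
      Edges (I) := Float (I) * Bin_Width;
   end loop;

   --  Step 2: synthetic squared speeds at quantiles of exp(-K_True * x),
   --  binned by direct index computation.
   for I in 1 .. N loop
      E := -Log (1.0 - (Float (I) - 0.5) / Float (N)) / K_True;
      J := Integer (Float'Floor (E / Bin_Width));
      if J in Counts'First .. Counts'Last - 1 then
         Counts (J) := Counts (J) + 1;
      end if;
   end loop;

   --  Step 3: fraction of the data above each bin's upper edge.
   C := 1.0;
   for B in Cumul'Range loop
      C := C - Float (Counts (B)) / Float (N);
      exit when C < 0.0;
      Cumul (B) := C;
   end loop;

   --  Step 4: ratio-of-sums fit, truncated at the 1% level.
   for B in Cumul'First .. Cumul'Last - 1 loop
      exit when Cumul (B) <= 0.01;
      Num := Num - Log (Cumul (B));
      Den := Den + Edges (B + 1);
   end loop;

   Ada.Text_IO.Put_Line ("K true   =" & Float'Image (K_True));
   Ada.Text_IO.Put_Line ("K fitted =" & Float'Image (Num / Den));
end MaxEnt_Check;
```

If the steps are wired together correctly, the fitted K should come out close to K_True. Real wind data will not follow the exponential form this closely, so a larger discrepancy there reflects the data and not necessarily a bug.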

Analysis Problem

Since the Augspurger site is near a popular wind-surfing area, one can ask a question of the model: what is the probability of finding the wind speed below 5 MPH, taken here as the minimum for wind-surfing?

      V : Float := 5.0;  -- threshold speed in MPH
      P : Float;
      -- K is the fitted value from Step 4
      P := 1.0 - Exp(-K * V**2);
      Text_IO.Put_Line("P < 5 = " & Float'Image(P));

This uses the K value found in Step 4. The expression 1 - exp(-K * V^2) is the model's cumulative probability that the squared speed lies anywhere from 0 to 5^2 = 25 MPH^2.
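The same formula can be inverted to ask which speed corresponds to a given probability. The sketch below evaluates the 5 MPH question and the median speed; the value of K is a hypothetical placeholder and should be replaced with the result of Step 4.

```ada
with Ada.Text_IO;
with Ada.Numerics.Elementary_Functions;

procedure Wind_Query is
   use Ada.Numerics.Elementary_Functions;

   --  Hypothetical fit result in 1/MPH^2; substitute the K from Step 4.
   K : constant Float := 0.005;
   V : constant Float := 5.0;  --  threshold speed in MPH

   --  Probability that the speed is below V.
   P_Below : constant Float := 1.0 - Exp (-K * V ** 2);
   --  Speed below which the wind stays half the time:
   --  solve 1 - exp(-K * V^2) = 0.5 for V.
   Median  : constant Float := Sqrt (Log (2.0) / K);
begin
   Ada.Text_IO.Put_Line ("P(v < 5 MPH) =" & Float'Image (P_Below));
   Ada.Text_IO.Put_Line ("median speed =" & Float'Image (Median));
end Wind_Query;
```

An exponential distribution in the squared speed is the same thing as a Rayleigh distribution in the speed itself, so under this model 1/K is the mean of the squared speed.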


This is a chart of the histogram results and model fit (in RED) by OLS using the 100 MPH^2 bin interval.