Use the Bonneville Power Administration site for data. This page provides wind speed data for various sites in Oregon.
Pick the first site, Augspurger (near Hood River along the Columbia River Gorge), and the latest March data, which is a set of data points collected every 5 minutes, generating about N=8000 rows. The data is supplied in CSV format, which will need to be parsed to extract the wind speed column.
There are 4 stages to setting up a Maximum Entropy model from the data:
- Setup and initialize data structures, such as the histogram binning intervals
- Create histogram from the data
- Normalize probability distribution function
- Do ordinary least-squares (OLS) regression to fit the histogram to the MaxEnt function
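As a cross-check, the four stages can be sketched end-to-end in Python on synthetic data. Everything in this sketch is an illustrative assumption, not a value from the site: the squared speed V^2 is taken to follow the MaxEnt law P(V^2 > x) = exp(-K*x) with a made-up K_TRUE, and the samples are deterministic quantiles so the run is reproducible.

```python
import math

K_TRUE = 0.003            # assumed decay constant, 1/MPH^2 (made up)
N = 8000                  # roughly one March of 5-minute samples
BIN_WIDTH = 100.0         # MPH^2 bin interval, as used below
N_BINS = 200

# Deterministic "samples": exact quantiles of the exponential law,
# so no random seed is needed.
v_squared = [-math.log(1.0 - (i + 0.5) / N) / K_TRUE for i in range(N)]

# Stages 1-2: set up the bins and build the histogram of squared speeds.
counts = [0] * N_BINS
for x in v_squared:
    j = int(x // BIN_WIDTH)
    if j < N_BINS:
        counts[j] += 1

# Stage 3: running complementary cumulative probability after each bin.
cumulative = []
c = 1.0
for j in range(N_BINS):
    c -= counts[j] / N
    if c <= 0.0:
        break
    cumulative.append(c)

# Stage 4: ratio estimate of K, truncating the sparse tail at the 1% level.
num = den = 0.0
for j, cj in enumerate(cumulative):
    if cj <= 0.01:
        break
    num -= math.log(cj)
    den += (j + 1) * BIN_WIDTH
k_est = num / den
print(k_est)   # lands very close to K_TRUE
```

Running the four stages on ideal exponential data recovers the assumed K, which is the sanity check one wants before trusting the fit on real wind data.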
First assume that the wind speed data has been read into e.g. a vector data structure, call it Data, that holds the N rows of data (the time-stamp is not needed).
Pick a data structure that defines the width of a bin, holds the raw count, and holds the processed cumulative probability
type Bin is record
   Interval   : Float;
   Count      : Integer;
   Cumulative : Float;
end record;
Declare a histogram array that will hold enough binning intervals that will span the data. (Can do this as a separate pre-processing step, otherwise can do a rough estimate by visually scanning the data looking for min and max values)
Histogram : array (0 .. 200) of Bin;
Bin_Width : Float := 100.0;
The histogram is then initialized
-- Initialize the bins with the interval spacing
for I in Histogram'Range loop
   Histogram(I).Interval := Float (I) * Bin_Width;
   Histogram(I).Count    := 0;
end loop;
Make the histogram from the data by iterating through the elements of the input vector. First square the speed data to make it amenable to a MaxEnt analysis, then determine which bin it falls into, and increment that bin's count if the value is within its bounding interval.
-- Ada exponentiation is "**"
for I in Data'Range loop
   for J in Histogram'First .. Histogram'Last - 1 loop
      if Data(I)**2 >= Histogram(J).Interval and then Data(I)**2 < Histogram(J+1).Interval then
         Histogram(J).Count := Histogram(J).Count + 1;
      end if;
   end loop;
end loop;
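Since the bins are evenly spaced, the inner scan over intervals can be replaced by computing the bin index directly from the squared speed. A quick Python illustration (the speed value here is invented):

```python
bin_width = 100.0            # MPH^2, matching Bin_Width above
v = 23.7                     # an invented wind speed in MPH
j = int(v**2 // bin_width)   # index of the bin whose interval contains v^2
print(j)                     # 23.7^2 = 561.69, which falls in bin 5
```

In the Ada version this would correspond to something like J := Integer (Float'Floor (Data(I)**2 / Bin_Width)); a sketch only, but it turns the O(N * bins) double loop into a single O(N) pass.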
Normalize the counts within the bins to probabilities, using the total count as a conversion factor. The running value stored in each bin is the complementary cumulative probability, i.e. the fraction of samples above that bin's upper edge. Probabilities can never be negative, so stop the conversion if the cumulative dips below zero.
-- Total is the total sample count as a Float, e.g. Float (Data'Length)
Cumulative_P := 1.0;
for J in Histogram'Range loop
   Cumulative_P := Cumulative_P - Float (Histogram(J).Count) / Total;
   exit when Cumulative_P < 0.0;
   Histogram(J).Cumulative := Cumulative_P;
end loop;
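A tiny Python sketch of the normalization, with made-up counts that halve from bin to bin (total 8000, matching the row count above; the leftover 250 samples are imagined to lie beyond the listed bins). Each stored value is the fraction of samples above that bin's upper edge:

```python
counts = [4000, 2000, 1000, 500, 250]  # invented bin counts
total = 8000                           # total sample count
cumulative = []
c = 1.0
for count in counts:
    c -= count / total                 # subtract this bin's probability mass
    if c < 0.0:
        break
    cumulative.append(c)
print(cumulative)   # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Counts that halve each bin produce a geometrically decaying cumulative, which is exactly the exponential shape the MaxEnt fit in the next step expects.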
The OLS slope K is simply calculated from the sum of the negative logarithms of the cumulative probabilities (calculated in Step 3) divided by the sum of the corresponding interval upper edges. Truncate the range to exclude the sparsest data, which is most sensitive to deviation from the MaxEnt distribution. Here it is cut off at the 0.01 or 1% level (the threshold could also be set to zero).
-- Log is the natural logarithm from Ada.Numerics.Elementary_Functions
Num := 0.0;
Den := 0.0;
for J in Histogram'First .. Histogram'Last - 1 loop
   exit when Histogram(J).Cumulative <= 0.01;
   Num := Num - Log (Histogram(J).Cumulative);
   Den := Den + Histogram(J+1).Interval;
end loop;
K := Num / Den;
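To see why this ratio recovers K: under the MaxEnt law the cumulative after bin J is exp(-K * x_J), where x_J is the bin's upper edge, so its negative logarithm is exactly K * x_J, and the sum of K * x_J over the sum of x_J is K. A Python check on ideal, noise-free values (the K here is an arbitrary assumption):

```python
import math

K_TRUE = 0.003        # assumed decay constant (illustrative)
BIN_WIDTH = 100.0     # MPH^2, as in the article
edges = [(j + 1) * BIN_WIDTH for j in range(10)]      # bin upper edges
cumulative = [math.exp(-K_TRUE * x) for x in edges]   # ideal Step-3 output

num = sum(-math.log(c) for c in cumulative)
den = sum(edges)
k_est = num / den
print(k_est)   # recovers K_TRUE to floating-point precision
```

On real data the cumulative values carry sampling noise, so the ratio is only an estimate; the 1% truncation above keeps the noisiest tail bins out of the sums.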
Since the Augspurger site is near a popular wind-surfing area, one can ask a question of the model: what is the probability of a wind of less than 5 MPH, which is a minimum for wind-surfing?
-- Exp is from Ada.Numerics.Elementary_Functions
declare
   V : constant Float := 5.0;
   P : Float;
begin
   P := 1.0 - Exp (-K * V**2);
   Text_IO.Put_Line ("P < 5 = " & P'Img);
end;
This uses the K value found in Step 4 and evaluates the cumulative probability over squared speeds V^2 from 0 to 5^2.
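Plugging in an illustrative K (the real value comes from the fit in Step 4), the query can be checked numerically in Python:

```python
import math

K = 0.003          # hypothetical fit result, 1/MPH^2 (not from the article)
V = 5.0            # wind-surfing minimum, MPH
p = 1.0 - math.exp(-K * V**2)
print(p)           # about 0.072 under this assumed K
```

With this made-up K, winds below 5 MPH would occur only about 7% of the time, which is consistent with the site's reputation as a wind-surfing destination.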
This is a chart of the histogram results and the model fit (in red) by OLS, using the 100 MPH^2 bin interval.