Simpler models … examples

Continued from the last post.

Each fitted model result shows cross-validation against held-out data: the model is trained only on the intervals outside 0.6-0.8, i.e. on t<0.6 and t>0.8, where each record is normalized to run from t=0.0 to t=1.0. The best results are for time-series that have 100 years or more of monthly data, so the held-out interval is typically 20 years. There is no selection-bias trickery here, as this is a collection of independent sites and nothing in the multiple linear regression (MLR) fitting process is specific to an individual time-series. The collection of results below starts with the Stockholm site in Sweden; in each chart, the dashed line indicates the test (validation) interval.
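To make the held-out scheme concrete, here is a minimal sketch in Python. It is not the fitting code used for the charts: the synthetic series, the regressor periods (4.4 and 2.0 years), and the helper names `split_by_interval` and `fit_and_validate` are all assumptions chosen for illustration. It fits an ordinary least-squares regression on t<0.6 and t>0.8 only, then scores the prediction on the 0.6-0.8 interval.

```python
import numpy as np

def split_by_interval(t, lo=0.6, hi=0.8):
    """Return boolean masks (train, test) for normalized time t in [0, 1]."""
    test = (t >= lo) & (t <= hi)
    return ~test, test

def fit_and_validate(X, y, t, lo=0.6, hi=0.8):
    """Fit least-squares coefficients on the training mask only and
    return (coefficients, correlation over the held-out interval)."""
    train, test = split_by_interval(t, lo, hi)
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    pred = X[test] @ coef
    r = np.corrcoef(pred, y[test])[0, 1]
    return coef, r

# Synthetic stand-in for a 100-year monthly record (hypothetical data).
rng = np.random.default_rng(0)
n = 1200
t = np.linspace(0.0, 1.0, n)
years = 100.0 * t

# Illustrative regressors only: a constant plus two arbitrary cycles.
X = np.column_stack([
    np.ones(n),
    np.sin(2 * np.pi * years / 4.4),
    np.cos(2 * np.pi * years / 4.4),
    np.sin(2 * np.pi * years / 2.0),
    np.cos(2 * np.pi * years / 2.0),
])
true_coef = np.array([0.0, 30.0, -10.0, 15.0, 5.0])
y = X @ true_coef + rng.normal(scale=10.0, size=n)

coef, r_test = fit_and_validate(X, y, t)
train, test = split_by_interval(t)
print(f"held-out months: {test.sum()}, test correlation: {r_test:.2f}")
```

The key point is that `coef` never sees the held-out rows, so the test correlation measures prediction rather than fit quality.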

I was recently in Stockholm, and this photo looks toward the location of the measurement station, about 4000 feet away and labeled by the marker on the right below:
Stockholm, Sweden
Korsor, Denmark
Klaipeda, Lithuania
Hornbaek, Denmark
Warnemunde, Germany
Gedser, Denmark
Trois-Rivieres, Quebec
Travemunde, Germany
Manila, Philippines
Helsinki, Finland
San Diego, California
Galveston, Texas
Mantyluoto, Finland
Vancouver, BC — (lower DOF, biennial)
Smogen, Sweden
Furuogrund, Sweden
Vlissingen, Netherlands
Visby, Sweden
Aberdeen, Scotland
Ketchikan, Alaska
Hoek Van Holland, Netherlands
Charleston, South Carolina
West-Terschelling, Netherlands
Den Helder, Netherlands
Los Angeles, California
Pensacola, Florida
Delfzijl, Netherlands
La Jolla, California
Harlingen, Netherlands
Astoria, Oregon — (lower DOF, biennial)
Kaskinen, Finland
Ijmuiden, Netherlands
Mumbai, India
Genova, Italy — many missing values
Oslo, Norway
Sydney, Australia — Earlier interval of Ft Denison time series
Landsort, Sweden
Olands, Sweden
Kungsholmsfort, Sweden
Aarhus, Denmark
Brest, France – from 1880 only

The northern coast of France along the English Channel, from Brest through Normandy, has some of the highest tidal ranges of any coastal region. Yet that is not exactly what these models indicate: all the diurnal (and the annual) cycles have been removed, leaving only the residual extremes, which are only a fraction of the daily extremes. Consider these two photos of a Normandy beach near Deauville that I took on different days, only guessing which was high tide (on the left) and low tide (on the right). It’s a bit difficult to capture the extent due to foreshortening in the photo, but this is a 5-meter change in height according to the recorded predictions (from https://maree.shom.fr).

The raw monthly mean residual for Brest is much less than 5 meters (see the inset chart below from the PSMSL site), and after removing a linear trend and filtering over an annual cycle, it’s on the order of 100 mm or 0.1 meter as a typical cyclical excursion.

So that’s what the multiple linear regression is fitting to — nonlinear tidal effects that are about 1/50 the strength of the primary diurnal or semi-diurnal excursion.
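The preprocessing described above (remove a linear trend, then filter over an annual cycle) can be sketched as follows. This is an illustration under stated assumptions, not the exact pipeline behind the charts: the filter is assumed to be a 12-month boxcar, the series is synthetic, and the function names `detrend_linear` and `boxcar` are hypothetical.

```python
import numpy as np

def detrend_linear(y):
    """Subtract the least-squares straight line from y."""
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def boxcar(y, width=12):
    """Moving average over `width` samples; output is shorter by width - 1."""
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="valid")

# Hypothetical monthly series in mm: trend + annual cycle + a slower cycle.
months = np.arange(1200)
annual = 80.0 * np.sin(2 * np.pi * months / 12.0)
slow = 100.0 * np.sin(2 * np.pi * months / (4.4 * 12.0))
series = 1.5 * months + annual + slow

residual = boxcar(detrend_linear(series), width=12)
print(f"residual excursion: about {np.max(np.abs(residual)):.0f} mm")
```

A 12-sample boxcar on monthly data averages each annual cycle to zero while only mildly attenuating multi-year variations, which is why the slower excursion survives in `residual`.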


Some climate indices as well

These include a fitted trend if needed.

NAO index — filtered with a 12-month boxcar
TNA index
PDO index
El Nino Modoki index

Not all the models work this well. Some of the shorter time-series and those with many missing entries fare worse. For example, the following two from Japan are on the shorter side.

Aburatsubo, Japan
Wajima, Japan

In the case of Vancouver, increasing the set of parameters leads to over-fitting.

Vancouver, BC — including quadrennial factors

Yet, nearby Victoria shows better cross-validation with the quadrennial factors, ostensibly as it has fewer missing data points.

Victoria, BC

In general, if the available set of training data drops below 75 years, overfitting may occur.

Port Pirie, Australia

And yet some sites, such as Baltimore, Philadelphia, New York, Atlantic City, Boston, and Portland (Maine), show poor correlations over the test interval, even though they each have 100 years of monthly data.

Baltimore, Maryland
Philadelphia, Pennsylvania
NYC Battery, New York
Atlantic City, New Jersey
Portland, Maine
Boston, Massachusetts

These are all somewhat co-located in the northeastern USA, so it may be that some common-mode factor is not being picked up. The two northernmost sites, Portland and Boston, have stronger tidal swings (being nearer the Bay of Fundy). There are indications that the 4.4y tidal cycle has a greater influence along the eastern seaboard [1]. The following fits include that factor as a duplicate parameter, allowing the MLR to overfit slightly.

Portland, Maine — duplicate 0.2259/yr tidal factor
Boston, Mass — duplicate 0.2259/yr tidal factor
NYC Battery, New York — duplicate 0.2259/yr tidal factor
Atlantic City, New Jersey — duplicate 0.2259/yr tidal factor
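The 0.2259/yr frequency in these captions corresponds to a period of 1/0.2259, or about 4.43 years, which is the 4.4y cycle mentioned above. The sketch below shows one plausible reading of adding such a factor to a regression, namely appending a sine/cosine pair at that frequency to the design matrix. This is an assumption for illustration and may not match how the duplicate parameter is actually implemented in the model; `add_harmonic` and the baseline columns are hypothetical.

```python
import numpy as np

FREQ_PER_YEAR = 0.2259  # cycles per year, taken from the captions

def add_harmonic(X, years, freq):
    """Append sin and cos columns at `freq` (cycles/year) to design matrix X."""
    phase = 2 * np.pi * freq * years
    return np.column_stack([X, np.sin(phase), np.cos(phase)])

period_years = 1.0 / FREQ_PER_YEAR

# Hypothetical baseline design matrix: intercept and linear trend, monthly steps.
years = np.arange(1200) / 12.0
X_base = np.column_stack([np.ones_like(years), years])
X_aug = add_harmonic(X_base, years, FREQ_PER_YEAR)

print(f"period: {period_years:.2f} years, columns: {X_aug.shape[1]}")
```

Each extra column adds a degree of freedom, which is why adding the factor lets the regression overfit slightly.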

Philadelphia and Baltimore have much weaker tidal swings, and are reportedly more sensitive to river flows.

The last subpar fit examples are Halifax in Nova Scotia and Newlyn (cited in [1]), located at the tip of SW England next to Penzance, across the English Channel from Brest. These both benefited from the 4.4y duplicate factor.

Newlyn, England — duplicate 0.2259/yr tidal factor
Halifax, Nova Scotia — duplicate 0.2259/yr tidal factor

References

  1. Ray (2019), The Semiannual and 4.4-Year Modulations of Extreme High Tides, Journal of Geophysical Research: Oceans.
