Each fitted model result shows cross-validation against a held-out interval, i.e. training only on the data outside of 0.6-0.8 (training on t<0.6 and t>0.8, where the record is normalized to run from t=0.0 to t=1.0). The best results are for time-series that have 100 years or more of monthly data, so the held-out interval is typically 20 years. There is no selection-bias trickery here, as this is a collection of independent sites and nothing in the MLR fitting process is specific to an individual time-series. The collection of results starts with the Stockholm site in Sweden; keep in mind that the dashed line in the charts indicates the test or validation interval.
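As a rough illustration of the held-out scheme just described, here is a minimal sketch in Python with NumPy. It is not the fitting code behind the charts: the sinusoidal design matrix, the function names, the synthetic data, and the period list (18.6, 8.85 and 4.4 years) are all assumptions chosen only to show the mechanics of training outside 0.6-0.8 and scoring inside it.

```python
import numpy as np

def held_out_masks(t, lo=0.6, hi=0.8):
    """Boolean (train, test) masks for normalized time t in [0, 1]."""
    test = (t >= lo) & (t <= hi)
    return ~test, test

def design_matrix(years, periods):
    """Constant column plus a sin/cos pair per period (periods in years)."""
    cols = [np.ones_like(years)]
    for p in periods:
        w = 2.0 * np.pi / p
        cols.extend([np.sin(w * years), np.cos(w * years)])
    return np.column_stack(cols)

def cross_validate(years, y, periods, lo=0.6, hi=0.8):
    """Least-squares fit on the training mask only; score both intervals."""
    t = (years - years[0]) / (years[-1] - years[0])  # normalize to 0..1
    train, test = held_out_masks(t, lo, hi)
    X = design_matrix(years, periods)
    # Coefficients are estimated without ever seeing the test interval.
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    fit = X @ coef
    r_train = np.corrcoef(fit[train], y[train])[0, 1]
    r_test = np.corrcoef(fit[test], y[test])[0, 1]
    return fit, r_train, r_test

# Synthetic 100-year monthly record (hypothetical data, in mm).
rng = np.random.default_rng(0)
years = 1900.0 + np.arange(1200) / 12.0
y = (40.0 * np.sin(2 * np.pi * years / 18.6)
     + 25.0 * np.cos(2 * np.pi * years / 4.4)
     + rng.normal(0.0, 5.0, years.size))

# Illustrative period list, not the factor set used for the charts.
fit, r_train, r_test = cross_validate(years, y, periods=[18.6, 8.85, 4.4])
print(f"train r = {r_train:.3f}, test r = {r_test:.3f}")
```

The correlation over the test mask is the number to watch: a large gap between the training and test correlations is the usual sign of over-fitting.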











































The northern coast of France along the English Channel from Brest through Normandy has some of the highest tidal ranges of any coastal region. Yet that is not exactly what is being indicated in these models: all the diurnal (and the annual) cycles have been removed, leaving only the residual extremes, which are only a fraction of the daily extremes. Consider these two photos of a Normandy beach near Deauville that I took on different days, only guessing which was high tide (on the left) and which was low tide (on the right). It’s a bit difficult to capture the extent due to foreshortening in the photo, but this is a 5-meter change in height according to the recorded predictions (from
)




The raw monthly mean residual for Brest is much less than 5 meters (see the inset chart below from the PSMSL site), and after removing a linear trend and filtering over an annual cycle, it’s on the order of 100 mm or 0.1 meter as a typical cyclical excursion.
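To make the two preprocessing steps concrete (removing a linear trend, then filtering over an annual cycle), here is a minimal sketch on synthetic data. The 12-month boxcar average is an assumption about what "filtering over an annual cycle" means, and the series, its amplitudes, and the function name are hypothetical; gaps in the record are not handled.

```python
import numpy as np

def detrend_and_smooth(months, level_mm, window=12):
    """Remove a least-squares linear trend, then apply a running mean.

    Assumes a gap-free monthly series. A 12-point boxcar averages out
    a pure 12-month cycle and slightly attenuates slower variations.
    """
    slope, intercept = np.polyfit(months, level_mm, 1)
    resid = level_mm - (slope * months + intercept)
    kernel = np.ones(window) / window
    # mode="valid" drops the edges where the window is incomplete.
    smooth = np.convolve(resid, kernel, mode="valid")
    return resid, smooth

# Hypothetical monthly series in mm: arbitrary offset, 2 mm/yr trend,
# a 200 mm annual cycle and a 100 mm slow (18.6-year) cycle.
months = np.arange(1200, dtype=float)
level_mm = (7000.0
            + (2.0 / 12.0) * months
            + 200.0 * np.sin(2 * np.pi * months / 12.0)
            + 100.0 * np.sin(2 * np.pi * months / (18.6 * 12.0)))

resid, smooth = detrend_and_smooth(months, level_mm)
print("detrended peak (mm):", round(float(np.max(np.abs(resid)))))
print("filtered peak (mm): ", round(float(np.max(np.abs(smooth)))))
```

In this toy case the annual term disappears after filtering and only the slow excursion of roughly 100 mm remains, which is the scale of signal the regression is asked to fit.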

So that’s what the multiple linear regression is fitting to: nonlinear tidal effects that are about 1/50 the strength of the primary diurnal or semi-diurnal excursion (roughly 0.1 meter against 5 meters).
Some climate indices are fitted as well. These include a fitted trend if needed.




Not all the models work this well. Some of the shorter time-series and those with many missing entries fare worse. For example, the following two from Japan are on the shorter side.


In the case of Vancouver, increasing the set of parameters leads to over-fitting.

Yet, nearby Victoria shows better cross-validation with the quadrennial factors, ostensibly as it has fewer missing data points.

In general, if the available set of training data drops below 75 years, overfitting may occur.
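A quick way to relate that 75-year rule of thumb to record length is sketched below. It assumes "training data" means the record length minus the held-out 0.6-0.8 interval, optionally reduced by a fraction of missing months; the function names and the `missing_fraction` parameter are hypothetical.

```python
def training_years(record_years, lo=0.6, hi=0.8, missing_fraction=0.0):
    """Years of data left for training after holding out [lo, hi]."""
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("need 0 <= lo < hi <= 1")
    if not 0.0 <= missing_fraction < 1.0:
        raise ValueError("missing_fraction must be in [0, 1)")
    held_out_fraction = hi - lo
    return record_years * (1.0 - held_out_fraction) * (1.0 - missing_fraction)

def may_overfit(record_years, threshold_years=75.0, **kwargs):
    """True if the training span falls below the rule-of-thumb threshold."""
    return training_years(record_years, **kwargs) < threshold_years

for span in (120, 100, 90):
    print(span, round(training_years(span), 1), may_overfit(span))
```

Under these assumptions a gap-free 100-year record leaves about 80 years for training, while a 90-year record or a 100-year record with 10% missing months drops to about 72 years, below the threshold.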

And yet some sites, such as Baltimore, Philadelphia, New York, Boston, and Portland (Maine), show poor correlations over the test interval, even though they each have 100 years of monthly data.






These are all somewhat co-located in the northeastern USA, so it’s not clear if there is some common-mode factor not being picked up. The two northernmost sites, Portland and Boston, have stronger tidal swings (being nearer the Bay of Fundy). There have been indications that the 4.4y tidal cycle has a greater influence along the eastern seaboard [1]. The following includes that factor as a duplicate parameter, allowing the MLR to overfit slightly.




Philadelphia and Baltimore have much weaker tidal swings, and are reportedly more sensitive to river flows.
The last subpar fit examples are Halifax in Nova Scotia and Newlyn (cited in [1]), located at the tip of SW England next to Penzance, across the English Channel from Brest. These both benefited from the 4.4y duplicate factor.


No data from South America. These are limited in span
https://imagizer.imageshack.com/img924/2084/fTrLje.png
Talcahuano, Chile
https://imagizer.imageshack.com/img923/1352/VHWTQx.png
Valparaiso, Chile
Observing ENSO-modulated tides from space
https://geoenergymath.com/wp-content/uploads/2025/09/202412PO.pdf