The objective is to apply negative entropy to find an optimal solution to a deterministically ordered pattern. To start, let us contrast the behavior of autonomous vs non-autonomous differential equations. One way to think about the distinction is that the transfer function for the non-autonomous case depends only on the present input. Thus, it acts like an op-amp with infinite bandwidth: below saturation it gives perfectly linear amplification, so that, as shown on the graph to the right, an x-axis input produces an amplified y-axis output as long as the input stays within reasonable limits.
In contrast, for an autonomous formulation, the amplification depends on prior values, so it requires a time-domain convolution or a frequency-domain transfer function. The spectral response chart to the right is the classic representation of the frequency response of a linear 2nd-order differential equation. This belongs to the autonomous class of differential equations, where the evolution of the response is invariant to the starting time. For non-autonomous behavior, the time-varying aspects essentially control the output and act, in a sense, to reset the system continuously.
There are many non-autonomous formulations that aren't linear, for example a companding transfer that takes the square root of the input (used to compress the dynamic range of a signal). This transfer gradually saturates with increasing absolute value of the input, as shown below.
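A minimal sketch of such a companding transfer (the sign-preserving convention and function name are my own choices for illustration):

```python
import numpy as np

def compand(x):
    """Square-root companding transfer, preserving the sign.

    Compresses dynamic range: the output grows only as the square
    root of the input magnitude, so the response gradually saturates
    as |input| increases.
    """
    return np.sign(x) * np.sqrt(np.abs(x))

# Doubling the input magnitude scales the output by only sqrt(2)
levels = np.linspace(-4.0, 4.0, 9)
companded = compand(levels)
```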
What does this have to do with entropy? Consider that a non-autonomous transfer function can implement a far more elaborate mapping pattern and thus possess a definable amount of underlying order. Yet that order or pattern may be difficult to discern without adequate information, which is where entropy metrics such as the Shannon entropy come in.
As an example, what if the non-autonomous transfer function itself is peculiar, such as a potentially complex sinusoidal modulation of unknown frequency and phase? This occurs in Mach-Zehnder modulation and in our Laplace's Tidal Equation formulation. The effect is to distort the input enough to essentially fold the amplitude at certain points, as shown in the chart to the right. Note that the input is not the time value, but some other level or amplitude associated with the system. The output may thus be a positive amplification over a certain range of levels, but then reverse (i.e. fold or break) and become negative as the level increases, and this can cycle repeatedly for increasing input amplitude levels. If the modulation is strong enough, the output will be unrecognizable from the input.
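A sketch of that kind of level-dependent folding transfer; the name `mz_transfer` and the default `mod_freq` and `mod_phase` values are illustrative placeholders for the unknowns, not parameters from the actual Mach-Zehnder or tidal formulations:

```python
import numpy as np

def mz_transfer(level, mod_freq=3.0, mod_phase=0.0):
    """Sinusoidal transfer applied to an input *level*, not to time.

    The gain is positive over one stretch of input levels, then folds
    and reverses sign each half-cycle of the modulation, repeating as
    the level keeps increasing.
    """
    return np.sin(mod_freq * level + mod_phase)

# Sweep the input level: the output folds repeatedly as the level rises
levels = np.linspace(0.0, 2.0 * np.pi, 1000)
folded = mz_transfer(levels)
```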
The difficulty is that with little knowledge of the input forcing or the modulation, we will not be able to decode anything. But with a measure such as negative Shannon entropy, we can see how far we can get with limited information.
So consider this output waveform that we are told is due to Mach-Zehnder modulation of an unknown input:
All we know is that there may be a basis forcing consisting of a couple of sinusoids, and that there is an obvious (but unknown) non-autonomous complex modulation generating the above waveform.
The idea is that we test various combinations of sinusoidal parameters and maximize the negative Shannon entropy of the power spectrum of the transfer from input to output (see the link I first mentioned at the top of this post). We can do this by calculating a discrete Fourier transform or an FFT of the input-to-output mapping (remember that the x-axis is not time but an input level) and multiplying by its complex conjugate to get the power spectrum. For a perfectly linear amplification, as in the first example, the spectrum is essentially a delta function at a frequency of zero, indicating maximum order with a maximum negative Shannon entropy. For a single sinusoidal frequency modulation, the power spectrum would be a delta function shifted to the frequency of the modulation; again a maximally-ordered amplification, and again a maximum in negative Shannon entropy. Yet, in practical terms, perhaps a Rényi or Tsallis entropy measure would work even better than Shannon entropy. The Tsallis entropy, in fact, is close to describing a mean-square variance of a signal, in that it exaggerates clusters or strong excursions when compared against a constant background.
So this is what I have found works quite well: I essentially maximize the normalized mean-squared variance of the power spectrum.
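As a sketch of that metric under my own implementation choices (NumPy's real FFT, unit-sum normalization of the spectrum, and variance divided by squared mean as the "normalized" score):

```python
import numpy as np

def spectral_variance(mapping):
    """Normalized mean-squared variance of the power spectrum.

    `mapping` is the output sampled against the input *level* (not
    time).  A spiky, delta-like spectrum (an ordered modulation)
    scores high; a flat, disordered spectrum scores near 1.
    """
    F = np.fft.rfft(mapping - np.mean(mapping))  # DFT of the mapping
    power = (F * np.conj(F)).real                # multiply by complex conjugate
    p = power / np.sum(power)                    # normalize to unit sum
    return np.var(p) / np.mean(p) ** 2           # scale-free spikiness score

# A single-frequency (ordered) mapping scores far above white noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
ordered = np.sin(5 * x)
noisy = rng.standard_normal(1024)
```

A maximally ordered mapping concentrates all power in one bin, driving the score toward the number of bins, while a disordered spectrum spreads power evenly and the score collapses.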
The result of a search over input sinusoidal factors to maximize the power-spectrum variance of the unknown time-series shown in FIGURE 1 is this power spectrum
which arises from this optimal input forcing
Note that this is not the transfer modulation, which we still need to extract from the power spectrum.
As a result, this negative entropy algorithm is able to deconstruct or decode a Mach-Zehnder modulation of two sinusoidal factors that is encoding an input forcing of another pair of sinusoidal factors. So essentially we are able to find 4 unknown factors (or 8 if both amplitude and phase are included) while only searching on 2 factors (or 4 with amplitude and phase). How is that possible? It's not actually a free lunch, because the power spectrum calculation essentially tests all possible modulations in parallel, and the negative entropy calculation keeps track of the frequency components that maximize the delta functions in the spectrum. That is, the mean-square variance weights large excursions more heavily than a flat, highly random background would.
From Fig.2 in our paper, the schematic to the right gives the general idea. For negative entropy we are looking for the upper spectrum, not the lower, which is a maximum entropy picture of a disordered system.
This works well for certain applications. In a search algorithm it may even work better than a pure RMS minimization fitting the 4 sinusoidal factors directly against the output (the naive brute-force approach), as it does not fall into local minima as easily. I believe working with the power spectrum immediately broadens the input search parameter space.
Yet, there is more. One can also scan the output spectrum for possible harmonics of a spectral peak. Harmonics are indicators of even further order, so if the value of a harmonically-related spectral peak (an integer multiple of the fundamental) is added to the primary (fundamental) peak, the sum gets a negative entropy boost against the background when squared.
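One possible sketch of that harmonic boost, assuming the spectrum is held in an array of bins and the fundamental falls on an exact bin index (the function name and the choice of 3 harmonics are mine):

```python
import numpy as np

def harmonic_boost(power, fundamental_bin, n_harmonics=3):
    """Sum harmonically-related peaks onto the fundamental, then square.

    Bins at integer multiples of the fundamental that also hold peaks
    signal additional order; summing them before squaring boosts the
    score against a flat background.
    """
    total = 0.0
    for k in range(1, n_harmonics + 1):
        idx = k * fundamental_bin
        if idx < len(power):
            total += power[idx]
    return total ** 2

# Three coherent harmonics triple the summed amplitude: a 9x boost squared
spectrum = np.zeros(100)
spectrum[[10, 20, 30]] = 1.0
```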
When using this variation, consider dividing the summed terms F(ω) by sqrt(ω). This often works better since it doesn't over-weight the density of higher frequencies (i.e. dω needs to be made progressively smaller for higher ω). Conversely, multiply the summed terms F(ω) by sqrt(ω) if the weighting should favor higher frequencies, or use log(ω) to set a more gradual weighting.
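These weighting variants might be sketched as follows; the mode names and the handling of the zero-frequency bin are my own choices:

```python
import numpy as np

def weighted_terms(F, mode="deweight"):
    """Frequency weighting for summed spectral terms F(w).

    mode="deweight" divides by sqrt(w) to discount the density of
    high-frequency bins; mode="boost" multiplies by sqrt(w) to favor
    them; mode="log" applies log(w) for a more gradual weighting.
    """
    w = np.arange(len(F), dtype=float)
    w[0] = 1.0                       # protect the zero-frequency bin
    if mode == "deweight":
        return F / np.sqrt(w)
    if mode == "boost":
        return F * np.sqrt(w)
    if mode == "log":
        return F * np.log(np.maximum(w, 1.0))
    raise ValueError(f"unknown mode: {mode}")

# On a flat spectrum, deweighting suppresses bin 4 by a factor of 2
terms = np.ones(9)
deweighted = weighted_terms(terms, "deweight")
```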
Another approach is to split the dataset into two equal-length intervals and compute a correlation coefficient between the two F(ω) spectra. This enforces a degree of stationarity, as a higher CC between the intervals indicates that the temporal pattern doesn't change (it shouldn't with tides). If the summed terms over the total range are then multiplied by this factor, it should prevent divergence. The key is to experiment!
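A sketch of that split-and-correlate check, assuming a Pearson correlation coefficient between the power spectra of the two half-intervals:

```python
import numpy as np

def stationarity_factor(series):
    """Correlation coefficient between spectra of the two half-intervals.

    Split the series in half, take each half's power spectrum, and
    return their correlation coefficient.  A stationary pattern (as a
    tidal signal should be) gives a factor near 1, so multiplying the
    summed spectral terms by it penalizes non-stationary fits.
    """
    n = len(series) // 2

    def pspec(x):
        F = np.fft.rfft(x - np.mean(x))
        return (F * np.conj(F)).real

    first, second = pspec(series[:n]), pspec(series[n:2 * n])
    return np.corrcoef(first, second)[0, 1]

# A pure tone is stationary: both halves share the same spectrum
t = np.arange(1024)
tone = np.sin(2.0 * np.pi * 8 * t / 512)
```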