[time-nuts] Allan variance by sine-wave fitting

Attila Kinali attila at kinali.ch
Mon Nov 27 18:37:11 UTC 2017


Moin Ralph,

On Sun, 26 Nov 2017 21:33:03 -0800
Ralph Devoe <rgdevoe at gmail.com> wrote:

> The issue I intended to raise, but which I'm not sure I stated clearly
> enough, is a conjecture: Is least-square fitting as efficient as any of the
> other direct-digital or SDR techniques? 

You stated that, yes, but it's well hidden in the paper.

> Is the resolution of any
> direct-digital system limited by (a) the effective number of bits of the
> ADC and (b) the number of samples averaged? Thanks to Attila for reminding
> me of the Sherman and Joerdens paper, which I have not read carefully
> before. In their appendix Eq. A6 they derive a result which may or may not
> be related to Eq. 6 in my paper. 

They are related, but only accidentally. S&J derive a lower bound for
the Allan variance from the SNR. You try to derive the lower bound
for the Allan variance from the quantization noise. That you end up
with similar-looking formulas comes from the fact that both methods
have a scaling in 1/sqrt(X), where X is the number of samples taken,
though S&J use the number of phase estimates while you use the
number of ADC samples. While related, they are not the same.
And you both have a scaling of 1/(2*pi*f) to get from phase to time.
You will notice that your formula contains a 2^N term, with N being
the number of bits, which you however derive from the SNR (ENOB).
It's easy to show that the SNR due to quantization noise is
inversely proportional to the size of an LSB, i.e. SNR ~ 2^N. If we
now put in all variables and substitute 2^N with the SNR, we see:

S&J: sigma >= 1/(2*pi*f) * sqrt(2/(SNR*N_sample))   (note the inequality!)
Yours: sigma ~= 1/(2*pi*f) * 1/SNR * sqrt(1/M)      (up to a constant)

Note three differences:
1) S&J scales with 1/sqrt(SNR) while yours scales with 1/SNR
2) S&J have a tau dependence implicit in the formula, due to N_sample; you do not.
3) S&J is a lower bound, yours an approximation (or claims to be).
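
To make the scaling difference concrete, here is a quick numerical
sketch in Python (the carrier frequency, ENOB and sample counts are
made-up example values; they come from neither paper):

# Compare the two scalings numerically. All values below are
# assumed for illustration, not taken from S&J or from the paper.
import math

f = 10e6               # carrier frequency in Hz (assumed)
enob = 12              # effective number of bits (assumed)
snr = 2**enob          # amplitude SNR from quantization, SNR ~ 2^N
n_sample = 4000        # S&J: number of phase estimates (assumed)
m = 4000               # number of ADC samples (assumed)

sigma_sj = 1/(2*math.pi*f) * math.sqrt(2/(snr*n_sample))  # lower bound
sigma_rd = 1/(2*math.pi*f) * (1/snr) * math.sqrt(1/m)     # approximation

print(sigma_sj, sigma_rd)
# With equal sample counts the two differ by a factor of
# sqrt(2*SNR), i.e. the 1/sqrt(SNR) vs 1/SNR scaling dominates.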

> If the conjecture is true then the SDR
> technique must be viewed as one of several equivalent algorithms for
> estimating phase. Note that the time deviation for a single ADC channel in
> the Sherman and Joerdens paper in Fig. 3c is about the same as my value.
> This suggests that the conjecture is true.

Yes, you get to similar values if you extrapolate from the TDEV
data in S&J Fig. 3c down to the 40 µs you used. BUT: while S&J see
a decrease of the TDEV consistent with white phase noise until they
hit the flicker phase noise floor at a tau of about 1 ms, your data
does not show such a decrease (or at least I didn't see it).
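
For reference, the extrapolation itself is a one-liner, since for
white phase noise TDEV scales as 1/sqrt(tau). A sketch, with a
made-up reference point instead of the actual numbers from S&J
Fig. 3c:

import math

# White PM: TDEV(tau) = TDEV(tau_ref) * sqrt(tau_ref/tau).
# The reference point below is an assumed placeholder value.
tau_ref, tdev_ref = 1e-3, 1e-12    # assumed: 1 ps at tau = 1 ms
tau = 40e-6                        # the 40 µs averaging time
print(tdev_ref * math.sqrt(tau_ref/tau))  # TDEV if white PM held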

> Other criticisms seem off the mark:
> 
> Several people raised the question of the filter factor of the least-square
> fit.  First, if there is a filtering bias due to the fit, it would be the
> same for signal and reference channels and should cancel. Second, even if
> there is a bias, it would have to fluctuate from second to second to cause
> a frequency error. 

Bob answered that already, and I am pretty sure that Magnus will
comment on it as well. Both are better suited than I am to go into
the details of this.

> Third, the Monte Carlo results show no bias. The output
> of the Monte Carlo system is the difference between the fit result and the
> known MC input. Any fitting bias would show up in the difference, but there
> is none.

Sorry, but this is simply not the case. If I understood your
simulations correctly (you give very little information about them),
you used additive Gaussian i.i.d. noise on top of the signal. Of
course, if you add Gaussian i.i.d. noise with zero mean, you will
get zero bias in a linear least squares fit. But, as Magnus and I
have tried to tell you, the noises we see in this area are not
necessarily Gaussian i.i.d. Only white phase noise is. Most of the
techniques we use in statistics implicitly assume Gaussian i.i.d.
noise. To show you that things fail in quite interesting ways,
assume this:

X(t): random variable, Gaussian distributed, zero mean, i.i.d. (i.e. PSD = const)
Y(t): random variable, Gaussian distributed, zero mean, PSD ~ 1/f
Two time points: t_0 and t, where t > t_0

Then:

E[X(t) | X(t_0)] = 0
E[Y(t) | Y(t_0)] = Y(t_0)

I.e. the expectation of X will be zero, no matter whether you know
any earlier sample of the random variable. But for Y, the
expectation is biased towards the last sample you have seen, i.e.
it is NOT zero for anything where t > t_0. A consequence of this is
that if you take a number of samples, the average will not approach
zero in the limit of the number of samples going to infinity.
(For details see the theory of fractional Brownian motion,
especially the papers by Mandelbrot and his colleagues.)
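
A small simulation makes this tangible (my own sketch; the
FFT-based flicker generator is a quick approximation, not a
calibrated noise model):

# Running mean of white noise converges to zero; that of 1/f
# (flicker) noise does not. Flicker is generated by shaping a
# white spectrum with 1/sqrt(f), giving a PSD ~ 1/f.
import numpy as np

rng = np.random.default_rng(42)
n = 1 << 16

white = rng.standard_normal(n)

spec = np.fft.rfft(rng.standard_normal(n))
f = np.fft.rfftfreq(n)
f[0] = f[1]                        # avoid division by zero at DC
flicker = np.fft.irfft(spec / np.sqrt(f), n)
flicker /= flicker.std()

k = np.arange(1, n + 1)
for name, x in (("white", white), ("flicker", flicker)):
    run_mean = np.cumsum(x) / k
    print(name, [abs(run_mean[i]) for i in (10**3, 10**4, 6*10**4)])
# The white mean shrinks roughly as 1/sqrt(n); the flicker mean
# wanders around some nonzero value and does not go to zero.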

A PSD ~ 1/f is flicker phase noise, which usually starts to become
relevant in our systems at sampling times between 1 µs (for
high-frequency stuff) and 1-100 s (high-stability oscillators and
atomic clocks). Unfortunately, the Allan deviation does not
distinguish between white phase noise and flicker phase noise, so
it is not possible to see in your plots where flicker noise becomes
relevant (that's why we have MDEV).
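
If you want to see the difference on your own data: with the
allantools package (assumed available) and the two arrays from the
sketch above, treated as phase data, the comparison looks like:

import allantools

for name, x in (("white PM", white), ("flicker PM", flicker)):
    t_a, adev, _, _ = allantools.adev(x, rate=1.0,
                                      data_type="phase", taus="octave")
    t_m, mdev, _, _ = allantools.mdev(x, rate=1.0,
                                      data_type="phase", taus="octave")
    # ADEV falls as 1/tau for both noise types. MDEV falls as
    # tau^-3/2 for white PM but only as tau^-1 for flicker PM,
    # so the slopes tell the two apart.
    print(name, adev[:3], mdev[:3])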

Or TL;DR: for measurements in our field, you cannot assume that the
noise you have is uncorrelated or has zero mean. If you do
simulations, you have to account for that properly (which is not an
easy task). Thus you also cannot assume that your estimator has no
bias, because the underlying assumption in that deduction is that
the noise has zero mean and averages out if you take a large number
of samples.

And we have not yet touched the topic of higher order noises, with a PSD
that's proportional to 1/f^a with a>1.

> Attila says that I exaggerate the difficulty of programming an FPGA. Not
> so. At work we give experts 1-6 months for a new FPGA design. We recently
> ported some code from a Spartan 3 to a Spartan 6. Months of debugging
> followed.

This argument means that either your design was very complex, or it
used features of the Spartan-3 that are no longer present in the
Spartan-6. It does not say anything about the difficulty of writing
down-mixer and sub-sampling code (which takes less than a month,
including all validation, even if you have no prior experience in
signal processing). Yes, it's still more complicated than calling a
Python function. But if you had to write that Python function
yourself (to make the comparison fair), it would take you
considerably longer to make sure the fitting function worked
correctly. Using Python for the curve fitting is like getting the
VHDL code for the whole signal processing part from someone else.
That's easy to handle. Done in an afternoon. At most.
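
For concreteness, the Python side of that comparison is essentially
the following (a minimal sketch; sample rate, carrier frequency and
record length are assumed, and I do not claim this is the exact
code of the paper):

import numpy as np
from scipy.optimize import curve_fit

fs, f0, n = 125e6, 10e6, 4096      # assumed ADC rate, carrier, length

def model(t, amp, phase, offset):  # carrier frequency assumed known
    return amp * np.sin(2 * np.pi * f0 * t + phase) + offset

t = np.arange(n) / fs
rng = np.random.default_rng(1)
samples = model(t, 1.0, 0.3, 0.0) + 1e-3 * rng.standard_normal(n)

# Least-squares fit of amplitude, phase and offset.
popt, pcov = curve_fit(model, t, samples, p0=[1.0, 0.0, 0.0])
print("fitted phase:", popt[1], "rad")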

And just to further underline my point here: I have both written
VHDL code for FPGAs to down-mix and sub-sample, and done sine
fitting using the very same Python function you have used. I know
the complexity of both, as I know their pitfalls.

> FPGA's will always be faster and more computationally efficient
> than Python, but Python is fast enough. The motivation for this experiment
> was to use a high-level language (Python) and preexisting firmware and
> software (Digilent) so that the device could be set up and reconfigured
> easily, leaving more time to think about the important issues.

Sure. This is fine. But please do not bash other techniques just
because you are not good at handling them, especially if you hide
the complexity of your own approach completely in a side remark.
(Yes, that ticked me off.)

> Attila has about a dozen criticisms of the theory section, mostly that it
> is not rigorous enough and there are many assumptions. But it is not
> intended to be rigorous. 

If it is not intended as such, then you should make that clear in
the paper, or put the theory in an appendix. Currently the theory
takes up almost 4 of the 8 pages of your paper, so it looks like an
important part. And you should still make sure it is correct, which
currently it isn't.

> This is primarily an experimental paper and the
> purpose of the theory is to give a simple physical picture of the
> surprisingly good results. It does that, and the experimental results
> support the conjecture above.

Then you should have gone with a simple SNR-based formula like
S&J's, or referenced one of the many papers out there that do this
kind of calculation and just repeated the formula with a comment
that the derivation is in paper X.

> The limitations of the theory are discussed in detail on p. 6 where it is
> called "... a convenient approximation.." Despite this the theory agrees
> with the Monte Carlo over most of parameter space, and where it does not is
> discussed in the text.

Please! This is bad science! You build a theory on flawed
foundations and use this theory as the foundation of your
simulations. And when the simulations agree with your theory, you
claim the theory is correct? Please, do not do this!

Yes, it is OK to approximate. Yes, it is OK to make assumptions.
But please be aware of what the limits of those approximations and
assumptions are. I have tried to point out the flaws in your
argumentation and how they affect the validity of your paper. If
you just want to write an experimental paper, then the right thing
to do would be to cut out all the theory and concentrate on the
experiments.


I hope it comes across that I am not criticizing your experiments
or the results you got out of them. I am criticizing the analysis
you have done, which contains assumptions you are not aware of that
invalidate some of your results. The experiments are fine. The
precision you get is fine. But your analysis is flawed.

			Attila Kinali

-- 
It is upon moral qualities that a society is ultimately founded. All 
the prosperity and technological sophistication in the world is of no 
use without that foundation.
                 -- Miss Matheson, The Diamond Age, Neal Stephenson


