[time-nuts] Re: pulling some crystals

Mon Dec 18 15:14:26 UTC 2023

Stewart Cobb via time-nuts <time-nuts at lists.febo.com> writes:

> Anyone considering DDS implementations in an FPGA should look at using the
> CORDIC algorithm instead of sin/cos lookup tables. For short DAC output
> words, a table is usually better and faster, but for long output words, the
> table approach becomes unwieldy and the CORDIC starts to win.
>
> If raw speed is the goal, it's possible to build DDS counters and CORDIC
> stages using serial arithmetic which will run at nearly the toggle speed of
> the FPGA. Unfortunately, the number of CORDIC stages required by this trick
> expands as roughly the square of the number of phase bits used from the
> accumulator. Even though one CORDIC stage generally fits into one CLB, this
> still becomes a lot of logic. And the control logic for all those serial
> accumulators is tricky.

Don't forget that you can also use a lookup table implementation with
interpolation between table entries. Often, a small table and simple
interpolation methods will get you very good accuracy. In my own
implementations, I've found a simple linear interpolation between
elements to work quite well. This consumes minimal resources, including
memory. Often, the lookup table won't even occupy block RAM: it's small
enough that putting it in BRAM would be a waste, so it winds up in
distributed memory instead. You can add higher-order terms if
you need them, but usually they're not needed. If you're simultaneously
generating a sine and cosine value (very common), another interpolation
method would be a first-order Taylor series. This is basically free to
implement because the derivative of a sine is a cosine and the
derivative of a cosine is minus sine. However, I generally stick with
linear interpolation between elements, which I find has some nicer
properties.
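
As a rough illustration of the two interpolation methods described above
(a floating-point model, not the implementation discussed in this post),
the sketch below compares them on a small table. The 32-bit phase word,
the 256-entry full-period table, and the names sin_linear and
sincos_taylor are all assumptions made for the example.

```python
import math

PHASE_BITS = 32                      # phase word width (assumed)
TABLE_BITS = 8                       # 256 entries per full period (assumed)
FRAC_BITS = PHASE_BITS - TABLE_BITS  # phase bits left over for interpolation
N = 1 << TABLE_BITS
PHASE_MASK = (1 << PHASE_BITS) - 1
FRAC_MASK = (1 << FRAC_BITS) - 1

SIN_TABLE = [math.sin(2 * math.pi * i / N) for i in range(N)]
COS_TABLE = [math.cos(2 * math.pi * i / N) for i in range(N)]


def split_phase(phase):
    """Return (table index, fractional position in [0, 1)) for a phase word."""
    phase &= PHASE_MASK
    return phase >> FRAC_BITS, (phase & FRAC_MASK) / (1 << FRAC_BITS)


def sin_linear(phase):
    """Sine by linear interpolation between two adjacent table entries."""
    idx, frac = split_phase(phase)
    a = SIN_TABLE[idx]
    b = SIN_TABLE[(idx + 1) % N]     # index wraps at the end of the period
    return a + (b - a) * frac


def sincos_taylor(phase):
    """Sine and cosine by a first-order Taylor step from the table entry.

    Uses d/dx sin = cos and d/dx cos = -sin, so each output borrows the
    other output's table value as its slope.
    """
    idx, frac = split_phase(phase)
    delta = 2 * math.pi * frac / N   # radians past the table point
    s, c = SIN_TABLE[idx], COS_TABLE[idx]
    return s + c * delta, c - s * delta
```

With a table spacing of h = 2*pi/256, the linear interpolation error is
bounded by about h^2/8 (roughly 7.5e-5) and the one-sided Taylor step by
about h^2/2 (roughly 3e-4). The Taylor step as written here also jumps
slightly at each table boundary, while the linear version is continuous.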

This probably goes without saying, but also don't forget that you don't
need to store one full sine/cosine period! If you're computing just sine
or just cosine, the only non-redundant part is one quarter of the
period. Or, if you're computing both simultaneously, just store 1/8 of a
period. So, for the fixed-point implementation when computing both, index
the table with the phase bits below the 3 most significant bits, then use
those 3 MSBs to adjust the looked-up values for the full period. Fixed point, in
addition to being cheaper to implement on an FPGA, has some nice
advantages over floating point for representing phase (and frequency)
values, including that it wraps naturally and has a constant resolution
over 0 to 2pi.
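
The octant trick above can be sketched roughly as follows, with no
interpolation so the folding logic stays visible. Everything specific
here is an assumption for illustration: the 32-bit phase word, the
64-entry table, the half-sample offset in the table (chosen so that
mirroring is just N-1-idx), the FIXUP table, and the function names.

```python
import math

PHASE_BITS = 32                  # phase word width (assumed)
TABLE_BITS = 6                   # 64 entries covering 1/8 period (assumed)
N = 1 << TABLE_BITS
REST_BITS = PHASE_BITS - 3       # phase bits below the 3 octant MSBs
PHASE_MASK = (1 << PHASE_BITS) - 1
STEP = (math.pi / 4) / N         # angle per table entry

# Entries sit at bin centers, so a mirrored lookup is simply N-1-idx.
SIN_OCT = [math.sin((i + 0.5) * STEP) for i in range(N)]
COS_OCT = [math.cos((i + 0.5) * STEP) for i in range(N)]

# Per octant: (swap sin/cos table values?, sign of sine, sign of cosine)
FIXUP = [
    (False, +1, +1), (True, +1, +1), (True, +1, -1), (False, +1, -1),
    (False, -1, -1), (True, -1, -1), (True, -1, +1), (False, -1, +1),
]


def sincos_octant(phase):
    """Return (sin, cos) for a fixed-point phase using a 1/8-period table."""
    phase &= PHASE_MASK
    octant = phase >> REST_BITS                       # the 3 MSBs
    idx = (phase >> (REST_BITS - TABLE_BITS)) & (N - 1)
    if octant & 1:                                    # odd octants run backwards
        idx = N - 1 - idx
    a, b = SIN_OCT[idx], COS_OCT[idx]
    swap, sin_sign, cos_sign = FIXUP[octant]
    s, c = (b, a) if swap else (a, b)
    return sin_sign * s, cos_sign * c


def dds(freq_word, count):
    """Yield (sin, cos) pairs from a wrapping fixed-point phase accumulator."""
    acc = 0
    for _ in range(count):
        yield sincos_octant(acc)
        acc = (acc + freq_word) & PHASE_MASK          # wraps naturally at 2*pi
```

For example, list(dds(1 << 30, 4)) steps a quarter period per sample; the
accumulator wraps back to zero after four steps with no special handling.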

Finally, the lookup table plus interpolation approach can easily run at
full FPGA clock rates. I implemented a DDS recently that runs at the
system clock rate of 312.5 MHz on a Xilinx MPSoC. I think it could run
quite a bit faster too, but I haven't tried. Also, when it matters, this
approach has a much lower latency than the CORDIC. I'm not convinced
CORDICs have much use in DDS implementations when you have embedded
multipliers available, given how cheap (in terms of resources) and
performant LUT + interpolation techniques are.

Matt