[time-nuts] Experience writing a millisecond-accurate NTP client on the ESP32

Fri Feb 11 06:34:46 UTC 2022

Fellow nuts,

I'd like to report my experience building and characterizing an NTP client
on the ESP32. My client was able to synchronize to UTC using an Internet
time server about 4ms away with the following accuracy in terms of absolute
error:

Median: 216us
95th percentile: 801us
99th percentile: 1,607us
99.9th percentile: 2,461us

--Background--

The ESP32 is a family of very low-cost but highly capable microcontroller
chips that have become popular with hobbyists. You can buy a complete
module on Amazon -- dual-core, 160mhz, with built-in WiFi and bluetooth,
plus a USB interface for programming -- for about $7. The
previous-generation ESP8266 is sold for the even more absurd price of about
$4!

Lately I've been interested in trying to collect sensor data with an ESP32
and report it to a server using the chip's built-in wifi. Of course, I'd
like the sensor readings to be properly timestamped. Although there are a
few NTP libraries available for the ESP32, none that I found had reports on
their performance. In addition, they all did "one-shot" synchronization,
i.e., at one point in time, determining the current offset of the local
clock to the NTP timescale. They did not attempt to keep a history of
observations and do any outlier rejection or correct for local clock rate
error. I was convinced I could do better.

Searching the archives of this list, I have not seen much discussion of the
ESP32 other than speculation a few years ago that one might be used as an
NTP server by attaching a GPS to it.

--Implementation--

I wrote an NTP client for the ESP32 (
https://github.com/jelson/rulos/tree/main/src/lib/chip/esp32/periph/ntp)
that periodically sends a request to an NTP server and keeps the history of
recent observations of the offset between the local clock and the NTP
server's clock. The one-way latency is assumed to be half the round trip
time, minus the server's reported processing delay and a calibration
constant described below. When first starting up, it sends a request every
4 seconds until it has 10 responses. Then, in the steady-state, it sends a
request every 60 seconds, keeping a history of the past 30 minutes of
observations.

Each time a new observation is received, it performs simple least-squares
linear regression on all the observations to construct a linear model
relating the local clock to the NTP timescale. The user interface to my
library is simple -- "get current epoch timestamp" -- which works by
reading the local clock and applying the linear model to predict the
corresponding NTP timestamp. Linear regression has two good effects: first,
it averages away noise in individual observations (assuming they are
uncorrelated and unbiased). Second, it models the rate difference between
the local clock and the NTP timescale, which significantly improves the
accuracy even if time has elapsed since the most recent observation.

--Reducing Timestamp Jitter--

The ESP32 network stack is built on a lightweight TCP/IP stack called
"LWIP". It has an API that looks much like the Berkeley socket API, e.g.
sendto() and recvfrom() to send and receive UDP packets. Unfortunately,
LWIP has chosen not to propagate interrupt-time reception timestamps up to
the application as metadata. Early revisions of my NTP client did the same
thing that all the other ESP32 NTP clients do: record the time before
sending a UDP packet, and again when we read the data out of LWIP's buffers
using recv(). This leads to high jitter (on the scale of 10s of ms), as
timestamp acquisition is delayed behind all packet processing, which itself
is subject to the whims of the ESP32's FreeRTOS scheduler.

However, the ESP32 does offer a shortcut: the WiFi driver has "promiscuous
mode" in which user programs can receive a callback directly from the WiFi
driver whenever a packet is received. This is meant for applications such
as WiFi sniffing, but it has the nice side effect of giving me a low-jitter
event to record packet reception times. In fact, the promiscuous mode
callback contains wifi metadata including a reception timestamp acquired
directly *in the wifi hardware*. Using that hardware timestamp would
require some extra work to relate the wifi clock to the system clock, but
could be another source of jitter reduction that I have not implemented.

There seems to be no similar way to get precise send-time timestamps.

--High RTT Filter--

My tests were all against a public Stratum 2 NTP server run by the
University of Washington. From my home in downtown Seattle, the RTT to that
server from a Wifi-connected laptop is about 4ms. However, despite using
promiscuous mode to reduce jitter on the receive timestamps, my client
experienced a long tail of jitter in the RTTs, as seen in the graph below:

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.rtt_hist.png

Since my laptop does not see this jitter, it is likely coming from some
part of the ESP32's network stack or WiFi hardware. Before using
promiscuous mode for receive-side timestamps, the distribution of errors
was symmetrical; after adding promiscuous mode, the distribution became
more one-sided. This leads me to conclude that the remaining jitter is
primarily variability on the sending side, i.e. ,the time it takes the LWIP
network stack and underlying WiFi hardware to send a packet.

Of course, these sorts of delays contribute directly to error.  As a simple
filter, my client discards observations in the top quartile of RTTs before
feeding the remainder into the linear regression algorithm.

--Latency Asymmetry Correction--

As people familiar with NTP know, its accuracy depends on symmetry in the
forward and reverse path latencies. Only the round-trip-time can be
measured directly, so we infer that the one-way latency is half the RTT.
This assumption is only true if the forward and reverse latencies are
equal, and any asymmetry in those two latencies contributes directly to
error. This more-or-less works on the Internet, and though sometimes packet
queues will increase the latency in one direction or the other, in the
long-term it's often an unbiased estimate.

My early testing indicated that the latency asymmetry on the ESP32 was
biased. That is, when I compared the NTP-acquired timestamp to ground truth
in a long-term test (as described in the next section) I found the errors
were not centered around zero, but were biased in one direction with a
median of about 200us, as seen in the histogram below:

https://www.lectrobox.com/projects/esp32-ntp/graphs/zerocal-wifi-interrupts-2.log.gps_vs_ntp.hist.png

I suspect this is a difference in the time required to send a packet
through the ESP32's stack vs the time required to receive it. Note in the
histogram that the errors are highly asymmetrical; this is because there's
high jitter in the estimate of when we send a packet, but very little in
the reception timestamps. I chose to calibrate this asymmetry out, by
subtracting 200us from the latency estimate as a "hard-coded constant".

--Evaluation--

I evaluated my NTP client by using it to timestamp the PPS output of a
uBlox M10 GNSS. My antenna has a view of about 180 degrees of sky. The
uBlox is configured to listen to the GPS, Galileo and GLONASS
constellations, and during most of this test was using about 20 satellites
for its solution. Once per second, at the top of each UTC second, the GNSS
generates a pulse, which I wired into one of the ESP32's GPIO pins. A GPIO
interrupt handler on the ESP32 gets the current timestamp from the my NTP
service and emits it to the serial port. I record the results and compute
the error in the reported timestamp, e.g., a reported timestamp
of 1644310833.999917 has an error of 83 microseconds. During my tests, the
ESP32 sent an NTP request to the University of Washington's public Stratum
2 NTP server over the internet every 60 seconds. The nominal RTT to that
server from my home via WiFi is about 4ms. I collected over 100,000 data
points.

This graph shows time-series data of the errors over the course of the
experiment:

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.gps_vs_ntp.timeseries.png

Here is a histogram of the absolute errors recorded, excluding the first
ten minutes of warmup. The median is close to 0, thanks to the calibration
constant I mentioned in the previous section. Also note that the long tail
of errors is one-sided (other than some outliers). As described earlier,
this is because I can acquire low-jitter reception timestamps but can not
do so for send-time timestamps.

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.gps_vs_ntp.hist.log.png

Numerically, abs(error) can be summarized as:

Median: 216us
95th percentile: 801us
99th percentile: 1,607us
99.9th percentile: 2,461us

--Drawbacks--

My implementation has several drawbacks compared to a real NTP client.
Perhaps foremost is that it makes no effort to create a monotonic
timescale. Every time a new observation arrives, a new linear regression is
performed, and all time queries immediately start using that new model.
This can result in a discontinuity in the timescale, e.g., two sensor
observations collected 10ms apart might have timestamps that differ by
significantly more than 10ms. This is in contrast to the reference NTP
client which gradually slews the clock with the goal of keeping the local
timescale continuous and keeps the instantaneous rate close to SI seconds,
even while adjustments are happening. The ESP32 does support an adjtime()
call so a continuous timescale should be possible.

Another drawback is that my outlier rejection is not very sophisticated,
and least-squares fitting is very sensitive to big outliers. The next step,
if I ever return to this project, would be an iterative algorithm that
draws a linear regression line, looks for outliers, discards them, and
redraws the regression line with the remaining points.

--Summary--

Despite being cheap hardware, and having a slow network stack that suffers
from high jitter, it's possible for an ESP32 to get a view of UTC typically
accurate to under 1 millisecond using only an Internet NTP server accessed
using its built-in WiFi. Though worse than the performance one would expect
from a desktop computer, it's in the same ballpark, and could be improved
by investing more time into better outlier rejection.

Regards,
-Jeremy N3UUO