[time-nuts] Re: PPS latency? User vs kernel mode

Trent Piepho tpiepho at gmail.com
Mon Dec 13 08:48:22 UTC 2021


I found the plot I made earlier (Aug 2017), which I've attached.  The
link I sent in 2017 is still in the list archive and still works, but
I'm no longer allowed to post links to Google Photos.

The uncorrected value would be comparable to pps-gpio, i.e. without a
hardware timer.  Worst case is 100 µs.  This was an Altera Cyclone V,
dual-core Cortex-A9 at about 700 MHz.  The current RPi generation is
considerably faster, but CPU speed is not really the issue here.

IIRC, pps-gpio creates the timestamp in hard irq context.  This means
it will preempt any task, whether running in user mode or in kernel
mode.  It should only be delayed, software-wise, by another hard irq
handler or when local irqs have been disabled on that core for a
spinlock.  The code in pps-gpio should be implemented so that there is
no contention for shared software resources, such as a global lock to
read the kernel clock.
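
To illustrate, here is a minimal sketch of that pattern, modeled
loosely on the mainline pps-gpio handler; the structure and function
names here are simplified stand-ins, but pps_get_ts() and pps_event()
are the real kernel APIs.  The timestamp is taken first, before any
other work, so the only software delay left is irq entry itself:

    /* Hard-irq PPS capture: timestamp first, everything else after. */
    #include <linux/interrupt.h>
    #include <linux/pps_kernel.h>

    struct my_pps_data {                    /* simplified private data */
            struct pps_device *pps;
    };

    static irqreturn_t my_pps_irq_handler(int irq, void *data)
    {
            struct my_pps_data *info = data;
            struct pps_event_time ts;

            pps_get_ts(&ts);      /* read the kernel clock immediately */
            pps_event(info->pps, &ts, PPS_CAPTUREASSERT, NULL);
            return IRQ_HANDLED;
    }

Note that pps_get_ts() reads the clock through the timekeeping
seqcount path, which retries rather than blocks, so the handler is not
waiting on a global lock to read the time.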

There are also hardware-based sources of error, such as contention on
the AXI bus to access main memory (but not the local CPU cache).  But
these should be much smaller than 500 µs.

So the likely culprit is a driver that spends a long time in hard irq
context or holds a spinlock with local irqs disabled for a long time.
The worst offenders are invariably GPU drivers.

> I recall, I tried also to limit timing relevant tasks to a separate CPU core
> (this helped a bit, but not really enough), and I am not completely sure
> whether fiddling around with the timer IRQ registers of the interrupt
> controller really disabled all timer IRQs on that core completely, I just
> remember that the documentation on how to write the correct dts-files for
> doing so was close to non-existent, so I might have done that incorrectly...

I have yet to see steering interrupts to a particular core work on
ARM, but it has been a while since I last tried.  You must ensure your
IRQ is delivered to one set of cores while any ill-behaved IRQ is not.
And you must also ensure that any calls from userspace into kernel
mode (ioctl, read, write, and so on) that will be handled by an
ill-behaved driver do not run on those cores either.  So it is not
just the steering of hardware IRQs one must do, but also CPU affinity
sets for userspace tasks and kernel threads.
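
As a sketch of the two knobs involved, under made-up assumptions (IRQ
number 85 and the core split are purely hypothetical): steer the IRQ
with the standard /proc/irq interface and restrict a task with
sched_setaffinity().  Whether the irqchip actually honors the mask is
exactly the ARM caveat above.

    /* Sketch only: IRQ 85 and the core assignments are hypothetical.
     * Must run as root so /proc/irq is writable. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
            cpu_set_t set;
            FILE *f;

            /* Confine this (non-timing-critical) process to cores 1-3,
             * keeping core 0 free for the PPS interrupt. */
            CPU_ZERO(&set);
            CPU_SET(1, &set);
            CPU_SET(2, &set);
            CPU_SET(3, &set);
            if (sched_setaffinity(0, sizeof(set), &set))
                    perror("sched_setaffinity");

            /* Steer the hypothetical PPS IRQ to core 0 only; the file
             * takes a hex CPU bitmask, so "1" means CPU 0. */
            f = fopen("/proc/irq/85/smp_affinity", "w");
            if (!f) {
                    perror("/proc/irq/85/smp_affinity");
                    return 1;
            }
            fputs("1\n", f);
            fclose(f);
            return 0;
    }

Kernel threads spawned by an offending driver need the same treatment,
and the isolcpus= boot parameter is another way to keep the scheduler
off a core entirely.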

I suggest looking at LTTng to trace the kernel.  You can generate a
timeline of every irq, interrupt-masking event, system call, and
scheduler decision.  You can then find a trace event where the
pps-gpio phase offset was > 100 µs, see which CPU handled the hard
irq, and see exactly what that CPU was doing in the lead-up that made
it take so long to get around to making that timestamp.
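
As a rough sketch of such a session (the session name and event list
are just an example; tracing irq masking itself additionally needs a
kernel with the preemptirq tracepoints enabled):

    lttng create pps-debug
    lttng enable-event --kernel irq_handler_entry,irq_handler_exit,sched_switch
    lttng enable-event --kernel --syscall --all
    lttng start
    # ...reproduce a >100 us PPS outlier...
    lttng stop
    lttng destroy
    babeltrace2 ~/lttng-traces/pps-debug*   # or load into Trace Compass

Trace Compass can render the same trace as a graphical per-CPU
timeline, which makes the "what was this core doing?" question much
easier to answer.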

> Without a hardware time stamper for inputs and a programmable hardware timer
> for outputs, I would not trust the RPi for timing purposes.

With the GPU disabled, it should be good enough for many use cases.
Certainly, if you are writing something in Python, it will not be the
limiting factor!  But of course a hardware timestamper is orders of
magnitude better.  It is a shame that such hardware seems so rare in
SoCs.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppshist.png
Type: image/png
Size: 36977 bytes
Desc: not available
URL: <http://febo.com/pipermail/time-nuts_lists.febo.com/attachments/20211213/11b20393/attachment.png>

