[time-nuts] Raspberry Pi NTP server

Trent Piepho tpiepho at gmail.com
Thu Jul 9 20:22:22 UTC 2020


On Thu, Jul 9, 2020 at 11:33 AM jimlux <jimlux at earthlink.net> wrote:
>
> I suspect the 1ms is not limited by the chip (after all, they all have
> to support 8kHz schedules for isochronous audio, even if the serial port
> doesn't use it).

The USB polling is implemented in the host controller, so one is
limited by how often it will generate a start-of-frame (SOF).  The
CPU/OS cannot simply ask it to generate transfers on demand at
whatever rate it likes.  For example, on a UHCI controller the SOF
period is set by a counter on a 12 MHz clock that can be programmed
to count between 11936 and 12063 ticks, so frames can be generated at
most once every roughly 995 µs.
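To put numbers on that: 11936 / 12 MHz ≈ 994.7 µs and 12063 / 12 MHz
≈ 1005.3 µs, with the nominal 12000-tick frame being exactly 1 ms.
Even at the fastest programmable setting, the frame rate stays right
around 1 kHz.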

> I suspect it's more an artifact of how Linux (or whatever) OS deals with
> the interrupt handling.  Linux isn't designed as a "real time fast
> response" operating system. It depends on devices doing big transfers,
> so a 1 ms response time is fine.

Linux can respond to interrupts in much less than 1 millisecond.  I
measured this on a device where I wanted to create a decently accurate
PPS output: using a hardware GPIO interrupt, the latency was around
3 µs.
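
For reference, a minimal sketch of the kind of handler involved.  The
device and GPIO setup are omitted, and the names (pps_edge_irq, gpio,
dev) are illustrative, not from any particular driver:

  #include <linux/interrupt.h>
  #include <linux/gpio/consumer.h>
  #include <linux/timekeeping.h>

  /* Hard IRQ handler for the GPIO edge.  The delay from the
   * electrical edge to the ktime_get_real() call below is the
   * few-microsecond latency being discussed. */
  static irqreturn_t pps_edge_irq(int irq, void *dev_id)
  {
          ktime_t ts = ktime_get_real();

          /* ...record or publish ts, e.g. via the PPS subsystem... */
          return IRQ_HANDLED;
  }

  /* In probe(), after obtaining a struct gpio_desc *gpio for the
   * input pin: */
  ret = request_irq(gpiod_to_irq(gpio), pps_edge_irq,
                    IRQF_TRIGGER_RISING, "pps-edge", dev);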

> That is, you set up a DMA transfer to a disk drive at 100 Mbps, but
> since you're transferring 100 kByte buffers, you only need to service
> the event 125 times/second.

DMA size is largely about overhead.  There is a significant cost to
set up a DMA transfer: creating a mapping to ensure the memory to be
transferred is addressable by both ends of the DMA, managing the
various levels of CPU cache, which are not always DMA coherent,
generating the scatter-gather list to program the DMA hardware, and so
on.  Small buffers hurt efficiency.  It's still possible to get high
throughput with small DMA buffers, just at the cost of extra CPU
overhead.
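
As a rough illustration of that per-transfer cost, here is the generic
Linux DMA mapping API as a driver fragment (buf, len, and dev are
placeholders for what the real driver would have):

  /* Map the buffer for device access; on non-coherent systems this
   * is where caches get cleaned or invalidated. */
  dma_addr_t handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
  if (dma_mapping_error(dev, handle))
          return -ENOMEM;

  /* ...program the DMA engine with 'handle' and start the transfer... */

  /* When the transfer completes, tear the mapping down again. */
  dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);

All of that work is paid per transfer, regardless of how small the
buffer is.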

Also keep in mind that to achieve high utilization of a channel, it's
critical to avoid dead time between transfers.  This is done by
queuing multiple operations, so that the hardware can move to the next
one as soon as the current one is finished.  We would probably not
queue one 100 kB DMA buffer.  Rather, we would queue multiple 16 kB
buffers, waking before the queue is empty to add more buffers and
finalize those that have finished.
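
A rough sketch of that pattern using the Linux dmaengine API; chan,
bufs, struct my_buf, BUF_SIZE, NUM_BUFS, and consume_data() are
placeholders for what a real driver would provide:

  static void queue_one(struct my_buf *b);

  /* Completion callback: hand the data upward, then immediately put
   * the buffer back on the channel so the engine never sits idle. */
  static void buf_done(void *param)
  {
          struct my_buf *b = param;

          consume_data(b);        /* placeholder for real processing */
          queue_one(b);
  }

  static void queue_one(struct my_buf *b)
  {
          struct dma_async_tx_descriptor *d;

          d = dmaengine_prep_slave_single(chan, b->dma_addr, BUF_SIZE,
                                          DMA_DEV_TO_MEM,
                                          DMA_PREP_INTERRUPT);
          d->callback = buf_done;
          d->callback_param = b;
          dmaengine_submit(d);
          dma_async_issue_pending(chan);
  }

  /* At start-up: queue several 16 kB buffers rather than one big one. */
  for (i = 0; i < NUM_BUFS; i++)
          queue_one(&bufs[i]);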


> You'll easily see this on high speed serial links through USB if you do
> "character at a time" operations. You cannot get 50kbps with character
> at a time with buffer flushes between characters.

On Linux there is a tty/termios buffer between userspace and the
serial UART.  If you write bytes one at a time from userspace, it's
very inefficient due to the system call overhead per byte, but one is
only limited by that overhead versus the CPU speed.  There is no
inherent poll rate that limits this.  Data from the kernel's tty
buffer will be efficiently moved into the UART FIFO, using something
like a low-watermark level that either triggers an interrupt or raises
an internal signal that prompts a DMA controller to act, depending on
the UART hardware.

Flushing between characters adds more overhead.  Depending on the
hardware, this might also wait for the UART to finish sending the data
in its FIFO and generate an interrupt, but this is not implemented
correctly in every UART driver.
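
For illustration, the "character at a time with a flush" pattern from
userspace looks something like this (the device path is just an
example):

  #include <fcntl.h>
  #include <termios.h>
  #include <unistd.h>

  int main(void)
  {
          const char msg[] = "hello";
          int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);

          for (const char *p = msg; *p; p++) {
                  write(fd, p, 1);   /* one syscall per byte */
                  tcdrain(fd);       /* wait until the UART has sent it */
          }
          close(fd);
          return 0;
  }

Each pass through that loop pays a syscall plus a wait for the
hardware to drain, which is why this pattern can't keep a fast link
busy.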

I'm not sure how the FTDI serial works here, as I've never tried to do
precise serial IO over one (because it's a bad idea!).  I imagine it
only becomes limited by not being able to complete an interrupt
transfer and then send a character, based on the result of that
transfer, within the same USB frame.

> Since you're not going to be transferring (batches of) bytes any more
> frequently than 1 millisecond, there's not much point in sending the
> "modem control" signals (RTS/CTS) through faster.   Any high speed
> protocol handler has to account for the fact that if RTS/CTS handshaking
> is implemented, you can't overrun the transmit FIFO - That is, if the
> far end drops CTS, the near end doesn't send, and bytes pile up in the
> FIFO. So you just need to tell the device driver to stop sending soon
> enough that the FIFO doesn't overflow.  If the FIFO is, e.g., 16 deep,

On Linux, RTS/CTS is almost always implemented in hardware.  The UART
will automatically stop shifting out data when CTS is dropped.  It
might also generate a delta-CTS interrupt.  While I have dealt with
UARTs that needed "software-based hardware flow control", it doesn't
work well on Linux and is rare: it's difficult to respond fast enough
consistently.  And usually when I've needed RTS/CTS, it's for
something fast, 1 megabaud or higher, where there is even less time to
react.
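
Enabling that hardware flow control from userspace is just a termios
flag (CRTSCTS is a Linux/BSD extension; the device path is an
example):

  #include <fcntl.h>
  #include <termios.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);
          struct termios tio;

          tcgetattr(fd, &tio);
          tio.c_cflag |= CRTSCTS;   /* let the UART handle RTS/CTS itself */
          tcsetattr(fd, TCSANOW, &tio);

          close(fd);
          return 0;
  }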



