[time-nuts] GPS Outage
Magnus Danielson
magnus at rubidium.dyndns.org
Fri Feb 26 23:19:21 UTC 2016
Hal,
On 02/26/2016 09:39 PM, Hal Murray wrote:
>
> martin.burnicki at burnicki.net said:
>>> Strange that at least 3 independent firmware trees/development teams should
>>> choose the same magic wk860.
>
>> I don't find it strange. If the next firmware version is based on the
>> previous version and none of the developers has stumbled across this
>> potential problem earlier ...
>
> That sounds like poor software engineering. Or poor engineering management.
It's easy to say, but as work progresses over the years, it is hard to
revisit all aspects of the code and re-evaluate them. One has to approach
the task with humility, try to check as much as possible, but still
accept that you can't find all the bugs. Often the need to finish new
products on time comes before flushing out all the new bugs.
> The wk860 is supposed to represent the build time of the software so it will
> work for 20 years from when it was built rather than 20 years from when the
> 10 bit week counter last rolled over or 20 years from when the constant was
> last updated.
Indeed. Just not quite the full 20 years.
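To make the "not quite 20 years" concrete, here is a minimal sketch of
how a build-time pivot week can resolve the 10-bit broadcast week. The
pivot value 860 is taken from the "wk860" mentioned in this thread; the
function names are hypothetical and this is not any vendor's actual
firmware logic.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # start of GPS week 0
ROLLOVER_WEEKS = 1024          # the broadcast week number is 10 bits wide
PIVOT_WEEK = 860               # assumed build-time week ("wk860" in the thread)


def resolve_week(week10, pivot=PIVOT_WEEK):
    """Map a 10-bit broadcast week to a full week in [pivot, pivot + 1023]."""
    return pivot + ((week10 - pivot) % ROLLOVER_WEEKS)


def week_start(full_week):
    """Calendar date on which a full (unwrapped) GPS week begins."""
    return GPS_EPOCH + timedelta(weeks=full_week)


# Length of the unambiguous window in years: a bit short of 20.
WINDOW_YEARS = ROLLOVER_WEEKS * 7 / 365.25

# First week that this scheme can no longer represent correctly.
first_bad_week = PIVOT_WEEK + ROLLOVER_WEEKS
wrapped = resolve_week(first_bad_week % ROLLOVER_WEEKS)
```

With this scheme the week that follows the window is broadcast with the
same 10-bit value as the pivot week, so it resolves back to the pivot,
exactly 1024 weeks early.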
> That magic constant has to be pulled out to a module where it is visible
> rather than buried deep in some large module. Then the recipe for releasing
> software has to update it, either by having a step in the checklist where the
> human does the edit or by running a script that does it. (Yes, you have to
> start by having a formal procedure for releasing software/firmware.)
That assumes one can foresee that this will become a problem. Most
people don't consider it a near-term problem. However, just updating
the constant won't work for the type of products we have here, as the
boxes we now see having problems were never updated. A battery-backed
RTC would have helped, but the battery would probably fail around now.
EEPROM updates in the boxes would have helped, but 20 years back
EEPROMs weren't too happy about many updates.
I think it is better to realize that the solution was good enough for
the expected lifetime of the product.
I've made some design choices like this. Some of them have survived
complete rewrites and still haven't badly violated their assumptions.
Some have fared less well. Some of the design decisions I make can easily
live for 20 years, and the one thing I've learned is that it is hard to
balance long-term concerns with practical design for the expected
lifetime.
I think they did fairly well. For most systems, the remaining problem is
that the time is shown as being exactly 1024 weeks off, while all other
aspects work correctly. We can apply prior knowledge to correct the 1024
weeks offset and keep these receivers running way beyond their designed
lifetime. That's actually quite respectable in my book.
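As a rough sketch of applying that prior knowledge on the host side: if
we know a date that has certainly already passed (say, the build date of
the host software), we can add whole 1024-week periods to the receiver's
reported date until it is no earlier than that. The names correct_rollover
and not_before are hypothetical, and this assumes the receiver's only
fault is the week offset.

```python
from datetime import date, timedelta

ROLLOVER = timedelta(weeks=1024)  # one full cycle of the 10-bit week counter


def correct_rollover(reported, not_before):
    """Add whole 1024-week periods until the date is not earlier than
    not_before, a date we know from elsewhere has already passed."""
    corrected = reported
    while corrected < not_before:
        corrected += ROLLOVER
    return corrected


# Example: a receiver reporting a mid-1996 date, checked against a host
# that knows it was built in early 2016 (both dates are illustrative).
reported = date(1996, 7, 1)
fixed = correct_rollover(reported, not_before=date(2016, 1, 1))
```

The not_before date has to be refreshed at least once every 1024 weeks
or so, otherwise the same ambiguity just returns one cycle later.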
Best Regards,
Magnus