[time-nuts] GPS Outage

Magnus Danielson magnus at rubidium.dyndns.org
Fri Feb 26 23:19:21 UTC 2016


Hal,

On 02/26/2016 09:39 PM, Hal Murray wrote:
>
> martin.burnicki at burnicki.net said:
>>> Strange that at least 3 independent firmware trees/development teams should
>>> choose the same magic wk860.
>
>> I don't find it strange. If the next firmware version is based on the
>> previous version and none of the developers has stumbled across this
>> potential problem earlier ...
>
> That sounds like poor software engineering.  Or poor engineering management.

It's easy to say, but as work progresses over the years, it is hard to 
revisit every aspect of the code and re-evaluate it. One has to approach 
the task with humility, try to check as much as possible, but still 
accept that you can't find all the bugs. Often the need to finish new 
products on time comes before flushing out all the bugs.

> The wk860 is supposed to represent the build time of the software so it will
> work for 20 years from when it was built rather than 20 years from when the
> 10 bit week counter last rolled over or 20 years from when the constant was
> last updated.

Indeed. Just not quite the full 20 years.
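
To make the arithmetic concrete, here is a minimal Python sketch of how a
build-time pivot can extend the 10-bit week number. The names resolve_week
and PIVOT_WEEK are made up for illustration, and this is not taken from any
actual receiver firmware; 860 is simply the constant discussed in this
thread. The usable window is 1024 weeks, which is roughly 19.6 years.

```python
WEEK_MODULUS = 1024  # the broadcast GPS week number is only 10 bits wide
PIVOT_WEEK = 860     # hypothetical build-time constant ("wk860")


def resolve_week(truncated_week: int, pivot_week: int = PIVOT_WEEK) -> int:
    """Map a 10-bit week number into the window [pivot, pivot + 1023].

    Returns the smallest full week number that is >= pivot_week and
    congruent to truncated_week modulo 1024.
    """
    if not 0 <= truncated_week < WEEK_MODULUS:
        raise ValueError("truncated week must be in 0..1023")
    return pivot_week + (truncated_week - pivot_week) % WEEK_MODULUS


if __name__ == "__main__":
    # Last week inside the window, then the first week past it.
    for true_week in (1883, 1884):
        broadcast = true_week % WEEK_MODULUS
        print(true_week, "->", resolve_week(broadcast))
```

With a pivot of 860, true week 1883 still resolves to 1883, but true week
1884 is broadcast as 860 and resolves to 860, i.e. exactly 1024 weeks early.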

> That magic constant has to be pulled out to a module where it is visible
> rather than buried deep in some large module.  Then the recipe for releasing
> software has to update it, either by having a step in the checklist where the
> human does the edit or by running a script that does it.  (Yes, you have to
> start by having a formal procedure for releasing software/firmware.)

That assumes one can foresee this becoming a problem. Most people don't 
consider it a near-term problem. However, just updating the constant 
won't work for the type of products we have here, as the boxes we now 
see having problems were never updated. A battery-backed RTC would 
have helped, but the battery would probably fail around now. EEPROM 
updates in the boxes would have helped, but 20 years back EEPROMs 
weren't too happy about many updates.

I think it is better to realize that the solution was good enough for 
the expected lifetime of the product.

I've made some design choices like this. Some of them have survived 
complete re-writes, and their assumptions still largely hold. Some 
have fared less well. Some of the design decisions I make can easily 
live for 20 years, and the one thing I've learned is that it is hard to 
balance long-term concerns with practical design for the expected 
lifetime.

I think they did fairly well. For most systems, the remaining problem is 
that the time is shown as exactly 1024 weeks off, while all other aspects 
work correctly. We can apply prior knowledge to correct the 1024-week 
offset and keep these receivers running way beyond their designed 
lifetime. That's actually quite respectable in my book.
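
As a rough sketch of what "applying prior knowledge" could look like on the
host side, the Python below adds whole 1024-week steps until the reported
date is no earlier than a date we know has already passed. The function name
correct_rollover and the not_before parameter are my own inventions for
illustration, not any receiver's or library's API.

```python
from datetime import date, timedelta

ROLLOVER = timedelta(weeks=1024)  # one full cycle of the 10-bit week counter


def correct_rollover(reported: date, not_before: date) -> date:
    """Shift a receiver-reported date forward by whole 1024-week cycles.

    not_before is prior knowledge: any date known to be in the past, for
    example the build date of the host software. The result is the earliest
    date >= not_before that differs from the reported date by a multiple of
    1024 weeks.
    """
    corrected = reported
    while corrected < not_before:
        corrected += ROLLOVER
    return corrected


if __name__ == "__main__":
    # A receiver that is one cycle behind reports a date in 1996.
    print(correct_rollover(date(1996, 7, 1), not_before=date(2016, 1, 1)))
```

This only works while the prior knowledge is itself less than 1024 weeks
stale, so it moves the problem to the host rather than removing it.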

Best Regards,
Magnus


