[time-nuts] Archiving Timing Data

Tijd Dingen tijddingen at yahoo.com
Mon Jan 10 22:58:07 UTC 2011


You saved me a lot of typing. :) Comments inline...

--- On Mon, 1/10/11, Bob Bownes <bownes at gmail.com> wrote:

> From: Bob Bownes <bownes at gmail.com>
> Subject: Re: [time-nuts] Archiving Timing Data
> To: scmcgrath at gmail.com, "Discussion of precise time and frequency measurement" <time-nuts at febo.com>
> Date: Monday, January 10, 2011, 10:08 PM
> There is a difference between archival format and database format. If you
> are looking for an archival format that is portable, then a CSV (or other
> delimiter of your choice) is ideal. They are easy to import to a real
> database and compress well. If, on the other hand, you are looking for a
> working database, you're better off putting it into some kind of schema in a
> Real Database(TM) of your choice and then tuning for transactional or data
> warehouse performance.

Indeed. I was thinking that at the very least you ALWAYS want the metadata of your time series in a database, indexed so you can easily query it to find the time series of interest. By metadata I mean things like a general description, instruments used, date, length of the time series, and useful summary statistics of the overall series. You query this to find the time series of choice, and then retrieve it.
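A minimal sketch of such a metadata catalog, assuming Python's built-in sqlite3 module as a stand-in for whatever database you prefer. The table name, column names, index names and sample rows are all hypothetical, made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical catalog table: one row of metadata per time series.
conn.execute("""
    CREATE TABLE timeseries_meta (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        instrument  TEXT NOT NULL,
        start_utc   TEXT NOT NULL,     -- ISO 8601 start of the measurement
        n_samples   INTEGER NOT NULL,  -- length of the time series
        mean_value  REAL,              -- example summary statistic
        location    TEXT NOT NULL      -- where the archived CSV file lives
    )
""")

# Index the columns you expect to search or sort on.
conn.execute("CREATE INDEX idx_meta_instrument ON timeseries_meta (instrument)")
conn.execute("CREATE INDEX idx_meta_start ON timeseries_meta (start_utc)")

# Made-up sample rows, for illustration only.
rows = [
    ("GPSDO vs. rubidium phase", "counter-A", "2011-01-05T00:00:00Z",
     86400, 1.2e-9, "file:///archive/2011/run001.csv"),
    ("Same setup, earlier run", "counter-A", "2010-12-20T00:00:00Z",
     43200, 1.5e-9, "file:///archive/2010/run017.csv"),
    ("OCXO aging run", "counter-B", "2011-01-08T00:00:00Z",
     604800, 3.0e-10, "file:///archive/2011/run002.csv"),
]
conn.executemany(
    "INSERT INTO timeseries_meta "
    "(description, instrument, start_utc, n_samples, mean_value, location) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

def find_series(conn, instrument, since_utc):
    """Return (id, description, location) of matching series, oldest first."""
    cur = conn.execute(
        "SELECT id, description, location FROM timeseries_meta "
        "WHERE instrument = ? AND start_utc >= ? ORDER BY start_utc",
        (instrument, since_utc),
    )
    return cur.fetchall()

matches = find_series(conn, "counter-A", "2011-01-01")
```

The catalog stays small no matter how large the measurements get, since it holds one row per time series rather than one row per sample.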

Retrieval would depend a lot on your use cases. Do you just find the right measurements and feed them to your file-based utility? Do you want to perform more complicated database queries? Do you /neeeeed/ a web-based frontend?

IMO the 2 most obvious solutions for something like this would be either A) return the URI with the file location (csv with delimiter du jour), or B) store the full timeseries in the database as well and use your API of choice to retrieve the data.

Again, the choice depends on the type of operation and frequency of operation you want to perform on your data.
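To make options A and B concrete, here is a rough sketch of both, again assuming Python's built-in sqlite3 and csv modules as stand-ins. The table names, the helper functions load_from_file and load_from_db, and the three sample values are hypothetical. A plain file path is stored instead of a full URI to keep the sketch runnable.

```python
import csv
import os
import sqlite3
import tempfile

# Write a tiny made-up CSV file so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "run001.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([(0, 1.0e-9), (1, 1.2e-9), (2, 0.9e-9)])

conn = sqlite3.connect(":memory:")  # stand-in for the real database (assumption)
conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, location TEXT)")
conn.execute("CREATE TABLE samples (series_id INTEGER, t REAL, value REAL)")
conn.execute("CREATE INDEX idx_samples ON samples (series_id, t)")
conn.execute("INSERT INTO series VALUES (1, ?)", (csv_path,))

def load_from_file(conn, series_id):
    """Option A: the database only returns the file location; read the CSV yourself."""
    (location,) = conn.execute(
        "SELECT location FROM series WHERE id = ?", (series_id,)
    ).fetchone()
    with open(location, newline="") as f:
        return [(float(t), float(v)) for t, v in csv.reader(f)]

def load_from_db(conn, series_id, t_min, t_max):
    """Option B: the samples live in the database, so a subset is a single query."""
    return conn.execute(
        "SELECT t, value FROM samples "
        "WHERE series_id = ? AND t BETWEEN ? AND ? ORDER BY t",
        (series_id, t_min, t_max),
    ).fetchall()

# Option A: fetch the whole file via its stored location.
data_a = load_from_file(conn, 1)

# Option B: import the samples once, then query any time window.
conn.executemany("INSERT INTO samples VALUES (1, ?, ?)", data_a)
data_b = load_from_db(conn, 1, 1, 2)
```

Option A keeps the database tiny and the archive readable without it. Option B costs storage but lets you pull arbitrary subsets without reading whole files.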

> In this case, a simple one or two table schema and indexes on the things you
> want to sort on, should take care of most of the storage problem. Once you
> have that, use the API for the DB of choice to store/retrieve the data.
> 
> MySQL is free and runs on pretty much everything nowadays. That plus
> phpMyAdmin would make it easy enough for most of those bright enough to
> understand the content of this list to come up with a schema.

Agreed. I noticed RRD being recommended. Don't do it. Why not? Because. RRD is pretty good for what it was made to do. I have plenty of installs around together with mrtg or rrdtool, but I would not recommend using it for something like this.

As for PostgreSQL: personally I would use it, due to some personal preferences. That said, for regular desktop users without this bias I would recommend MySQL. Things like myadmin and phpMyAdmin make life easy... Plus, should one decide to make the obligatory web-based frontend, MySQL is slightly friendlier for the PHP novice. This is not to put everyone in the "beginning user" category, but to keep the gate open to as many people as possible, as it were.


> Bob (whose day job is with a big red database company)

Fred (whose day job involved staring in disbelief at poorly conceived schemas)


> 
> 
> On Mon, Jan 10, 2011 at 4:57 PM, <scmcgrath at gmail.com> wrote:
> 
> > The counter argument is with a heavyweight database - the size of the
> > datastore increases dramatically and there is no guarantee that the tool
> > will be around in 10 years to read the data.
> >
> > All SQL databases use ASCII format CSV to load and dump the data from their
> > internal data representation.
> >
> > Transactional systems still use a hierarchical database 'think IBM IMS or
> > RAIMA' to store and access large datasets like CC auth.   These databases
> > are one step away from ASCII or EBCDIC
> >
> > Scott
> >
> > -----Original Message-----
> > From: Chris Albertson <albertson.chris at gmail.com>
> > Sender: time-nuts-bounces at febo.com
> > Date: Mon, 10 Jan 2011 12:42:03
> > To: Discussion of precise time and frequency measurement <time-nuts at febo.com>
> > Reply-To: Discussion of precise time and frequency measurement <time-nuts at febo.com>
> > Subject: Re: [time-nuts] Archiving Timing Data
> >
> > We have mountains of data here too.  The best way to store it is in a
> > "real" database of some kind.  There are several that are free, open
> > source and multi-platform.  The best for this use is "Postgres".   As
> > this is free and open source there is no reason not to use it.
> >
> > In the past I've kept snapshots for simulations that have run for
> > hours/days/weeks and we got many hundreds of millions of data points.
> > Then we are able to query for almost any conditions and expression,
> > for example "Give me A, B where A-B less than 4 from July 5th 1998"
> >
> > I can tell you first hand that having a billion lines of tab separated
> > data is worse than useless.  You need it cataloged such that you can
> > very quickly (seconds) find useful subsets of the data and you can
> > NEVER know in advance what subset you might need.
> >
> >
> >
> >
> > On Mon, Jan 10, 2011 at 12:22 PM, Peter Vince <pvince at theiet.org> wrote:
> > > Would a TSV (Tab Separated Value) format be preferable?  Full-stops
> > > and commas are used in numbers as decimal and thousands separators (or
> > > vice versa), so using the tab character would avoid any problems with
> > > commas in the actual data (and make it a bit easier to quickly
> > > eyeball when viewed in a text editor).
> > >
> > > Peter  (G8ZZR, London, England)
> > >
> > >
> > > On 9 January 2011 17:15, Bob Camp <lists at rtty.us> wrote:
> > > ...
> > >> I doubt very much I'm the only one taking a mountain of timing data and
> > >> not properly cataloging it. My guess is that maybe 90% of the list members
> > >> are in the same boat. How about:
> > >>
> > >> 1) A set of not too restrictive data format standards (CSV with a few
> > >> restrictions ...)
> > > ...
> > >
> > >_______________________________________________
> > > time-nuts mailing list -- time-nuts at febo.com
> > > To unsubscribe, go to
> > https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
> > > and follow the instructions there.
> > >
> >
> >
> >
> > --
> > =====
> > Chris Albertson
> > Redondo Beach, California
> >
> 


      



