[time-nuts] Marrison's 1948 article

Sat Aug 6 15:53:01 UTC 2011

John, since I had the actual books, I downloaded only the items of interest
for reading on trips, if that's even possible these days.
But far more interesting: if you have the complete repository and are
working on indexing it, that's really great. They are good reads if you want
to understand how we got from there to here, with things like open-wire
phone systems, etc.
Sorry to say, the last open wires still existing in Boston are coming down.
They ran along I-90 and the rail line.

The poles and cross arms were beginning to snap and crumble. They were
cutting them down 3 weeks ago.
Regards
Paul.
WB8TSL

On Sat, Aug 6, 2011 at 9:07 AM, John Miles <jmiles at pop.net> wrote:

> > On that subject, what do you use for that?
> >
> > Personally I do something like this:
> > - pdftohtml
> > - index the html pages with mnogosearch
> > - dump on server
> > - the pdf's are now searchable through a web interface (and from command
> > line obviously)
> >
> > This works fine for pdf's that have embedded text, but obviously no go for
> > OCR.
> >
> > So basically the question is, know of any good open source ocr software for
> > the job?
> > In the absence of better options I'll probably give tesseract-ocr a spin, and
> > see if it's any good for this.
>
> I've been using a commercial package (http://pdftransformer.abbyy.com/ ) and
> have been really happy with it in general.  It's slow, but does a good job
> on even marginally readable text.  I don't think I've ever needed to use it
> in batch mode, but I believe there's a way to make it happen, and that will
> be necessary since every article is in its own .PDF file.
>
> The wget process copies the HTML index pages as well as the .PDFs and fixes
> up the links to point to the local copies, so that part is pretty easy to
> deal with.  For my own copy of the archive, I'll probably merge all of those
> index pages into one document so that all of the article titles can be
> browsed on a single page.
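
As a rough illustration of the merge described above, here is a minimal
Python sketch. It assumes the mirrored index pages are ordinary HTML whose
article links are <a> tags pointing at .pdf files, and that the link paths
stay valid from wherever the merged page is saved. The names
PdfLinkCollector and merge_index_pages are made up for this example and are
not part of wget or any other tool mentioned in the thread.

```python
from html import escape
from html.parser import HTMLParser


class PdfLinkCollector(HTMLParser):
    """Collect (href, link text) pairs for anchors that point at .pdf files."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, title) pairs
        self._href = None    # href of the PDF anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: article links end in .pdf (any letter case).
            if href.lower().endswith(".pdf"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; fall back to the href if the link has no text.
            title = " ".join("".join(self._text).split()) or self._href
            self.links.append((self._href, title))
            self._href = None


def merge_index_pages(pages):
    """Return one HTML page listing every PDF link found in the given HTML strings."""
    links = []
    for html_text in pages:
        parser = PdfLinkCollector()
        parser.feed(html_text)
        parser.close()
        links.extend(parser.links)
    items = "\n".join(
        f'<li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        for href, title in links
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Feeding it the text of each mirrored index page and writing the result to a
single file would give one browsable list of article titles.
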
>
> -- john, KE5FX