Archiving Northwest Runner

A couple of weeks ago I got an idea that seemed great at the time: “Northwest Runner is a really valuable resource for runners in Seattle, and I am positive there is a ton of great history in there that should be preserved and made more publicly available. I should scan all the back issues.”

It’s that last part where this may have taken a turn for the worse. Anyway, I got in touch with long-time editor and publisher Martin Rudow, and today I picked up a trunk full of back issues. I’m going to take notes on the process and archive them on my blog, both for posterity and as I come up with questions that might interest runners or hobby archivists. This will probably start with background on what data is available, then go into technical questions, notes, challenges, and discoveries, and hopefully just be kind of interesting.

I’ll try to remember to tag all the posts as “nwrunner” so that interested readers don’t have to wade through my extensive and deeply crazed rants on the current state of technology. Wait… that’s the Unabomber… not me…

2 Comments »

  1. John Wallace, III said,

    May 27, 2012 @ 4:46 pm

    I want to do this with issues of another newsletter I’ve subscribed to – USRSA’s Streak Registry. Are they going to be JPGs or PDFs? Are you going to try OCRing them? At what date do you stop archiving them because people will stop subscribing and just look for the digital version? Do you have to worry about photos that were copyrighted for the print versions of NW Runner and might not be copyrighted to be distributed online? Let me know what you come up with along the technical/notes/challenges/discoveries section!

  2. admin said,

    June 4, 2012 @ 4:19 pm

    Hey John – sorry for missing this post! I’m a terrible blog-maintainer.

    You can see my further discoveries in the followup posts but my plan had always been:
    1) scan to some image format (JPG is probably adequate for my purposes)
    2) allow any kind of post-processing (e.g. text-searchable PDF or something else).

    I think I mentioned in a followup post that the software that came with my Canon printer can generate text-searchable PDFs, either directly from what it scans or from input files. The results were the same either way: if I scanned at 300 DPI or better, or used an input file at 300 DPI or better, OCR and text searching appeared to work great. At lower resolution, OCR was unreliable and searches missed text.
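
As a rough illustration of that 300 DPI threshold, here is a minimal sketch that estimates the effective resolution of an image from its pixel width and the physical width of the page it came from. The function names and the 8.5 inch page width in the usage note are assumptions for illustration, not measurements of an actual NWR issue or anything from the Canon software.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Estimate scan resolution as pixels per inch across the page width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def ok_for_ocr(pixel_width: int, page_width_inches: float,
               min_dpi: float = 300.0) -> bool:
    """Return True if the image meets the minimum DPI (300 by default).

    The 300 DPI default mirrors the threshold observed in the comment above;
    it is a rule of thumb, not a guarantee that OCR will succeed.
    """
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi
```

For example, under the assumed 8.5 inch page width, an image 2550 pixels wide works out to 300 DPI, while one 1275 pixels wide works out to 150 DPI and would fall below the threshold.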

    One thing I still don’t know how to do is produce a really useful archival format, or make the archive searchable as a whole. Adding text search to a single PDF is fine, but it’s not ideal if there are hundreds of issues and each needs to be searched separately. This, again, is a reason I want a general-purpose format for the initial scan: I think it’s harder to post-process a PDF bundle of a bunch of images than just a bunch of images.
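
One possible approach to the many-issues search problem, sketched under the assumption that each issue's OCR output is already available as plain text: build a single inverted index that maps each word to the set of issues containing it, so one query covers the whole archive. The issue IDs and function names here are hypothetical, and this is only one way it might be done, not the method used for the actual archive.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(issues):
    """Map each word to the set of issue IDs whose OCR text contains it.

    `issues` is a dict of {issue_id: ocr_text}; both are assumed inputs.
    """
    index = defaultdict(set)
    for issue_id, text in issues.items():
        for word in tokenize(text):
            index[word].add(issue_id)
    return index


def search(index, query):
    """Return the sorted issue IDs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

A query like "Seattle marathon" would then return only the issues whose text contains both words, regardless of how many separate files the archive is split into.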

    Anyway – thanks for the feedback. I spent two hours this weekend scanning all the 2003 issues of NWR. I haven’t post-processed them yet, but I have a lot of tips for my own process that I want to write up in a new post soon.
