Possibly final notes on magazine scanning

I spent some time this past weekend experimenting with scanning settings and eventually got one full year of Northwest Runner magazine scanned. I chose 2003 to scan because I think this is a year for which digital copies exist – this meant if something went horribly wrong and I physically ruined an issue or two, I probably wouldn’t get my kneecaps bashed in at an upcoming Winter Grand Prix race series. I had already done extensive experimenting with my own copies of the magazine anyway, so that wasn’t too likely, but I wanted to play it safe.

Here are the key notes:

  1. Scanning in greyscale images+text at 300DPI is dramatically faster than scanning in color.  However, for recent issues this isn’t a great option: some pages are B&W but many are full color.
  2. Output image size for greyscale vs. color at 300DPI is pretty similar.  This only matters because of the behavior I previously mentioned: the scanners at work can’t scan directly to a network share, so the only option is to have the scanner email the output to me, and my mailserver rejects messages above a certain size.
  3. My mailserver seems to reject messages when they cross a threshold somewhere between ~12-15MB in size.  In practice, this means I can scan about 5 ledger-sized, double-sided sheets, or about 10 pages of the magazine at a time.
  4. It is important to separate the pages and invert the fold along the spine before sending through the auto-feeder.  I didn’t do this with one of the first magazines and I wound up with some paper jams, some slightly mangled pages (not really destroyed or anything, but like what you get with a printer auto-feed after something’s gotten jammed).  I mentioned I scanned the entire year of 2003 – the jams only happened in the first issue or two.  After I started this separating and fold inverting process, the pages did not get “stuck” along the spine and they all fed cleanly.
  5. Sometimes 5 sheets barely hits the “too big” threshold for scanning. If this happens, I need to do something like “scan 3, then scan 2.”  This is rare, but it happens.
  6. Some of the magazines are missing pages or have single pages torn out.  This screws up pagination and might make later post-processing / assembly into PDFs a pain (or I might just ignore it).
  7. The printers at work require me to log in, and after some time they log me out.  If I’m logged out, I need to re-enter the scan settings (2 sided, color images + text, scan as JPG not PDF, 300DPI).  This is tedious.  If I stay attentive during the scan process, I won’t get logged out: I feed 5 sheets, wait for them to scan, put the next 5 sheets in the feeder, wait for confirmation that the email was sent, then press “Scan” again.  This also ensures that the scanner (which is the critical path in this assembly line) is always “busy.”
  8. As this process is happening, I’m getting email after email with 10 attached images (scan01.jpg, scan02.jpg, etc. for both sides) that I need to pull out of my inbox and archive in folders.  Because the image names conflict (scan01.jpg will be the cover and also page 11 and page 21, etc.) I need to batch these up, too.  My post-processing script (JPG rotator, cutter, etc.) will handle these.
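
The batching in note 8 could look something like the sketch below.  This is not my actual post-processing script, just a minimal illustration under some assumptions: each email’s attachments have been saved into their own folder, the folders are passed in scan order, and the files follow the scanNN.jpg pattern.  The function names, the folder names, and the image_NNN.jpg output naming are all made up for the example.

```python
import re
import shutil
from pathlib import Path


def scan_number(path: Path) -> int:
    """Pull the trailing number out of a name like scan07.jpg (assumed pattern)."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else 0


def merge_batches(batch_dirs, out_dir, prefix="image"):
    """Copy every scan into out_dir under a unique, sequential name.

    batch_dirs must be listed in the order the batches were scanned;
    within each batch, files are ordered by their scan number.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    counter = 1
    for batch in batch_dirs:
        scans = sorted(Path(batch).glob("scan*.jpg"), key=scan_number)
        for scan in scans:
            # Hypothetical naming scheme: image_001.jpg, image_002.jpg, ...
            target = out_dir / f"{prefix}_{counter:03d}.jpg"
            shutil.copy2(scan, target)  # copy, so the originals stay untouched
            written.append(target)
            counter += 1
    return written
```

Calling something like `merge_batches(["email_01", "email_02"], "issue")` would leave the per-email folders alone and produce one folder per issue with no name collisions, which is all the later rotate/cut steps should need.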

That’s about it.  Scanning the 2003 year of magazines took almost exactly 2 hours.  During that time I was constantly busy with: de-stapling issues, preparing 5-sheet batches for the scanner (de-“sticking” the spine), running the scanner, adding/removing sheets from the feed tray, processing my inbox (which will fill up if I don’t pull the files out), reassembling scanned magazines, and trying to re-staple.  I think I can make this a little more efficient and bet I’ll trim a decent amount of time off that 2-hour baseline, but this process seems pretty close to optimized to do this job well and keep the original issues intact.

Now I just need to sync up with Martin (or really probably Bill Roe, who I think actually owns these issues) and confirm that they’re OK with me plowing ahead with all the back issues.
