Initial notes on scanning

So I’m starting to experiment with my scanning capabilities. I have two all-in-one printers: an old Canon MP530 and a newer Kodak ESP9250. I thought I would just use them both and cut my scanning time in half by swapping back and forth between them, but instead I’ve spent much of a lovely Memorial Day understanding their capabilities, what works and what doesn’t, and figuring out how I’ll actually scan 30 years of Northwest Runner magazine. Here’s what I’ve found.

First, the colors from the two scanners are very different. Here are some samples from the cover of the December 1998 issue, scanned with the default settings.


Obviously the picture is terrible – I’ll get to that in a minute – but the one on the left is the Canon and the one on the right is the Kodak. I ran a few more tests and the Canon consistently gave me scans that looked more faithful to the original image than the Kodak did, so I think I’m simply not going to use the Kodak.

Next – yeah, that image is terrible. How do I fix that?  That’s a moire pattern and it commonly happens with scanned images. The secret to fixing it is to set an option in the scanning software from the manufacturer to “descreen” the image and this basically eliminates the interference:

Great!  Now I have pretty acceptable looking scans.  At least I have the basics of what I expect.  Earlier I’d also measured scan time and file size (with images saved as JPG).  Here they are:

Test              | Kodak        | Canon
------------------|--------------|---------------------
600dpi scan speed | 35s / page   | 1:07 / page
300dpi scan speed | 15.5s / page | 18s / page
200dpi scan speed | 6s / page    | 18s / page (again)
150dpi scan speed | 5s / page    | 10.5s / page
600dpi file size  | not measured | 5MB
300dpi file size  | not measured | 1.2MB
200dpi file size  | not measured | 600KB
150dpi file size  | not measured | 300KB

Well that’s discouraging, but maybe not surprising. The Kodak is *dramatically* faster.  Making matters worse – the above measurements are for the Canon scanner when the moire interference pattern is *not* suppressed.  With the interference pattern suppressed (which is really the only acceptable way to do this), the scan time is more than 1 minute per page every time.
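As a rough sanity check on the file sizes above: the number of pixels in a scan grows with the square of the DPI, so file size should grow roughly the same way. Here is a minimal sketch of that arithmetic, using the measured Canon sizes from the table. The function name `predicted_kb` is made up for illustration, 5MB is treated as 5000KB, and JPG compression will not follow this rule exactly, so it is only a ballpark.

```python
# Measured Canon JPG sizes from the table above (DPI -> size in KB).
# Assumption: 5MB is treated as 5000KB and 1.2MB as 1200KB.
measured_kb = {600: 5000, 300: 1200, 200: 600, 150: 300}


def predicted_kb(dpi, ref_dpi=300, ref_kb=1200):
    """Scale a reference file size by the ratio of pixel counts.

    Pixel count grows with the square of the DPI, so the estimate is
    ref_kb * (dpi / ref_dpi) ** 2. This ignores how JPG compression
    varies with image content, so treat it as a rough guide only.
    """
    return ref_kb * (dpi / ref_dpi) ** 2


for dpi, actual in sorted(measured_kb.items()):
    print(f"{dpi} DPI: measured {actual} KB, predicted {predicted_kb(dpi):.0f} KB")
```

The measured sizes track the square rule fairly closely, which suggests a 1200DPI scan would land at something like four times the 600DPI size.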

Finally, I wanted to decide on a scan DPI.  With the moire suppression enabled it doesn’t seem like a lower DPI will save me any scanning time, so all I need to do is figure out what would be acceptable. For archival purposes it seems like the only reasonable thing would be to go as high as possible, but something tells me 600DPI (or higher; I think I could do 1200) is just not really going to benefit anyone ever, and it would almost definitely make this take up even more of my time (in terms of initial processing and any post-processing), so I am planning on 300DPI or lower.

To make the call on this, I noticed that the Canon software is capable of taking some input files and generating a searchable PDF. I can’t stand PDF as a format but there’s no denying that this would be cool and handy, so I don’t want to choose a scan option with such low fidelity that I might one day sacrifice that ability.  After a couple of tests, it turns out that PDFs generated from 200DPI input files sometimes fail to find search strings (people’s names in race results) that are very clearly printed on the page, but at 300DPI I didn’t find any misses in a handful of tests.  Therefore: 300DPI it is.

To summarize:

  1. Canon wins vs. my Kodak. Other scanners will probably yield different results.
  2. It is absolutely necessary to turn on the descreen operation to reduce moire interference (and this is only available in the printer’s driver / software, not as a generic TWAIN device, it seems)
  3. Super-high DPI isn’t worth my time. In fact, did I mention how stupid it is that I’m doing this?
  4. But 300DPI seems to be the minimum to be able to make text-searchable PDF files and have that work.
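To get a feel for what these choices add up to over the whole project, here is a back-of-the-envelope sketch. The 30 years, the roughly 1 minute per page, and the 1.2MB per page at 300DPI come from the notes above. The issues per year and pages per issue are placeholder guesses, not real counts, and the function name `project_estimate` is made up for illustration.

```python
def project_estimate(years, issues_per_year, pages_per_issue,
                     seconds_per_page=60, mb_per_page=1.2):
    """Return (total pages, scanning hours, storage in GB) for the project.

    seconds_per_page=60 reflects the ">1 minute per page" Canon scan time
    with descreen on, so the hours figure is a lower bound.
    mb_per_page=1.2 is the measured 300DPI JPG size.
    """
    pages = years * issues_per_year * pages_per_issue
    hours = pages * seconds_per_page / 3600
    gigabytes = pages * mb_per_page / 1000
    return pages, hours, gigabytes


# ASSUMPTION: 12 issues a year and 60 pages an issue are placeholders;
# swap in real counts once they are known.
pages, hours, gb = project_estimate(years=30, issues_per_year=12, pages_per_issue=60)
print(f"{pages} pages, at least {hours:.0f} hours of scanning, about {gb:.1f} GB")
```

Only time spent with the scanner running is counted here; flipping pages, naming files, and post-processing would all come on top.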

I have a few more things to research before I get going, but I’m well on my way with these findings!

