Keeping Track of OCR Oddities

Leaving things only in your mind is the best way to forget them.

Searches of digital newspapers require a certain degree of creativity. A “c” can be misread as an “e,” an “h” can be read as a “b,” a “t” can be read as an “l,” and so on. The list of variants is a long one, but some are more likely than others. The newspaper’s original print quality, whether originals or microfilm copies were used to generate the images, and other image factors can create additional character recognition issues as well.

Keep a list of the main OCR variants you encounter for the names you are working on. Sites that allow wildcard searches make it easier to find some of these variants, but not all sites allow wildcard searches. When thinking about variant spellings in OCR search results, remember that some spelling variants or errors were printed in the paper originally, and that OCR transcription errors come on top of those.

Some variant readings are more likely than others, but keeping a list helps with all of them. Otherwise I might forget to look for frautvetter, troutfeller, and other renderings while looking for trautvetter.
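
For those who would rather keep the list in a small script than on paper, here is a rough Python sketch of the idea. It assumes only the three misreadings mentioned above (“c” read as “e,” “h” read as “b,” “t” read as “l”). The table name and function name are made up for illustration, and the table should be extended with whatever variants you actually encounter. It swaps one letter at a time, so it will not produce multi-letter renderings such as troutfeller.

```python
# Hypothetical confusion table: only the three misreadings named in the text.
# Add your own pairs as you run across them in search results.
OCR_CONFUSIONS = {
    "c": ["e"],
    "h": ["b"],
    "t": ["l"],
}


def single_substitution_variants(name, confusions=OCR_CONFUSIONS):
    """Return the sorted variants made by swapping one letter at a time."""
    variants = set()
    for i, ch in enumerate(name):
        for replacement in confusions.get(ch, []):
            variants.add(name[:i] + replacement + name[i + 1:])
    return sorted(variants)


if __name__ == "__main__":
    # Print a search checklist for one surname.
    for variant in single_substitution_variants("trautvetter"):
        print(variant)
```

Each line of output is one more spelling to try on sites that do not allow wildcard searches.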
