13 December 2007

Variants for OCR searching

I've been using the Historical Newspapers at GenealogyBank in an attempt to learn more about Philip Troutfetter, who was involved in some interesting financial activity in Colorado around the turn of the twentieth century. I love to do Soundex and wildcard searches when possible. GenealogyBank does not allow Soundex searches, but it does allow wildcard searches.

I find it best to make a list of variant spellings of the name before beginning any search.

Here are a few:

Trautvetter
Trautfetter
Troutfetter
Troutvetter
Trantvetter
Trantfetter
Troutfelter
Trautvelter

There are MORE.

It is important to remember that when printed materials are digitized, letters can easily be misread. For that reason, Trautvelter is a reasonable variant (a "t" read as an "l"), as is Trantvetter (a "u" read as an "n"). A small "e" can also be misread as a "c." Searching records that have been digitized and indexed with OCR requires thinking about how letters can be misinterpreted when part of the image is difficult to read.
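
For anyone who would rather not build the list by hand, here is a rough Python sketch of the same idea. It is only an illustration: the look-alike pairs are just the three mentioned in this post (u/n, t/l, e/c), the function name ocr_variants is made up, and a real list of OCR confusions would be longer. Nothing here is tied to GenealogyBank.

```python
from itertools import product

# Assumed look-alike letters, taken only from the examples in this post.
# A fuller list would depend on the typeface and the scan quality.
CONFUSIONS = {
    "u": ["u", "n"],
    "t": ["t", "l"],
    "e": ["e", "c"],
}


def ocr_variants(name, max_changes=1):
    """Return spellings of name with at most max_changes letters
    swapped for a look-alike. The first letter is left alone."""
    first, rest = name[0], name[1:].lower()
    # For each remaining letter, list what it could be read as.
    options = [CONFUSIONS.get(ch, [ch]) for ch in rest]
    variants = set()
    for combo in product(*options):
        changes = sum(1 for a, b in zip(rest, combo) if a != b)
        if changes <= max_changes:
            variants.add(first + "".join(combo))
    return sorted(variants)


if __name__ == "__main__":
    for spelling in ocr_variants("Trautvetter"):
        print(spelling)
```

Running it on each known spelling (Trautvetter, Troutfetter, and so on) gives a longer list to try one at a time. Raising max_changes allows more than one misread letter per name, but the list grows quickly.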
