Searching OCR text

Nearly all 180,000 UAM Herbarium Vascular Plants (ALA) specimens have been imaged, but only about half have pre-existing, parsed data. OCR (Optical Character Recognition) processing is well underway, and the results are available as a single text string per image, containing all of the text the OCR program recognized in that image. (Experimentation with parsing this text into standard fields is also underway.) Every specimen has at least a crude "folder name," recorded as a taxonomic name in its standard data fields.

You can now locate specimens (and their images) in Arctos by searching the raw OCR results, which is particularly useful for the half of the collection that lacks parsed data. Many specimens have only taxonomic information in their standard fields, so combining OCR criteria with other criteria (such as geography or collectors) is likely to exclude possible matches. The raw OCR text also contains many uncorrected errors, so short queries are more likely to succeed. In other words, a taxonomic criterion plus a short OCR criterion is most likely to produce useful records.
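
To illustrate why short queries tend to work better against uncorrected OCR text, here is a minimal Python sketch. It assumes a simple case-insensitive substring match, which may not be how Arctos actually compares the OCR Text criterion. The label text and the matches function are invented for the example.

```python
def matches(ocr_text: str, query: str) -> bool:
    """Return True if the query occurs anywhere in the OCR text.

    Case-insensitive substring matching is an assumption made for this
    illustration; it is not Arctos's documented matching rule.
    """
    return query.lower() in ocr_text.lower()


# Invented OCR output for illustration only.
# The OCR program has misread "Fairbanks" as "Fairbanlcs".
ocr_text = "FLORA OF ALASKA Salix alaxensis Fairbanlcs, gravel bar along river"

# A short query avoids the misread word and still matches.
print(matches(ocr_text, "gravel bar"))

# A longer query spans the misread word and fails to match.
print(matches(ocr_text, "Fairbanks, gravel bar"))
```

The longer query fails only because it includes the misread word, so each extra word in a query is another chance to hit an OCR error.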

From the Arctos SpecimenSearch page (http://arctos.database.museum/SpecimenSearch.cfm), click "Show More Options" in the Biological Individual pane.


Enter your search criteria in the OCR Text box, and click Search.

All matching specimens will be returned. Click the catalog number to go to Specimen Detail, where you may view the raw OCR text.