Searching Other Identifiers

Each specimen in Arctos receives a single catalog number, along with any number of identifying numbers, often referred to as "Other IDs." There are several ways to search these numbers, each with its own limitations. The data available for searching vary wildly based on what collectors have recorded and what collections have entered. Some exploration is often involved in finding a particular set of specimens.

Catalog Numbers

Every specimen has exactly one catalog number. This number is used to form the URL to the specimen, and is the primary number (in conjunction with the collection identifier) which should be used by external resources (such as GenBank) to identify a specimen. See, for example, the specimen with catalog number "19268." (See recataloging to change catalog numbers.) Catalog numbers are recorded as integers, although Arctos has the (unused) capability to include strings as prefix and suffix. Catalog numbers may be searched as integers ("1"), ranges ("1-4"), or lists ("1,3,4").
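The three accepted input forms can be illustrated with a small sketch. This is not how Arctos parses the search box; the function name parse_catalog_number_query is hypothetical, and mixing forms (such as "1-4,7") is deliberately left unsupported.

```python
def parse_catalog_number_query(text):
    """Expand a catalog number search string into a list of integers.

    Handles the three forms described above: "1", "1-4", or "1,3,4".
    Hypothetical helper for illustration only.
    """
    text = text.strip()
    if "," in text:
        # List form: each comma-separated piece must be an integer.
        return sorted({int(part) for part in text.split(",")})
    if "-" in text:
        # Range form: inclusive at both ends.
        low, high = (int(part) for part in text.split("-", 1))
        if low > high:
            raise ValueError("range start must not exceed range end")
        return list(range(low, high + 1))
    # Single integer form.
    return [int(text)]


single = parse_catalog_number_query("1")
number_range = parse_catalog_number_query("1-4")
number_list = parse_catalog_number_query("1,3,4")
```

Anything that is not an integer, a range, or a list raises ValueError, which mirrors the point that catalog numbers are recorded as integers.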

Other IDs

Along with catalog numbers, Arctos provides the capacity to attach any number of identifiers of various types to specimens.

Other Identifiers, like catalog numbers, have three components: a prefix, an integer, and a suffix. Individual collections define how these components should be used, which values are acceptable, and how data are to be entered, and these decisions affect what sorts of queries are possible. It is often not possible to deduce these rules and practices - contact us if you need help.

To reach the Other ID search, click More Options on the Identifiers pane of SpecimenSearch.

This will provide options to select Other ID Type and to provide an Other ID Number. (We generally use "number" in the sense of a license plate rather than an integer.) Additionally, you can choose whether the number is an exact match or a "contains" match. Exact match searches are case-sensitive.

It's often unclear what type of ID might have been assigned to a number, and the descriptions currently do little to clarify that problem. It is therefore possible (and often most practical) to search by the number component, entirely ignoring ID Type.

The above example finds all specimens with any type of identifier (except catalog number) containing the string "123." As of this writing, that search returns 9330 specimens. Additional criteria, coupled with Arctos' sorting capability, are hopefully enough to find the specimen data of interest.

To get all search options, click Customize (near "Show More Options"), select a "My Other Identifier" (which will also then appear in results and on various forms), and choose "Show 3-part ID Search."

Click Close and the form will reload with a total of eight search options. For this example, we'll use Collector Number. The simplest use case is to search for a string, here "1234":

This sends the query upper(customIdentifier.Display_Value) LIKE '%1234%' (display_value is a concatenation of prefix, number, and suffix). This returns specimens with Collector Numbers of:
  • ABC-1234-X
  • 1234
  • 1234567
regardless of how the data were entered and are stored. ("ABC-1234-X" could be entered as prefix="ABC-1234-X" or as prefix="ABC-", number="1234", suffix="-X"; "1234" could have been entered as a number or as a prefix.)

Changing the dropdown from "contains" to "is" will, of the above examples, return only "1234."
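The difference between "contains" and "is" can be reproduced in a small, self-contained sketch. It uses an in-memory SQLite database rather than the Arctos database, and the table name custom_identifier, its layout, and the sample rows are assumptions made for illustration; only the shape of the WHERE clause comes from the query quoted above.

```python
import sqlite3

# Assumed, simplified layout: one row per identifier, stored in three parts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custom_identifier ("
    "other_id_prefix TEXT, other_id_number INTEGER, other_id_suffix TEXT)"
)
conn.executemany(
    "INSERT INTO custom_identifier VALUES (?, ?, ?)",
    [
        ("ABC-", 1234, "-X"),     # entered in three parts
        (None, 1234, None),       # entered as a number
        ("1234567", None, None),  # numeric text entered as a prefix
        ("XYZ-99", None, None),   # unrelated identifier
    ],
)

# display_value is the concatenation of prefix, number, and suffix.
DISPLAY_VALUE = (
    "COALESCE(other_id_prefix, '') || "
    "COALESCE(CAST(other_id_number AS TEXT), '') || "
    "COALESCE(other_id_suffix, '')"
)


def search(operator, pattern):
    """Return display values whose upper-cased form satisfies the comparison."""
    sql = (
        f"SELECT {DISPLAY_VALUE} FROM custom_identifier "
        f"WHERE upper({DISPLAY_VALUE}) {operator} ?"
    )
    return sorted(row[0] for row in conn.execute(sql, (pattern,)))


contains_matches = search("LIKE", "%1234%")  # "contains"
exact_matches = search("=", "1234")          # "is"
```

Here contains_matches holds all three identifiers listed above, however they were entered, while exact_matches holds only "1234".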

The "in list" option accepts a comma-separated list of values.

The above example sends SQL upper(customIdentifier.DISPLAY_VALUE) IN ('A','B','C'), and as of this writing returns three specimens:

The "in range" option works only for enforced-integer types of identifiers (currently only AF and NK). Attempting to use it for Collector Number will result in a datatype mismatch and return an error.

Three-part search to the rescue! (At least in the cases where data are entered correctly.) All of the above deal with the concatenation of prefix, number, and suffix. It is also possible to search these independently. Search for integer component=1234:

to send SQL customIdentifier.other_id_number = 1234.

This is a numeric match of the numeric part of other IDs. It will not find specimens which have the numeric information entered into prefix. This information is not available to public users, but is evident from the edit form. This specimen will NOT be found with the previous search!
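A minimal sketch of why the numeric search misses such a specimen, again using an in-memory SQLite table. The table name custom_identifier, the specimen column, and the sample rows are hypothetical; only the comparison other_id_number = 1234 is taken from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custom_identifier ("
    "specimen TEXT, other_id_prefix TEXT, "
    "other_id_number INTEGER, other_id_suffix TEXT)"
)
conn.executemany(
    "INSERT INTO custom_identifier VALUES (?, ?, ?, ?)",
    [
        ("specimen A", None, 1234, None),    # 1234 entered as the number
        ("specimen B", "1234", None, None),  # 1234 entered as the prefix
        ("specimen C", "A", 123, None),      # unrelated identifier
    ],
)

# Three-part search: numeric match on the integer component only.
numeric_hits = sorted(
    row[0]
    for row in conn.execute(
        "SELECT specimen FROM custom_identifier WHERE other_id_number = 1234"
    )
)

# Concatenated search, for contrast: matches however the value was entered.
display_hits = sorted(
    row[0]
    for row in conn.execute(
        "SELECT specimen FROM custom_identifier "
        "WHERE upper(COALESCE(other_id_prefix, '') || "
        "COALESCE(CAST(other_id_number AS TEXT), '') || "
        "COALESCE(other_id_suffix, '')) LIKE '%1234%'"
    )
)
```

The numeric search returns only specimen A, while the concatenated search returns both A and B, which is why it is worth trying both when data entry practices are unknown.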

Prefix and suffix work similarly. This search:

sends SQL AND upper(customIdentifier.other_id_prefix) LIKE '%A%' AND customIdentifier.other_id_number = 123 (note prefix is a CONTAINS match and is not case-sensitive) and returns these specimens:

Searching OCR text

Nearly all 180,000 UAM Herbarium Vascular Plants (ALA) specimens have been imaged, but only about half have pre-existing, parsed data. OCR (Optical Character Recognition) processing is well underway, and this is available as single text strings representing all of the text recognized by the OCR program within an image. (Experimentation with parsing this text into standard fields is also underway.) All specimens have at least a crude "folder name," as a taxonomic name within their standard data field.

You can now locate specimens (and their images) in Arctos by searching the raw OCR results, which is particularly useful for the half of the collection for which parsed data are unavailable. Many specimens have only taxonomic information, so combining OCR criteria with other criteria (such as geography or collectors) is likely to exclude possible matches. Additionally, there are many uncorrected errors in the raw OCR text, so short queries are more likely to be successful. In other words, a taxonomic criterion plus an OCR criterion is most likely to produce useful records.

From the Arctos SpecimenSearch page, click "Show More Options" in the Biological Individual pane.

Enter your search criteria in the OCR Text box, and click Search.

All matching specimens will be returned. Click the catalog number to go to Specimen Detail, where you may view the raw OCR text.

Data Loans

Data loans document data usage, and are generally used when a project downloads data from Arctos without examining specimens. Data loans form a special relationship between a loan and a cataloged item, rather than a loan and a specimen part. Data loans are not meant as a replacement for "digital" loans, in which a specimen part is imaged (or otherwise digitized), as "digital" loans concern physical objects and handling specimens. Subsequent usage of digital media (including that generated in "digital" loans) may best be recorded as data loans. Curators may wish to create a new loan number series for data loans, although this is not required.

This entry documents creation of a data loan for illustrative purposes.

  1. Found publication vaguely citing Arctos
  2. Created publication agents in Arctos
  3. Since the available PDF was a reprint, used the DOI to look up original publication information
  4. Created Publication in Arctos
  5. Added Media to the publication
  6. Created Arctos loan of type "data"
  7. Downloaded data loan template
  8. Searched Arctos for scientific names cited in publication
  9. Downloaded results, copied catalog numbers to data loan template.
  10. Filled in the rest of the values in the data loan template, copying and pasting to all cells. Saved as CSV.
  11. Uploaded to data loan loader, clicked OK a couple times.
  12. Created project, added loan, publication, and media created for publication

Total time: ~10 minutes, mostly spent researching and creating Agents.
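Steps 9 and 10 (copying catalog numbers into the template and repeating the remaining values in every row) could also be scripted. The sketch below is only an illustration: the column names catalog_number, loan_number, and collection, and all values, are hypothetical placeholders and not the actual data loan template, so substitute the headers from the template you download.

```python
import csv
import io


def build_data_loan_csv(catalog_numbers, shared_values):
    """Return CSV text with one row per catalog number.

    shared_values holds the cells that are identical in every row
    (the "copy/paste to all cells" step). Column names are hypothetical.
    """
    fieldnames = ["catalog_number"] + list(shared_values)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for number in catalog_numbers:
        writer.writerow({"catalog_number": number, **shared_values})
    return buffer.getvalue()


# Placeholder values for illustration only.
csv_text = build_data_loan_csv(
    [101, 102],
    {"loan_number": "EXAMPLE-LOAN-1", "collection": "EXAMPLE"},
)
```

Writing csv_text to a file gives something to upload to the data loan loader, provided the headers match the real template.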


The collections used, even though there was no formal loan request and no physical specimen usage, receive quantifiable credit for specimen data used. Future Hieracium added to Arctos will not be included in this loan, so it will be possible to quickly identify specimens which could not have been used, even though the lack of citations in the paper makes it impossible to determine which specimens were actually used. Additionally, if current Hieracium specimens are later determined to be some other species, those data will remain as part of the loan, perhaps explaining yet-undetected anomalies in the publication.

Media Bulkloader

This entry describes one process to create Media on Arctos. It is not the only method possible, and may not be most suitable for any given need. MVZ has developed specialized procedures to fit their workflow, for example.

  1. Get your Media - the binary objects - to a suitable web-accessible location. We recommend TACC, and they've provided various tools* to facilitate loading data. You may wish to create a preview of your Media at this time - you'll need to script that, do it manually, or coordinate with whoever is hosting your binary data.
  2. Locate related objects in Arctos. The Media Bulkloader cannot handle all relationships, and how relationships are formed is not always intuitive. See the Media Bulkloader for details.
  3. Populate the Media Bulkloader using the URI you've created in loading media, upload the CSV file, and follow the directions until you get a "spiffy" message.

* TACC tools include direct SCP access, various wrappers and tools using SCP access (such as the ALA Imaging project, which pushes DNG files from a local computer to TACC, creates thumbnails and high-resolution JPGs, and automatically associates images with barcoded specimens), a WebDAV dropbox (contact us for access), and the ability to automatically push files uploaded to Arctos to TACC.