Searching Other Identifiers

[ moved to https://arctosdb.wordpress.com/documentation/catalog/#srch2 ]
Each specimen in Arctos receives a single catalog number, along with any number of identifying numbers, often referred to as "Other IDs." There are several ways to search these numbers, each with its own limitations. The data available for searching vary wildly based on what collectors have recorded and what collections have entered. Some exploration is often involved in finding a particular set of specimens.

Catalog Numbers

Every specimen has exactly one catalog number. This number is used to form the URL to the specimen, and is the primary number (in conjunction with the collection identifier) which should be used by external resources (such as GenBank) to identify a specimen. See for example http://arctos.database.museum/guid/UAM:Mamm:19268, which has catalog number "19268." (See recataloging to change catalog numbers.) Catalog numbers are recorded as integers, although Arctos has the (unused) capability to include strings as prefix and suffix. Catalog numbers may be searched as integers ("1"), ranges ("1-4"), or lists ("1,3,4").
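The three accepted forms can be illustrated with a minimal Python sketch. This is not Arctos code; the function name parse_catalog_number_search is hypothetical, and the sketch assumes ranges are inclusive and that a single search term uses only one of the three forms.

```python
def parse_catalog_number_search(term: str) -> list[int]:
    """Expand a catalog number search term into a sorted list of integers.

    Handles the three forms described above: "1", "1-4", or "1,3,4".
    Assumption: ranges are inclusive on both ends.
    """
    term = term.strip()
    if "," in term:
        # List form: every comma-separated piece must be an integer.
        return sorted({int(piece) for piece in term.split(",")})
    if "-" in term:
        # Range form: split once into a low and a high bound.
        low, high = (int(piece) for piece in term.split("-", 1))
        if low > high:
            raise ValueError(f"range start {low} is greater than end {high}")
        return list(range(low, high + 1))
    # Single integer form.
    return [int(term)]


print(parse_catalog_number_search("1"))      # [1]
print(parse_catalog_number_search("1-4"))    # [1, 2, 3, 4]
print(parse_catalog_number_search("1,3,4"))  # [1, 3, 4]
```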

Other IDs

Along with catalog numbers, Arctos provides the capacity to attach any number of identifiers of various types to specimens.

Other Identifiers, like catalog numbers, have three components: a prefix, an integer, and a suffix. Individual collections define how these components should be used, acceptable values, and how data are to be entered, and these decisions affect what sorts of queries are possible. It is often not possible to deduce these rules and practices; contact us if you need help.

To reach the Other ID search, click More Options on the Identifiers pane of SpecimenSearch.


This will provide options to select Other ID Type and to provide an Other ID Number. (We generally use "number" in the sense of a license plate rather than an integer.) Additionally, you can choose whether the number is an exact match or a "contains" match. Exact match searches are case-sensitive.

It's often unclear what type of ID might have been assigned to a number, and the descriptions currently do little to clarify that problem. It is therefore possible (and often most practical) to search by the number component, entirely ignoring ID Type.


The above example finds all specimens with any type of identifier (except catalog number) containing the string "123." As of this writing, that search returns 9330 specimens. Additional criteria, coupled with Arctos' sorting capability, are hopefully enough to find the specimen data of interest.

To get all search options, click Customize (near "Show More Options"), select a "My Other Identifier" (which will also then appear in results and on various forms), and choose "Show 3-part ID Search."


Click Close and the form will reload with a total of eight search options. For this example, we'll use Collector Number. The simplest use case is to search for a string, here "1234":


This sends the query upper(customIdentifier.Display_Value) LIKE '%1234%' (display_value is a concatenation of prefix, number, and suffix). This returns specimens with Collector Numbers of:
  • ABC-1234-X
  • 1234
  • 1234567
regardless of how the data were entered and are stored. ("ABC-1234-X" could be entered as prefix="ABC-1234-X" or as prefix="ABC-", number="1234", suffix="-X"; "1234" could have been entered as a number or as a prefix.)

Changing the dropdown from "contains" to "is" will, of the above examples, return only "1234."
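The difference between the two match modes can be sketched in Python. This is an illustration only, not the Arctos implementation: the helper names display_value and matches are hypothetical, and how Arctos handles case for "is" on this form is an assumption here (the search term is all digits, so it does not affect the result).

```python
def display_value(prefix: str = "", number=None, suffix: str = "") -> str:
    """Concatenate prefix, number, and suffix, as display_value is described above."""
    return f"{prefix}{'' if number is None else number}{suffix}"


def matches(value: str, term: str, mode: str = "contains") -> bool:
    """Mimic the "contains" and "is" options of the search form."""
    if mode == "contains":
        # Corresponds to upper(Display_Value) LIKE '%TERM%'.
        return term.upper() in value.upper()
    if mode == "is":
        # Assumption: a plain equality test on the concatenated value.
        return value == term
    raise ValueError(f"unknown mode: {mode}")


collector_numbers = [
    display_value(prefix="ABC-", number=1234, suffix="-X"),  # entered in three parts
    display_value(prefix="1234"),                            # entered entirely as a prefix
    display_value(number=1234567),                           # entered as a number
]

contains_hits = [v for v in collector_numbers if matches(v, "1234", "contains")]
is_hits = [v for v in collector_numbers if matches(v, "1234", "is")]
print(contains_hits)  # ['ABC-1234-X', '1234', '1234567']
print(is_hits)        # ['1234']
```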

The "in list" option accepts a comma-separated list of values.


The above example sends SQL upper(customIdentifier.DISPLAY_VALUE) IN ('A','B','C'), and as of this writing returns three specimens:


The "in range" option works only for enforced-integer types of identifiers (currently only AF and NK). Attempting to use it for collector number will result in a datatype mismatch and return an error.

Three-part search to the rescue! (At least in the cases where data are entered correctly.) All of the above deal with the concatenation of prefix, number, and suffix. It is also possible to search these independently. Search for integer component=1234:


to send SQL customIdentifier.other_id_number = 1234.

This is a numeric match of the numeric part of other IDs. It will not find specimens which have the numeric information entered into prefix. This information is not available to public users, but is evident from the edit form. This specimen will NOT be found with the previous search!
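A small Python sketch may make the pitfall concrete. The OtherId class and the sample records are hypothetical stand-ins for rows of other identifiers; the only assumption is that display_value is the concatenation of the three parts, as described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtherId:
    prefix: str = ""
    number: Optional[int] = None
    suffix: str = ""

    @property
    def display_value(self) -> str:
        number = "" if self.number is None else str(self.number)
        return f"{self.prefix}{number}{self.suffix}"


records = [
    OtherId(prefix="ABC-", number=1234, suffix="-X"),
    OtherId(number=1234),
    OtherId(prefix="1234"),  # numeric information typed into the prefix
]

# Integer-component search: other_id_number = 1234
number_hits = [r for r in records if r.number == 1234]
# Concatenated search: upper(display_value) LIKE '%1234%'
contains_hits = [r for r in records if "1234" in r.display_value.upper()]

print([r.display_value for r in number_hits])  # ['ABC-1234-X', '1234']
print(len(contains_hits))                      # 3
```

The third record displays as "1234" but is missed by the integer-component search, because its number component is empty.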


Prefix and suffix work similarly. This search:


sends SQL AND upper(customIdentifier.other_id_prefix) LIKE '%A%' AND customIdentifier.other_id_number = 123 (note prefix is a CONTAINS match and is not case-sensitive) and returns these specimens:

Searching OCR text

Nearly all 180,000 UAM Herbarium Vascular Plants (ALA) specimens have been imaged, but only about half have pre-existing, parsed data. OCR (Optical Character Recognition) processing is well underway, and this is available as single text strings representing all of the text recognized by the OCR program within an image. (Experimentation with parsing this text into standard fields is also underway.) All specimens have at least a crude "folder name," as a taxonomic name within their standard data field.

You can now locate specimens (and their images) in Arctos by searching the raw OCR results, which is particularly useful for the half of the collection for which parsed data are unavailable. Many specimens have only taxonomic information, so combining OCR criteria with other criteria (such as geography or collectors) is likely to exclude possible matches. Additionally, there are many uncorrected errors in the raw OCR text, so short queries are more likely to be successful. In other words, a taxonomic criterion plus an OCR criterion is most likely to produce useful records.

From the Arctos SpecimenSearch page (http://arctos.database.museum/SpecimenSearch.cfm), click "Show More Options" in the Biological Individual pane.


Enter your search criteria in the OCR Text box, and click Search.

All matching specimens will be returned. Click the catalog number to go to Specimen Detail, where you may view the raw OCR text.

Data Loans

[ moved to https://arctosdb.wordpress.com/documentation/loans/#dataloan ]
Data loans document data usage, and are generally used when a project downloads data from Arctos without examining specimens. Data loans form a special relationship between a loan and a cataloged item, rather than a loan and a specimen part. Data loans are not meant as a replacement for "digital" loans, in which a specimen part is imaged (or otherwise digitized), as "digital" loans concern physical objects and handling specimens. Subsequent usage of digital media (including that generated in "digital" loans) may best be recorded as data loans. Curators may wish to create a new loan number series for data loans, although this is not required.

This entry documents creation of a data loan for illustrative purposes.

  1. Found publication vaguely citing Arctos
  2. Created publication agents in Arctos
  3. Since the available PDF was a reprint, used the DOI to look up original publication information (http://www.google.com/search?q=DOI%3A+10.1111%2Fj.1472-4642.2008.00547.x)
  4. Created Publication in Arctos
  5. Added Media to the publication
  6. Created Arctos loan of type "data"
  7. Downloaded data loan template
  8. Searched Arctos for scientific names cited in publication
  9. Downloaded results, copied catalog numbers to data loan template.
  10. Filled in the rest of the values in the data loan template, copying and pasting to all cells, and saved as CSV.
  11. Uploaded to data loan loader, clicked OK a couple of times.
  12. Created project, added loan, publication, and media created for publication
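
Steps 9 and 10 above were done by hand in a spreadsheet, but the same fill-down could be scripted. The Python sketch below is only an illustration: fill_loan_template is a hypothetical helper, and the column names and values shown are placeholders, not the actual columns of the Arctos data loan template. Use the headers from the template you download.

```python
import csv
import io


def fill_loan_template(catalog_numbers, shared_values, id_column="catalog_number"):
    """Return CSV text with one row per catalog number.

    shared_values holds the columns that are identical in every row,
    i.e. the values that were copied and pasted to all cells.
    """
    fieldnames = [id_column, *shared_values]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for number in catalog_numbers:
        writer.writerow({id_column: number, **shared_values})
    return buffer.getvalue()


# Placeholder column names and values; the real ones come from the downloaded template.
csv_text = fill_loan_template(
    catalog_numbers=[101, 102, 103],
    shared_values={"loan_number": "EXAMPLE-1", "collection": "EXAMPLE"},
)
print(csv_text)
```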

Total time: ~10 minutes, mostly spent researching and creating Agents.

Result: http://arctos.database.museum/project/different-climatic-envelopes-among-invasive-populations-may-lead-to-underestimations-of-current-and-future-biological-invasions

The collections used, even though there was no formal loan request and no physical specimen usage, receive quantifiable credit for specimen data used. Future Hieracium added to Arctos will not be included in this loan, so it will be possible to quickly identify specimens which could not have been used, even though the lack of citations in the paper makes it impossible to determine which specimens were actually used. Additionally, if current Hieracium specimens are later determined to be some other species, those data will remain as part of the loan, perhaps explaining yet-undetected anomalies in the publication.

Media Bulkloader

[ https://arctosdb.wordpress.com/documentation/media/#bulk ]
This entry describes one process to create Media on Arctos. It is not the only method possible, and may not be the most suitable for any given need. MVZ has developed specialized procedures to fit their workflow, for example.

  1. Get your Media - the binary objects - to a suitable web-accessible location. We recommend TACC, and they've provided various tools* to facilitate loading data. You may wish to create a preview of your Media at this time - you'll need to script that, do it manually, or coordinate with whoever is hosting your binary data.
  2. Locate related objects in Arctos. The Media Bulkloader cannot handle all relationships, and how relationships are formed is not always intuitive. See the Media Bulkloader for details.
  3. Populate the Media Bulkloader using the URI you've created in loading media, upload the CSV file, and follow the directions until you get a "spiffy" message.

* TACC tools include direct SCP access, various wrappers and tools using SCP access (such as the ALA Imaging project, which pushes DNG files from a local computer to TACC, creates thumbnails and high-resolution JPGs, and automatically associates images with barcoded specimens), a WebDAV dropbox (contact us for access), and the ability to automatically push files uploaded to Arctos to TACC.

Customizing Arctos headers with CSS

* IMPORTANT *

You will want to try this out in TEST before moving to production!

You must coordinate loading all images with the development team, or host them on your own site.
* IMPORTANT *

To customize the header for your collection, you may simply supply the required values under Manage Collections.

To customize beyond the defaults, you'll need to create a CSS file, coordinate loading it with the developers, and select it under STYLESHEET on the Manage Collection form.

Example: Use a dark background image for the header, and change the header links to be white for contrast.
  1. Create a transparent-background logo approximately 100px in height. Load to Arctos (you'll need developer assistance), and select the image in HEADER_IMAGE. Set HEADER_COLOR to "white" or it will override your image with gray.
  2. Create a background image 1px shorter than the logo you created in (1).
  3. Create a CSS file, for example:
    /* header_color is the ID of the Arctos header */
    #header_color {
        /* this is the background image */
        background:url("/images/DMNSHeaderBg2.png") repeat-x scroll 0 0 #1C3664;
    }
    /* change the color for the large text */
    .headerInstitutionText, .headerCollectionText{
        color:white;
    }
    /* and the links at the top-right */
    #headerLinks, #headerLinks a {
        color:white;
        font-size:small;
    }
    /* add hover behavior to differentiate the links from text */
    #headerLinks a:hover {
        color:red;
    }

Recataloging

[ https://arctosdb.wordpress.com/documentation/catalog/#recatalog ]
It is sometimes necessary to move cataloged items from one collection or catalog number to another. When doing so, it is important to maintain a way of finding the specimen by its original catalog number.

We recommend adding to the new cataloged item an explicit other_id_type, such as

  • UAM: University of Alaska Museum
and a complete GUID as other_id_number, such as
  • UAM:Mamm:12
Additionally, insert into table REDIRECT old and new paths, and Arctos will automatically redirect traffic. (From Arctos, Manage Data/Tools/Redirects).

For example, if DGR Mamm 123 is recataloged as MSB Mamm 456, enter:

old_path=/guid/DGR:Mamm:123; new_path=/guid/MSB:Mamm:456

This will cause http://arctos.database.museum/guid/DGR:Mamm:123 to redirect to http://arctos.database.museum/guid/MSB:Mamm:456, thereby maintaining any external links to DGR:Mamm:123 in Arctos (e.g., from GenBank).
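
The path format in the example above can be generated rather than typed by hand, which may help when recataloging many items. The Python sketch below is an assumption-laden illustration: guid_path and redirect_row are hypothetical helpers, and the resulting values would still need to be entered through the Redirects tool.

```python
def guid_path(institution: str, collection: str, catalog_number: int) -> str:
    """Build the /guid/ path for a cataloged item, e.g. /guid/DGR:Mamm:123."""
    return f"/guid/{institution}:{collection}:{catalog_number}"


def redirect_row(old_guid: tuple, new_guid: tuple) -> dict:
    """Pair the old and new paths in the old_path/new_path form shown above."""
    return {"old_path": guid_path(*old_guid), "new_path": guid_path(*new_guid)}


row = redirect_row(("DGR", "Mamm", 123), ("MSB", "Mamm", 456))
print(row)
# {'old_path': '/guid/DGR:Mamm:123', 'new_path': '/guid/MSB:Mamm:456'}
```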

Suspect Data

I'd like to monitor what's going on with my collection. How can I do that?

Loan and Permit reminders are emailed to the listed contacts at 1 year, 6 months, 1 month, and upon the permit expiring or the loan becoming due. Any agent with an email address may act as a contact.

The following are available under the Management/Misc tab:

  • Publication Staging is a way to quickly capture very basic information about possible Citations in Arctos. Curators should periodically review and update the information in this form.
  • Management/Misc/Sync parent/child taxonomy finds specimens in a Parent Of/Child Of relationship which do not share current Identification, and provides a means to synchronize them.
  • Merge Dup Agents provides a means to reconcile agents in relationship "bad duplicate of".
  • Pending Relationships provides access to those cataloged item relationships created during Data Entry which have not been successfully formalized. Curatorial action is required for anything in this list.

The following are available under the Reports tab:

  • GenBank MIA is a script that periodically crawls GenBank looking for sequences that may be related to Arctos specimens and which do not already have an Arctos LinkOut (newly-linked may continue to appear in the table for a few days).
    • Query type specimen_voucher:collection are GenBank sequences with which the collector has submitted a properly-formatted link to GenBank, and are almost certainly accurate.
    • Other query types are various guesses, and may or may not accurately resolve to specimens.
  • Annotations are user-submitted annotations. The data quality contact (which may be set under Manage Collections) should also receive notification when annotations are filed.
  • Loan/Citation Stats summarizes loans by the borrower, loan status, number of citations against specimens which were loaned, and cited versus current Identification. Citations must be individually examined to determine from which loan they originated. Citation Counts may reflect repeated usage of individual specimens.
  • Audit SQL contains all UPDATE, DELETE, or INSERT statements, including those which were subsequently rolled back (e.g., due to constraint violations).
  • Oracle Roles is a summary of users by assigned roles. Curators should monitor this, particularly the collection roles (e.g., UAM_MAMM) to ensure that only authorized persons have access to their collections.
  • Funky Data/Suspect Data has several sub-options:
    • Publications without Authors should always find nothing.
    • Publications without Citations may find publications which do not properly cite specimens. This usage may still be reflected in a Project.
    • Projects with Loans and without Publications may require followup with the borrower. Note that Publications may now include things like class reports, dissertations, and brochures.
    • Loans without Specimens are probably legacy or incomplete loans and may not be effectively used to demonstrate collection usage.
  • Funky Data/Partless Specimens finds specimens which have zero specimen parts. Only observations should be in this list; all other cataloged items should have at least one part, even if that part has a disposition of "missing" or "used up."
  • Funky Data/Messy Taxonomy allows searching taxonomy by many criteria to identify various problems.
  • Funky Data/Catalog Number Gaps finds unused catalog numbers. There are no technical problems with nonsequential catalog numbers; this is purely an informational form.
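
To illustrate what a catalog number gap report computes, here is a minimal Python sketch. It is not the Arctos report itself: catalog_number_gaps is a hypothetical function, and it assumes gaps are only reported between the lowest and highest numbers in use.

```python
def catalog_number_gaps(used: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (start, end) ranges of unused numbers.

    Only gaps between the lowest and highest used numbers are reported.
    """
    gaps = []
    ordered = sorted(set(used))
    # Compare each used number with the next one; a jump of more than 1 is a gap.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


print(catalog_number_gaps([1, 2, 5, 6, 10]))  # [(3, 4), (7, 9)]
```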