LLM: gpt instructions: '1. Refactor the unstructured OCR text into a dictionary based on the JSON structure outlined below. 2. You should map the unstructured OCR text to the appropriate JSON key and then populate the field based on its rules. 3. Some JSON key fields are permitted to remain empty if the corresponding information is not found in the unstructured OCR text. 4. Ignore any information in the OCR text that doesn''t fit into the defined JSON structure. 5. Duplicate dictionary fields are not allowed. 6. Ensure that all JSON keys are in lowercase. 7. Ensure that new JSON field values follow sentence case capitalization. 8. Ensure all key-value pairs in the JSON dictionary strictly adhere to the format and data types specified in the template. 9. Ensure the output JSON string is valid JSON format. It should not have trailing commas or unquoted keys. 10. Only return a JSON dictionary represented as a string. You should not explain your answer.' json_formatting_instructions: "The next section of instructions outlines how to format\ \ the JSON dictionary. The keys are the same as those of the final formatted JSON\ \ object.\nFor each key there is a format requirement that specifies how to transcribe\ \ the information for that key. \nThe possible formatting options are:\n1. \"verbatim\ \ transcription\" - field is populated with verbatim text from the unformatted OCR.\n\ 2. \"spell check transcription\" - field is populated with spelling corrected text\ \ from the unformatted OCR.\n3. \"boolean yes no\" - field is populated with only\ \ yes or no.\n4. \"boolean 1 0\" - field is populated with only 1 or 0.\n5. \"integer\"\ \ - field is populated with only an integer.\n6. \"[list]\" - field is populated\ \ from one of the values in the list.\n7. \"yyyy-mm-dd\" - field is populated with\ \ a date in the format year-month-day.\nThe desired null value is also given. Populate\ \ the field with the null value of the information for that key is not present in\ \ the unformatted OCR text." mapping: COLLECTING: - collectors - collector_number - determined_by - multiple_names - verbatim_date - date - end_date GEOGRAPHY: - country - state - county - min_elevation - max_elevation - elevation_units LOCALITY: - locality_name - verbatim_coordinates - decimal_coordinates - datum - plant_description - cultivated - habitat MISCELLANEOUS: [] TAXONOMY: - catalog_number - genus - species - subspecies - variety - forma rules: Dictionary: catalog_number: description: The barcode identifier, typically a number with at least 6 digits, but fewer than 30 digits. format: verbatim transcription null_value: '' collector_number: description: Unique identifier or number that denotes the specific collecting event and associated with the collector. format: verbatim transcription null_value: s.n. collectors: description: Full name(s) of the individual(s) responsible for collecting the specimen. When multiple collectors are involved, their names should be separated by commas. format: verbatim transcription null_value: not present country: description: Country that corresponds to the current geographic location of collection. Capitalize first letter of each word. If abbreviation is given populate field with the full spelling of the country's name. format: spell check transcription null_value: '' county: description: Administrative division 2 that corresponds to the current geographic location of collection; capitalize first letter of each word. Administrative division 2 is equivalent to a U.S. county, parish, borough. format: spell check transcription null_value: '' cultivated: description: Cultivated plants are intentionally grown by humans. In text descriptions, look for planting dates, garden locations, ornamental, cultivar names, garden, or farm to indicate cultivated plant. format: boolean yes no null_value: '' date: description: 'Date the specimen was collected formatted as year-month-day. If specific components of the date are unknown, they should be replaced with zeros. Examples: ''0000-00-00'' if the entire date is unknown, ''YYYY-00-00'' if only the year is known, and ''YYYY-MM-00'' if year and month are known but day is not.' format: yyyy-mm-dd null_value: '' datum: description: Datum of location coordinates. Possible values are include in the format list. Leave field blank if unclear. [WGS84, WGS72, WGS66, WGS60, NAD83, NAD27, OSGB36, ETRS89, ED50, GDA94, JGD2011, Tokyo97, KGD2002, TWD67, TWD97, BJS54, XAS80, GCJ-02, BD-09, PZ-90.11, GTRF, CGCS2000, ITRF88, ITRF89, ITRF90, ITRF91, ITRF92, ITRF93, ITRF94, ITRF96, ITRF97, ITRF2000, ITRF2005, ITRF2008, ITRF2014, Hong Kong Principal Datum, SAD69] format: '[list]' null_value: '' decimal_coordinates: description: Correct and convert the verbatim location coordinates to conform with the decimal degrees GPS coordinate format. format: spell check transcription null_value: '' determined_by: description: Full name of the individual responsible for determining the taxanomic name of the specimen. Sometimes the name will be near to the characters 'det' to denote determination. This name may be isolated from other names in the unformatted OCR text. format: verbatim transcription null_value: '' elevation_units: description: 'Elevation units must be meters. If min_elevation field is populated, then elevation_units: ''m''. Otherwise elevation_units: ''''.' format: spell check transcription null_value: '' end_date: description: 'If a date range is provided, this represents the later or ending date of the collection period, formatted as year-month-day. If specific components of the date are unknown, they should be replaced with zeros. Examples: ''0000-00-00'' if the entire end date is unknown, ''YYYY-00-00'' if only the year of the end date is known, and ''YYYY-MM-00'' if year and month of the end date are known but the day is not.' format: yyyy-mm-dd null_value: '' forma: description: Taxonomic determination to form (f.). format: verbatim transcription null_value: '' genus: description: Taxonomic determination to genus. Genus must be capitalized. If genus is not present use the taxonomic family name followed by the word 'indet'. format: verbatim transcription null_value: '' habitat: description: Description of a plant's habitat or the location where the specimen was collected. Ignore descriptions of the plant itself. format: verbatim transcription null_value: '' locality_name: description: Description of geographic location, landscape, landmarks, regional features, nearby places, or any contextual information aiding in pinpointing the exact origin or site of the specimen. format: verbatim transcription null_value: '' max_elevation: description: Maximum elevation or altitude in meters. If only one elevation is present, then max_elevation should be set to the null_value. Only if units are explicit then convert from feet ('ft' or 'ft.' or 'feet') to meters ('m' or 'm.' or 'meters'). Round to integer. format: integer null_value: '' min_elevation: description: Minimum elevation or altitude in meters. Only if units are explicit then convert from feet ('ft' or 'ft.' or 'feet') to meters ('m' or 'm.' or 'meters'). Round to integer. format: integer null_value: '' multiple_names: description: Indicate whether multiple people or collector names are present in the unformatted OCR text. If you see more than one person's name the value is 'yes'; otherwise the value is 'no'. format: boolean yes no null_value: '' plant_description: description: Description of plant features such as leaf shape, size, color, stem texture, height, flower structure, scent, fruit or seed characteristics, root system type, overall growth habit and form, any notable aroma or secretions, presence of hairs or bristles, and any other distinguishing morphological or physiological characteristics. format: verbatim transcription null_value: '' species: description: Taxonomic determination to species, do not capitalize species. format: verbatim transcription null_value: '' state: description: Administrative division 1 that corresponds to the current geographic location of collection. Capitalize first letter of each word. Administrative division 1 is equivalent to a U.S. State. format: spell check transcription null_value: '' subspecies: description: Taxonomic determination to subspecies (subsp.). format: verbatim transcription null_value: '' variety: description: Taxonomic determination to variety (var). format: verbatim transcription null_value: '' verbatim_coordinates: description: Verbatim location coordinates as they appear on the label. Do not convert formats. Possible coordinate types are one of [Lat, Long, UTM, TRS]. format: verbatim transcription null_value: '' verbatim_date: description: Date of collection exactly as it appears on the label. Do not change the format or correct typos. format: verbatim transcription null_value: s.d. SpeciesName: taxonomy: - Genus_species