---
title: Brightly Ai
emoji: πŸ‘
colorFrom: blue
colorTo: pink
sdk: gradio
python_version: 3.9.6
sdk_version: 4.36.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Brightly AI
AI algorithms that classify item descriptions provided by food rescue organizations against a predefined dictionary from the USDA.
## Overview
The Brightly algorithm classifies items for Food Rescue Organizations (FROs) by leveraging AI from Large Language Models (LLMs). At a high level, the algorithm ingests CSV files provided by FROs, iterates through each row, identifies each item as food or non-food, performs syntax analysis to break multi-item descriptions into individual items, and maps the resulting data to the USDA dictionary database.
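As a rough illustration of that row-level flow (not the repo's actual code; the `description` column name and the helper callables are assumptions), the loop looks roughly like this:

```python
import csv

# Illustrative sketch of the row-level flow; the "description" column name and
# the helper callables are assumptions, not the repo's actual interfaces.
def process_csv(path, is_food, split_items, map_to_usda):
    """Iterate over an FRO export and map each food item to a USDA entry."""
    results = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for item in split_items(row["description"]):  # break multi-item strings apart
                if not is_food(item):                      # skip non-food entries
                    continue
                match, score = map_to_usda(item)           # embedding-based mapping
                results.append({"input": item, "usda_match": match, "score": score})
    return results
```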
On a technical level, Brightly AI converts input items into numerical representations (embeddings). It uses cosine similarity to compare these embeddings against the predefined USDA dictionary, ensuring precise matches. This allows the algorithm to map an item like "Xoconostle" to "Prickly pears, raw" with high accuracy, where string-distance approaches such as Levenshtein distance would fail.
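A minimal sketch of that embedding-and-cosine-similarity step, assuming the `sentence-transformers` package; the model name and the three dictionary entries below are placeholders, not the ones this repo ships with:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

usda_dictionary = ["Prickly pears, raw", "Peanut butter, chunk style", "Bananas, raw"]
dictionary_embeddings = model.encode(usda_dictionary, convert_to_tensor=True)

def best_match(item):
    """Return the dictionary entry with the highest cosine similarity to `item`."""
    item_embedding = model.encode(item, convert_to_tensor=True)
    scores = util.cos_sim(item_embedding, dictionary_embeddings)[0]
    best = int(scores.argmax())
    return usda_dictionary[best], float(scores[best])

print(best_match("Xoconostle"))  # expected to land near "Prickly pears, raw"
```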
Additionally, it handles varied word forms and multi-term descriptions to maintain high accuracy. For example, "Peanut butter, crunchy, 16 oz." yields only "crunchy peanut butter", while "Banana, hot chocolate & chips" is properly broken down into "Banana", "Hot Chocolate", and "Chips", each categorized accordingly.
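A naive sketch of the multi-item splitting idea (the repo's real logic lives in `multi_food_item_detector.py`; this regex-only version is an assumption and does not capture modifier handling, e.g. recombining "Peanut butter, crunchy" into a single item):

```python
import re

def split_items(text):
    """Naively split a raw description on commas, slashes, and ampersands,
    dropping tokens that are only a package size (e.g. "16 oz.")."""
    items = []
    for part in re.split(r"[,/&]", text):
        part = part.strip()
        if not part or re.fullmatch(r"\d+\s*(oz|lb|g|kg)\.?", part, flags=re.IGNORECASE):
            continue
        items.append(part)
    return items

print(split_items("Banana, hot chocolate & chips"))  # ['Banana', 'hot chocolate', 'chips']
```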
## Running
```
docker build -t brightly-ai .
docker run -p 7860:7860 brightly-ai
```
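Once the container is up, the Gradio app defined in `app.py` is served at http://localhost:7860.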
## How It Works
1. Initialization:
   - Database Connection: Connects to a database to store and retrieve word mappings.
   - Similarity Models: Initializes models to quickly and accurately find similar words.
   - Pluralizer: Handles singular and plural forms of words.
2. Processing Input Words:
   - Reading Input: The script reads input words, either from a file or a predefined list.
   - Handling Multiple Items: If an input contains multiple items (separated by commas or slashes), it splits them and processes each item separately.
3. Mapping Words (sketched in code after this list):
   - Fast Similarity Search: Quickly finds the most similar word from the dictionary.
   - Slow Similarity Search: If the fast search is inconclusive, performs a more thorough search.
   - Reverse Mapping: Attempts to find similar words by reversing the input word order.
   - GPT-3 Query: If all else fails, queries GPT-3 for recommendations.
4. Classifying as Food or Non-Food:
   - Classification: Determines whether the word is a food item.
   - Confidence Score: Assigns a score based on the confidence of the classification.
5. Storing Results:
   - Database Storage: Stores the results in the database for future reference.
   - CSV Export: Saves the final results to a CSV file for easy access.
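A compressed sketch of the mapping cascade from step 3; the function names, the `Strategy` signature, and the 0.75 threshold are assumptions, not values taken from this repo:

```python
from typing import Callable, Optional, Tuple

# Each strategy returns (best_match, confidence); None means no candidate found.
Strategy = Callable[[str], Tuple[Optional[str], float]]

def map_word(word: str, strategies: list[Strategy], threshold: float = 0.75):
    """Try progressively more expensive strategies (fast search, slow search,
    reversed word order, GPT fallback) until one is confident enough."""
    for strategy in strategies:
        match, score = strategy(word)
        if match is not None and score >= threshold:
            return match, score
    return None, 0.0  # nothing confident enough; caller can flag it for manual review

# Stub strategy just to show the wiring; real ones would call the embedding models or GPT.
def fast_search(word):
    return ("Peanut butter, chunk style", 0.9) if "peanut" in word.lower() else (None, 0.0)

print(map_word("crunchy peanut butter", [fast_search]))
```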
## TODO
- [ ] Add requirements.txt file
- [ ] Add instructions re: each file in repo
## Files and their purpose
Here's a table of each file and a brief description of what it does.

| Filename | Description |
| --- | --- |
| run.py | The main file to run the program. You pass it an array of words; it processes each word, stores the results to a CSV file in the `results` folder, and stores any new mappings in the SQLite database. |
| algo_fast.py | Uses a fast version of our LLM to encode word embeddings, and cosine similarity to determine whether they are similar. |
| algo_slow.py | A similar version of the algorithm, but with a larger set of embeddings from the dictionary. |
| multi_food_item_detector.py | Determines whether a given string of text contains multiple food items or a single one. |
| update_pickle.py | Updates the dictionary pickle file with any new words added to the `dictionary/additions.csv` file. |
| add_mappings_to_embeddings.py | Takes all reviewed mappings from the mappings database and adds them to the embeddings file. |