title: Brightly Ai
emoji: π
colorFrom: blue
colorTo: pink
sdk: gradio
python_version: 3.9.6
sdk_version: 4.36.1
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Brightly AI
AI algorithms that classify words provided by food rescue organizations against a predefined dictionary from the USDA.
## Overview
This script processes a list of input words, classifies each one as a food or non-food item, and finds the most similar entry in a predefined dictionary. It combines several techniques: fast and slow similarity searches, GPT-3 queries, and a custom pluralizer.
## Running
``` | |
docker build -t brightly-ai . | |
docker run -p 7860:7860 brightly-ai | |
``` | |
### How It Works
1. Initialization:
   - Database Connection: Connects to a database to store and retrieve word mappings.
   - Similarity Models: Initializes models to quickly and accurately find similar words.
   - Pluralizer: Handles singular and plural forms of words.
2. Processing Input Words:
   - Reading Input: The script reads input words, either from a file or a predefined list.
   - Handling Multiple Items: If an input contains multiple items (separated by commas or slashes), it splits them and processes each item separately.
3. Mapping Words:
   - Fast Similarity Search: Quickly finds the most similar word from the dictionary.
   - Slow Similarity Search: If the fast search is inconclusive, it performs a more thorough search.
   - Reverse Mapping: Attempts to find similar words by reversing the input word order.
   - GPT-3 Query: If all else fails, queries GPT-3 for recommendations.
4. Classifying as Food or Non-Food:
   - Classification: Determines if the word is a food item.
   - Confidence Score: Assigns a score based on the confidence of the classification.
5. Storing Results:
   - Database Storage: Stores the results in the database for future reference.
   - CSV Export: Saves the final results to a CSV file for easy access.
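
The splitting and mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `run.py`: it uses `difflib` string similarity as a stand-in for the embedding-based search, and the names `FAST_ENTRIES`, `SLOW_ENTRIES`, `THRESHOLD`, `split_items`, `best_match`, and `map_word` are all hypothetical. The toy dictionary entries are made up for the example, and the GPT-3 step is only flagged, not called.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy data: the real dictionary comes from the USDA and is not shown here.
FAST_ENTRIES = ["apple", "banana", "milk"]               # small candidate set
SLOW_ENTRIES = FAST_ENTRIES + ["rice brown", "oatmeal"]  # larger candidate set
THRESHOLD = 0.8  # assumed cut-off for a conclusive match


def split_items(text):
    """Split an input on commas or slashes and drop empty pieces."""
    return [part.strip() for part in re.split(r"[,/]", text) if part.strip()]


def best_match(word, candidates):
    """Return (entry, score) for the closest candidate, or None if inconclusive."""
    scored = [(c, SequenceMatcher(None, word.lower(), c).ratio()) for c in candidates]
    entry, score = max(scored, key=lambda pair: pair[1])
    return (entry, score) if score >= THRESHOLD else None


def map_word(word):
    """Try each strategy in order and return (stage, entry, score)."""
    # 1. Fast search over the small candidate set.
    match = best_match(word, FAST_ENTRIES)
    if match:
        return ("fast", *match)
    # 2. Slow search over the larger candidate set.
    match = best_match(word, SLOW_ENTRIES)
    if match:
        return ("slow", *match)
    # 3. Retry with the word order reversed ("brown rice" -> "rice brown").
    reversed_word = " ".join(reversed(word.split()))
    match = best_match(reversed_word, SLOW_ENTRIES)
    if match:
        return ("reversed", *match)
    # 4. The GPT-3 query would go here; this sketch only flags the word.
    return ("llm_fallback", None, 0.0)


for item in split_items("apples, brown rice / oatmeals"):
    print(item, "->", map_word(item))
```

Returning the stage name alongside the match makes it easy to see which strategy resolved each word, which could also feed into a confidence score.
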
# TODO
- [ ] Add requirements.txt file
- [ ] Add instructions re: each file in repo
## Files and their purpose
The table below lists each file and briefly describes what it does.
| Filename | Description |
| ----------------------------- | ----------- |
| run.py | The main file to run the program. You pass it an array of words; it processes each word, saves the results to a CSV file in the results folder, and stores any new mappings in the SQLite database. |
| algo_fast.py | Uses a fast version of our LLM to encode words as embeddings, then uses cosine similarity to determine whether they are similar. |
| algo_slow.py | A similar version of the algorithm that uses a larger set of embeddings from the dictionary. |
| multi_food_item_detector.py | Determines whether the given string of text contains multiple food items or a single food item. |
| update_pickle.py | Updates the dictionary pickle file with any new words that have been added to the dictionary/additions.csv file. |
| add_mappings_to_embeddings.py | Takes all the reviewed mappings in the mappings database and adds them to the embeddings file. |
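
To illustrate the cosine-similarity comparison mentioned for `algo_fast.py`, here is a minimal sketch in plain Python. It is not the project's code: the three-dimensional vectors in `EMBEDDINGS` are invented toy values (real embeddings would come from the model), and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for this example only.
EMBEDDINGS = {
    "apple": [1.0, 0.0, 0.0],
    "banana": [0.0, 1.0, 0.0],
    "apple juice": [0.8, 0.0, 0.6],
}


def most_similar(query_vector, embeddings):
    """Return (word, score) for the entry whose vector is closest to the query."""
    scored = ((word, cosine_similarity(query_vector, vec)) for word, vec in embeddings.items())
    return max(scored, key=lambda pair: pair[1])


word, score = most_similar([0.9, 0.1, 0.0], EMBEDDINGS)
print(word, round(score, 3))
```

Because cosine similarity depends only on the direction of the vectors, not their length, it is a common choice for comparing embeddings.
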