metadata

title: Brightly Ai
emoji: 👁
colorFrom: blue
colorTo: pink
sdk: gradio
python_version: 3.9.6
sdk_version: 4.36.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Brightly AI

AI algorithms that map words provided by food rescue organizations to entries in a predefined dictionary given by the USDA.

Overview

This script processes a list of input words, classifies them as food or non-food items, and finds the most similar words from a predefined dictionary. It uses various techniques, including fast and slow similarity searches, GPT-3 queries, and a custom pluralizer.

Running

docker build -t brightly-ai .
docker run -p 7860:7860 brightly-ai

How It Works

Initialization:

Database Connection: Connects to a database to store and retrieve word mappings.
Similarity Models: Initializes models to quickly and accurately find similar words.
Pluralizer: Handles singular and plural forms of words.

Processing Input Words:

Reading Input: The script reads input words, either from a file or a predefined list.
Handling Multiple Items: If an input contains multiple items (separated by commas or slashes), it splits them and processes each item separately.
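
The splitting step can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `split_items` is hypothetical, and it assumes that commas and slashes are the only separators.

```python
import re


def split_items(text: str) -> list[str]:
    """Split an input string on commas or slashes and drop empty pieces.

    Hypothetical helper: the real script may use different rules.
    """
    parts = re.split(r"[,/]", text)
    return [part.strip() for part in parts if part.strip()]


items = split_items("apples, bananas/ carrots")
print(items)
```

Each element of `items` would then go through the mapping steps on its own.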

Mapping Words:

Fast Similarity Search: Quickly finds the most similar word from the dictionary.
Slow Similarity Search: If the fast search is inconclusive, it performs a more thorough search.
Reverse Mapping: Attempts to find similar words by reversing the input word order.
GPT-3 Query: If all else fails, queries GPT-3 for recommendations.
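
The fallback order above can be pictured as a chain of strategies tried in turn. The sketch below is an assumption about how such a chain might be wired, not the project's code: `map_word`, `reversed_words`, the 0.8 threshold, and the toy dictionary are all hypothetical. The toy strategies use exact lookup only; the slow search and the GPT-3 query would be further entries in the same list.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy takes a word and returns (match, score), or None if it has no answer.
Strategy = Callable[[str], Optional[Tuple[str, float]]]


def map_word(word: str, strategies: Sequence[Tuple[str, Strategy]],
             threshold: float = 0.8) -> dict:
    """Try each strategy in order; return the first match that clears the threshold.

    The threshold value is an illustrative assumption.
    """
    for name, strategy in strategies:
        result = strategy(word)
        if result is not None and result[1] >= threshold:
            match, score = result
            return {"input": word, "match": match, "score": score, "source": name}
    return {"input": word, "match": None, "score": 0.0, "source": None}


def reversed_words(word: str) -> str:
    """Reverse the word order, e.g. 'sauce tomato' -> 'tomato sauce'."""
    return " ".join(reversed(word.split()))


# Toy stand-ins for the real searches (exact lookup only).
TOY_DICTIONARY = {"apple", "tomato sauce"}


def fast_lookup(word: str) -> Optional[Tuple[str, float]]:
    return (word, 1.0) if word in TOY_DICTIONARY else None


def reverse_lookup(word: str) -> Optional[Tuple[str, float]]:
    candidate = reversed_words(word)
    return (candidate, 0.9) if candidate in TOY_DICTIONARY else None


chain = [("fast", fast_lookup), ("reverse", reverse_lookup)]
result = map_word("sauce tomato", chain)
print(result)
```

Keeping the strategies in an ordered list makes it easy to place cheaper searches before expensive ones.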

Classifying as Food or Non-Food:

Classification: Determines if the word is a food item.
Confidence Score: Assigns a score based on the confidence of the classification.

Storing Results:

Database Storage: Stores the results in the database for future reference.
CSV Export: Saves the final results to a CSV file for easy access.
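
A minimal sketch of the storage step, using Python's built-in `sqlite3` and `csv` modules. The table name `mappings`, its columns, and the helper names are assumptions made for illustration; the actual schema in the repository may differ. An in-memory database and a string buffer are used so the example runs without touching disk.

```python
import csv
import io
import sqlite3

COLUMNS = ["input_word", "dictionary_word", "is_food", "score"]


def store_mapping(conn, input_word, dictionary_word, is_food, score):
    """Insert or update one mapping row (hypothetical schema)."""
    conn.execute(
        "INSERT OR REPLACE INTO mappings "
        "(input_word, dictionary_word, is_food, score) VALUES (?, ?, ?, ?)",
        (input_word, dictionary_word, int(is_food), score),
    )
    conn.commit()


def export_csv(conn, out):
    """Write every stored mapping to a file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(
        conn.execute(
            "SELECT input_word, dictionary_word, is_food, score "
            "FROM mappings ORDER BY input_word"
        )
    )


conn = sqlite3.connect(":memory:")  # a real run would open a database file instead
conn.execute(
    "CREATE TABLE IF NOT EXISTS mappings ("
    "input_word TEXT PRIMARY KEY, dictionary_word TEXT, "
    "is_food INTEGER, score REAL)"
)
store_mapping(conn, "bananas", "banana", True, 0.97)
store_mapping(conn, "paper towels", None, False, 0.88)

buffer = io.StringIO()
export_csv(conn, buffer)
print(buffer.getvalue())
```

Using `input_word` as the primary key means a word processed again overwrites its earlier row rather than duplicating it.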

TODO

[ ] Add requirements.txt file
[ ] Add instructions re: each file in repo

Files and their purpose

The table below lists each file and briefly describes what it does.

| Filename | Description |
| --- | --- |
| run.py | The main entry point. You pass it an array of words; it processes each word, saves the results to a CSV file in the results folder, and stores any new mappings in the SQLite database. |
| algo_fast.py | Uses a fast version of our LLM to encode word embeddings, then uses cosine similarity to determine whether they are similar. |
| algo_slow.py | A similar version of the algorithm that uses a larger set of embeddings from the dictionary. |
| multi_food_item_detector.py | Determines whether a given string of text contains multiple food items or a single food item. |
| update_pickle.py | Updates the dictionary pickle file with any new words that have been added to the dictionary/additions.csv file. |
| add_mappings_to_embeddings.py | Takes all the reviewed mappings in the mappings database and adds them to the embeddings file. |
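
To illustrate the cosine-similarity comparison that algo_fast.py and algo_slow.py are described as using, here is a small self-contained sketch. The vectors are hand-made toy values, not real model embeddings, and the function names are hypothetical; the actual files presumably obtain embeddings from a language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, dictionary_vecs):
    """Return the (word, score) pair with the highest cosine similarity."""
    scored = (
        (word, cosine_similarity(query_vec, vec))
        for word, vec in dictionary_vecs.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy "embeddings" standing in for model output.
toy_embeddings = {
    "apple": [1.0, 0.0, 0.0],
    "bread": [0.0, 1.0, 0.0],
}

word, score = most_similar([0.9, 0.1, 0.0], toy_embeddings)
print(word, round(score, 3))
```

A score close to 1 means the vectors point in nearly the same direction. A pipeline like the one described could compare that score against a threshold to decide whether the fast result is conclusive.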