---
title: Brightly Ai
emoji: πŸ‘
colorFrom: blue
colorTo: pink
sdk: gradio
python_version: 3.9.6
sdk_version: 4.36.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Brightly AI
AI algorithms that classify words provided by food rescue organizations into a predefined dictionary from the USDA.
## Overview
This script processes a list of input words, classifies each one as a food or non-food item, and finds the most similar entry in a predefined dictionary. It combines several techniques: fast and slow similarity searches, GPT-3 queries, and a custom pluralizer.
## Running
```
docker build -t brightly-ai .
docker run -p 7860:7860 brightly-ai
```
## How It Works
1. Initialization:
   - Database Connection: Connects to a database to store and retrieve word mappings.
   - Similarity Models: Initializes models to quickly and accurately find similar words.
   - Pluralizer: Handles singular and plural forms of words.
2. Processing Input Words:
   - Reading Input: Reads input words from either a file or a predefined list.
   - Handling Multiple Items: If an input contains multiple items (separated by commas or slashes), splits it and processes each item separately.
3. Mapping Words:
   - Fast Similarity Search: Quickly finds the most similar word in the dictionary.
   - Slow Similarity Search: If the fast search is inconclusive, performs a more thorough search.
   - Reverse Mapping: Reverses the word order of the input and searches again.
   - GPT-3 Query: If all else fails, queries GPT-3 for recommendations.
4. Classifying as Food or Non-Food:
   - Classification: Determines whether the word is a food item.
   - Confidence Score: Assigns a score reflecting how confident the classification is.
5. Storing Results:
   - Database Storage: Stores the results in the database for future reference.
   - CSV Export: Saves the final results to a CSV file for easy access.
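
The splitting and mapping fallback in steps 2 and 3 can be sketched roughly as follows. This is a minimal sketch, not the code in `run.py`: the `fast`/`slow` search callables, the `ask_llm` callback, the `THRESHOLD` value of 0.75, and the choice to retry the reversed word with the slow search are all hypothetical stand-ins for the project's similarity models and GPT-3 query.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces: a search returns (best dictionary match or None, score),
# and the LLM fallback returns a suggested match or None.
Search = Callable[[str], Tuple[Optional[str], float]]
LlmQuery = Callable[[str], Optional[str]]

THRESHOLD = 0.75  # assumed cut-off for a "conclusive" match


def split_items(text: str) -> List[str]:
    """Split an entry such as 'apples/oranges, milk' into separate items."""
    return [part.strip() for part in re.split(r"[,/]", text) if part.strip()]


def map_word(word: str, fast: Search, slow: Search, ask_llm: LlmQuery) -> Optional[str]:
    """Try each strategy in turn and return the first confident match."""
    # Fast similarity search first.
    match, score = fast(word)
    if match is not None and score >= THRESHOLD:
        return match
    # Fall back to the slower, more thorough search.
    match, score = slow(word)
    if match is not None and score >= THRESHOLD:
        return match
    # Reverse the word order ("green beans" -> "beans green") and retry.
    reversed_word = " ".join(reversed(word.split()))
    if reversed_word != word:
        match, score = slow(reversed_word)
        if match is not None and score >= THRESHOLD:
            return match
    # Last resort: ask a language model (GPT-3 in this project).
    return ask_llm(word)


def process_entry(text: str, fast: Search, slow: Search,
                  ask_llm: LlmQuery) -> Dict[str, Optional[str]]:
    """Map every item in a possibly multi-item entry."""
    return {item: map_word(item, fast, slow, ask_llm) for item in split_items(text)}


# Toy stand-ins for demonstration: an exact-match lookup instead of embeddings.
DICTIONARY = {"apples": "Apples, raw", "beans green": "Beans, snap, green, raw"}


def toy_search(word: str) -> Tuple[Optional[str], float]:
    key = word.lower()
    return (DICTIONARY[key], 1.0) if key in DICTIONARY else (None, 0.0)


print(process_entry("Apples / green beans, mystery box", toy_search, toy_search, lambda w: None))
# {'Apples': 'Apples, raw', 'green beans': 'Beans, snap, green, raw', 'mystery box': None}
```

Keeping each strategy behind a plain callable makes the fallback order easy to read and lets the cheap search short-circuit the expensive ones.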
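
Step 5 can be sketched with Python's built-in `sqlite3` and `csv` modules. The `mappings` table, its columns, and the `store_results`/`lookup` helpers below are hypothetical; the project's actual schema and file layout may differ.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple

# (input word, matched dictionary word or None, is_food, confidence score)
Row = Tuple[str, Optional[str], bool, float]
COLUMNS = ["input_word", "dictionary_word", "is_food", "confidence"]


def store_results(rows: Iterable[Row], db_path: str, csv_path: str) -> None:
    """Upsert mappings into SQLite and export the same rows to a CSV file."""
    rows = list(rows)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS mappings ("
                "input_word TEXT PRIMARY KEY, dictionary_word TEXT, "
                "is_food INTEGER, confidence REAL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO mappings VALUES (?, ?, ?, ?)",
                [(word, match, int(is_food), conf) for word, match, is_food, conf in rows],
            )
    finally:
        conn.close()

    # Write the CSV export, creating the results folder if needed.
    Path(csv_path).parent.mkdir(parents=True, exist_ok=True)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def lookup(db_path: str, word: str) -> Optional[Tuple[Optional[str], bool, float]]:
    """Return a stored mapping, or None if the word is new.

    Assumes store_results has already created the table.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT dictionary_word, is_food, confidence FROM mappings WHERE input_word = ?",
            (word,),
        ).fetchone()
    finally:
        conn.close()
    return None if row is None else (row[0], bool(row[1]), row[2])


# Example usage with throwaway paths.
with tempfile.TemporaryDirectory() as tmp:
    db = str(Path(tmp) / "mappings.db")
    out = str(Path(tmp) / "results" / "results.csv")
    store_results([("apples", "Apples, raw", True, 0.92)], db, out)
    print(lookup(db, "apples"))  # ('Apples, raw', True, 0.92)
```

Checking `lookup` before running the similarity searches is what lets stored mappings serve as a cache for repeated inputs.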
## TODO
- [ ] Add a requirements.txt file
- [ ] Add instructions for each file in the repo
## Files and their purpose
The table below lists each file in the repo with a brief description of what it does.

| Filename | Description |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `run.py` | The main entry point. Pass it a list of words; it processes each word, saves the results to a CSV file in the `results` folder, and stores any new mappings in the SQLite database. |
| `algo_fast.py` | Uses a fast version of our LLM to encode word embeddings, then uses cosine similarity to determine whether words are similar. |
| `algo_slow.py` | Similar to `algo_fast.py`, but compares against a larger set of embeddings from the dictionary. |
| `multi_food_item_detector.py` | Determines whether a given string of text contains multiple food items or a single food item. |
| `update_pickle.py` | Updates the dictionary pickle file with any new words added to the `dictionary/additions.csv` file. |
| `add_mappings_to_embeddings.py` | Adds all reviewed mappings from the mappings database to the embeddings file. |
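
`algo_fast.py` and `algo_slow.py` both rank dictionary entries by the cosine similarity between embeddings. The sketch below shows only that ranking step in plain Python, assuming the embeddings have already been computed by an encoder model; the toy three-dimensional vectors and the `most_similar` helper are made up for illustration. Per the table above, the slow variant differs mainly in searching a larger set of dictionary embeddings.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: Vector, dictionary: Dict[str, Vector]) -> Tuple[Optional[str], float]:
    """Return the dictionary entry whose embedding is closest to the query."""
    best_word, best_score = None, -1.0
    for word, embedding in dictionary.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy 3-dimensional "embeddings"; real ones would come from the encoder model.
embeddings = {
    "Apples, raw": [0.9, 0.1, 0.0],
    "Milk, whole": [0.0, 0.2, 0.95],
}
word, score = most_similar([0.8, 0.2, 0.1], embeddings)
print(word)  # Apples, raw
```

In practice the dictionary embeddings would likely be precomputed once (the repo keeps them in a pickle/embeddings file) and compared in bulk with a vectorized library; the loop here only keeps the example dependency-free.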