---
title: Brightly Ai
emoji: 👁
colorFrom: blue
colorTo: pink
sdk: gradio
python_version: 3.9.6
sdk_version: 4.36.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Brightly AI

AI algorithms that classify words provided by food rescue organizations into a predefined dictionary provided by the USDA.

## Overview

This script processes a list of input words, classifies them as food or non-food items, and finds the most similar words from a predefined dictionary. It uses various techniques, including fast and slow similarity searches, GPT-3 queries, and a custom pluralizer.

## Running

```
docker build -t brightly-ai .
docker run -p 7860:7860 brightly-ai
```

## How It Works

1. Initialization:

- Database Connection: Connects to a database to store and retrieve word mappings.
- Similarity Models: Initializes models to quickly and accurately find similar words.
- Pluralizer: Handles singular and plural forms of words.

2. Processing Input Words:

- Reading Input: The script reads input words, either from a file or a predefined list.
- Handling Multiple Items: If an input contains multiple items (separated by commas or slashes), it splits them and processes each item separately.
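
The splitting step can be sketched roughly as follows. This is a minimal illustration, not the code in `multi_food_item_detector.py`: the helper name `split_items` and the exact separator rules are assumptions, and the real detector may also decide whether a comma is part of a single item's name.

```python
import re

# Hypothetical separator pattern: a comma or slash with optional surrounding spaces.
_SEPARATORS = re.compile(r"\s*[,/]\s*")


def split_items(text: str) -> list[str]:
    """Split an input string on commas or slashes and drop empty pieces."""
    return [part for part in _SEPARATORS.split(text.strip()) if part]


if __name__ == "__main__":
    print(split_items("apples, bananas/carrots"))
```

Each returned piece would then be mapped on its own, so an input such as `apples, bananas/carrots` yields three separate lookups.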

3. Mapping Words:

- Fast Similarity Search: Quickly finds the most similar word from the dictionary.
- Slow Similarity Search: If the fast search is inconclusive, it performs a more thorough search.
- Reverse Mapping: Attempts to find similar words by reversing the input word order.
- GPT-3 Query: If all else fails, queries GPT-3 for recommendations.
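
One way to picture this fallback chain is a list of matchers tried in order until one is confident enough. The sketch below is illustrative only: `map_word`, the `0.8` threshold, and the stub matchers are assumptions and stand in for the real fast search, slow search, and GPT-3 query.

```python
from typing import Callable, Optional, Tuple

Match = Tuple[str, float]  # (dictionary word, similarity score)
Matcher = Callable[[str], Optional[Match]]


def map_word(word: str, matchers: list[Matcher], threshold: float = 0.8) -> Optional[Match]:
    """Try each matcher in order; return the first match at or above the threshold."""
    for matcher in matchers:
        result = matcher(word)
        if result is not None and result[1] >= threshold:
            return result
    return None


def reverse_words(text: str) -> str:
    """Reverse the word order, e.g. 'apple green' becomes 'green apple'."""
    return " ".join(reversed(text.split()))


# Stub matchers with toy data (hypothetical; the real ones use embeddings).
def fast_search(word: str) -> Optional[Match]:
    return {"apple": ("Apples", 0.95)}.get(word)


def slow_search(word: str) -> Optional[Match]:
    return {"green apple": ("Apples", 0.85)}.get(word)


def reversed_search(word: str) -> Optional[Match]:
    return slow_search(reverse_words(word))


# A GPT-3 fallback would be appended to this list as the last resort.
MATCHERS = [fast_search, slow_search, reversed_search]

if __name__ == "__main__":
    print(map_word("apple green", MATCHERS))
```

Ordering the matchers from cheapest to most expensive means the slower steps run only for words the earlier steps could not resolve.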

4. Classifying as Food or Non-Food:

- Classification: Determines if the word is a food item.
- Confidence Score: Assigns a score based on the confidence of the classification.

5. Storing Results:

- Database Storage: Stores the results in the database for future reference.
- CSV Export: Saves the final results to a CSV file for easy access.
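
The storage step might look something like the sketch below, which uses Python's standard `sqlite3` and `csv` modules. The table name `mappings` and its columns are assumptions made for illustration; the project's actual schema may differ.

```python
import csv
import io
import sqlite3

COLUMNS = ["input_word", "dictionary_word", "is_food", "score"]


def store_mappings(conn: sqlite3.Connection, rows: list) -> None:
    """Insert or update mapping rows in a hypothetical 'mappings' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mappings ("
        "input_word TEXT PRIMARY KEY, dictionary_word TEXT, "
        "is_food INTEGER, score REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO mappings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def export_csv(conn: sqlite3.Connection, out) -> None:
    """Write every stored mapping to a file-like object as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(
        conn.execute(
            "SELECT input_word, dictionary_word, is_food, score "
            "FROM mappings ORDER BY input_word"
        )
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    store_mappings(conn, [("apple", "Apples", 1, 0.95), ("wrench", None, 0, 0.1)])
    buffer = io.StringIO()
    export_csv(conn, buffer)
    print(buffer.getvalue())
```

Keying the table on the input word lets a repeated run overwrite an earlier mapping instead of duplicating it.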

## TODO

- [ ] Add requirements.txt file
- [ ] Add instructions re: each file in repo

## Files and their purpose

The table below lists each file and briefly describes what it does.

| Filename | Description |
| ----------------------------- | ----------- |
| run.py | The main entry point. Pass it an array of words; it processes each word, saves the results to a CSV file in the results folder, and stores any new mappings in the SQLite database. |
| algo_fast.py | Uses a fast version of our LLM to encode word embeddings, then uses cosine similarity to determine whether two words are similar. |
| algo_slow.py | A similar version of the algorithm that uses a larger set of embeddings from the dictionary. |
| multi_food_item_detector.py | Determines whether a given string of text contains multiple food items or a single food item. |
| update_pickle.py | Updates the dictionary pickle file with any new words that have been added to the dictionary/additions.csv file. |
| add_mappings_to_embeddings.py | Takes all the reviewed mappings in the mappings database and adds them to the embeddings file. |
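
The cosine-similarity comparison mentioned for `algo_fast.py` can be illustrated with a small, dependency-free sketch. The vectors here are toy values chosen by hand, and the function names are hypothetical; in the project the embeddings come from the model, not from hard-coded lists.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query: list, dictionary: dict) -> tuple:
    """Return the (word, score) pair whose vector is closest to the query."""
    scored = ((word, cosine_similarity(query, vec)) for word, vec in dictionary.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy_embeddings = {"apple": [1.0, 0.0], "wrench": [0.0, 1.0]}
    print(most_similar([0.9, 0.1], toy_embeddings))
```

A score near 1 means the two vectors point in almost the same direction, which is why a threshold on this value can serve as a similarity cutoff.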