David Chuan-En Lin committed on
Commit a618fc8
1 Parent(s): c9bf118
Files changed (4)
  1. .DS_Store +0 -0
  2. README.md +118 -25
  3. foodnet.py +290 -0
  4. requirements.txt +10 -0
.DS_Store ADDED
Binary file (6.15 kB).
README.md CHANGED
@@ -1,37 +1,130 @@
  ---
- title: Foodnet
- emoji: 😻
- colorFrom: blue
- colorTo: green
  sdk: streamlit
- app_file: app.py
- pinned: false
  ---

- # Configuration

- `title`: _string_
- Display title for the Space

- `emoji`: _string_
- Space emoji (emoji-only character allowed)

- `colorFrom`: _string_
- Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

- `colorTo`: _string_
- Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

- `sdk`: _string_
- Can be either `gradio` or `streamlit`

- `sdk_version`: _string_
- Only applicable for `streamlit` SDK.
- See [doc](https://hf.co/docs/hub/spaces) for more info on supported versions.

- `app_file`: _string_
- Path to your main application file (which contains either `gradio` or `streamlit` Python code).
- Path is relative to the root of the repository.

- `pinned`: _boolean_
- Whether the Space stays on top of your list.
  ---
+ title: FoodNet
+ emoji: 🍔
+ colorFrom: purple
+ colorTo: purple
  sdk: streamlit
+ app_file: foodnet.py
+
  ---
 
+ # 24-679 FoodNet Project
+
+ ## Authors
+
+ David Chuan-En Lin: chuanenl@cs.cmu.edu
+
+ Mitch Fogelson: mfogelso@andrew.cmu.edu
+
+ Sunny Yang: yundiy@andrew.cmu.edu
+
+ Shihao Xu: shihaoxu@andrew.cmu.edu
+
+ ## TODO
+
+ ### Must Have
+
+ 1. Cooking method (how to do this?) (TBD)
+ 2. Ingredients -> recipe (recipe query?) (Mitch)
+ 3. Cuisine metadata (where to get it?) (TBD)
+ 4. Deployment on the cloud (David)
+
+ ### Nice to Have
+
+ 1. Related images, via:
+    * [Google Image Search API](https://pypi.org/project/Google-Images-Search/)
+    * [OpenAI CLIP](https://openai.com/api/)
+ 2. User studies
+
+ ### Moonshot
+
+ 1. Recipe masking prediction
+ 2.
46
+ ## Description
47
+
48
+ We wanted to help students and households in the Pittsburgh to reduce their food waste. We developed a model that suggests recipes based on current leftovers availible.
49
+
50
+ * Model -> Facebook's [FastText](https://radimrehurek.com/gensim/models/fasttext.html)
51
+ * Dataset -> [Simplified 1M+ Recipes](https://github.com/schmidtdominik/RecipeNet)
52
+ * [Dominick Schmidt Blog](https://dominikschmidt.xyz/simplified-recipes-1M/#dataset-sources)
53
+
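+
+ The core idea: embed each leftover ingredient with FastText, average the embeddings, and look up the nearest ingredients in the vector space. A minimal sketch (assuming a trained gensim 4.x model saved as `fastfood.model`; the ingredient names are illustrative):
+
+ ```python
+ import numpy as np
+ from gensim.models.fasttext import FastText
+
+ model = FastText.load("fastfood.model")
+ yum = model.wv  # KeyedVectors
+ leftovers = ["bread", "lettuce"]
+ vec = np.mean([yum.get_vector(i, norm=True) for i in leftovers], axis=0)
+ print(yum.similar_by_vector(vec, topn=5))  # top-5 complementary ingredients
+ ```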
+
+ ## Try the Web App
+
+ https://huggingface.co/spaces/chuanenlin/foodnet
+
+ ## Quick Start
+
+ 1. Clone the repository:
+
+ ```
+ git clone git@github.com:chuanenlin/foodnet.git
+ ```
+
+ 2. Move into the repository:
+
+ ```
+ cd foodnet
+ ```
+
+ (Optional: create and activate a conda environment.)
+
+ 3. Install gdown:
+
+ ```
+ pip install gdown
+ ```
+
+ 4. Download the models:
+
+ ```
+ gdown https://drive.google.com/drive/folders/1LlQpd45E71dSfC8FgvIhJjQjqxnlBC9j -O ./models --folder
+ ```
+
+ 5. (Optional) Download the datasets:
+
+ ```
+ gdown https://drive.google.com/drive/folders/18aA3BFKqzkqNz5L4N5vN6bFnp8Ch2CQV -O ./data --folder
+ ```
+
+ 6. Install the dependencies:
+
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 7. Run the app:
+
+ ```
+ streamlit run foodnet.py
+ ```
+
+ ## Args
+
+ Train a new model (the `--` separator tells Streamlit to pass the remaining flags to the script instead of consuming them itself):
+
+ ```
+ streamlit run foodnet.py -- --dataset /PATH/TO/DATASET --train True
+ ```
+
+ Load an alternative model:
+
+ ```
+ streamlit run foodnet.py -- --model /PATH/TO/MODEL
+ ```
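+
+ For reference, a minimal sketch of the `argparse` wiring these flags assume (the repo's version is currently commented out in `foodnet.py`; `--train` is shown here with `action='store_true'` because `type=bool` treats any non-empty string, including "False", as `True` — with this form the command becomes `streamlit run foodnet.py -- --train` with no value):
+
+ ```python
+ import argparse
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument('-d', '--dataset', default='data/all_recipes_ingredients_lemma.pkl', type=str, help="filepath of the dataset")
+ parser.add_argument('-t', '--train', action='store_true', help="train a new model instead of loading one")
+ parser.add_argument('-m', '--model', default='models/fastfood_lemma_4.model', type=str, help="filepath of the model")
+ args = parser.parse_args()
+ ```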
+
+ ## Requirements
+
+ * python>=3.6
+ * gensim>=4.0.x
+ * streamlit
+ * gdown
+ * nltk
+ * pickle (standard library)
+ * matplotlib
+
+ ## References
+
+ TODO

foodnet.py ADDED
@@ -0,0 +1,290 @@
+ import requests
+ from io import BytesIO
+ import numpy as np
+ from gensim.models.fasttext import FastText
+ from scipy import spatial
+ import itertools
+ import gdown
+ import warnings
+ import nltk
+ # warnings.filterwarnings('ignore')
+
+ import pickle
+ import pdb
+ from concurrent.futures import ProcessPoolExecutor
+
+ import matplotlib.pyplot as plt
+ import streamlit as st
+ import argparse
+
+
+ # NLTK datasets (tokenizer, POS tagger, and WordNet lemmatizer data)
+ nltk.download('wordnet')
+ nltk.download('punkt')
+ nltk.download('averaged_perceptron_tagger')
+
+ # Average embedding → Compare
+ def recommend_ingredients(yum, leftovers, n=10):
+     '''
+     Uses a mean aggregation method
+
+     :params
+         yum -> FastText Word2Vec (KeyedVectors) obj
+         leftovers -> list of str
+         n -> int, top_n to return
+
+     :returns
+         output -> top_n recommendations
+     '''
+     leftovers_embedding_sum = np.zeros(yum.vector_size)  # use the model's dimensionality rather than a hardcoded 100
+     for ingredient in leftovers:
+         ingredient_embedding = yum.get_vector(ingredient, norm=True)
+         leftovers_embedding_sum += ingredient_embedding
+     leftovers_embedding = leftovers_embedding_sum / len(leftovers)  # Mean embedding of the leftovers
+     top_matches = yum.similar_by_vector(leftovers_embedding, topn=100)
+     top_matches = [(x[0].replace('_', ' '), x[1]) for x in top_matches]
+     output = [x for x in top_matches if not any(ignore in x[0] for ignore in leftovers)]  # Remove boring same-item matches, e.g. "romaine lettuce" if leftovers already contain "lettuce"
+     return output[:n]
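+
+ # Example (illustrative ingredient names; assumes the model loaded via load_model below):
+ #   model, yum = load_model('fastfood.model')
+ #   recommend_ingredients(yum, ['bread', 'lettuce'], n=5)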
+
+ # Compare → Find intersection
+ def recommend_ingredients_intersect(yum, leftovers, n=10):
+     '''
+     Finds top combined probabilities
+
+     :params
+         yum -> FastText Word2Vec (KeyedVectors) obj
+         leftovers -> list of str
+         n -> int, top_n to return
+
+     :returns
+         output -> top_n recommendations
+     '''
+     first = True
+     for ingredient in leftovers:
+         ingredient_embedding = yum.get_vector(ingredient, norm=True)
+         ingredient_matches = yum.similar_by_vector(ingredient_embedding, topn=10000)
+         ingredient_matches = [(x[0].replace('_', ' '), x[1]) for x in ingredient_matches]
+         ingredient_output = [x for x in ingredient_matches if not any(ignore in x[0] for ignore in leftovers)]  # Remove boring same-item matches
+         if first:
+             output = ingredient_output
+             first = False
+         else:
+             # Keep only items that also match the current ingredient (scores come from the first ingredient's list)
+             output = [x for x in output for y in ingredient_output if x[0] == y[0]]
+     return output[:n]
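+
+ # Example (illustrative; returns only items ranked highly for *every* leftover):
+ #   recommend_ingredients_intersect(yum, ['bread', 'lettuce'], n=5)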
+
+ def recommend_ingredients_subsets(model, yum, leftovers, subset_size):
+     '''
+     Returns recommendations for every subset of the leftovers
+
+     :params
+         model -> FastText obj (kept for API compatibility; unused)
+         yum -> FastText Word2Vec (KeyedVectors) obj
+         leftovers -> list of str
+         subset_size -> int, size of each leftover subset
+
+     :returns
+         all_outputs -> dict mapping each subset to its top-10 recommendations
+     '''
+     all_outputs = {}
+     for leftovers_subset in itertools.combinations(leftovers, subset_size):
+         leftovers_embedding_sum = np.zeros(yum.vector_size)  # np.zeros, not np.empty: the sum must start from 0
+         for ingredient in leftovers_subset:
+             ingredient_embedding = yum.get_vector(ingredient, norm=True)  # gensim 4.x API (word_vec was removed in 4.0)
+             leftovers_embedding_sum += ingredient_embedding
+         leftovers_embedding = leftovers_embedding_sum / len(leftovers_subset)  # Mean embedding of the subset
+         top_matches = yum.similar_by_vector(leftovers_embedding, topn=100)  # similar_by_vector lives on the KeyedVectors, not the FastText model
+         top_matches = [(x[0].replace('_', ' '), x[1]) for x in top_matches]
+         output = [x for x in top_matches if not any(ignore in x[0] for ignore in leftovers_subset)]  # Remove boring same-item matches
+         all_outputs[leftovers_subset] = output[:10]
+     return all_outputs
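+
+ # Example (illustrative; scores every pair drawn from three leftovers):
+ #   recommend_ingredients_subsets(model, yum, ['bread', 'lettuce', 'tomato'], subset_size=2)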
+
+
+ def filter_adjectives(data):
+     '''
+     Remove adjectives that are not associated with a food item
+
+     :params
+         data -> list of ingredient strings
+
+     :returns
+         data -> filtered list
+     '''
+     recipe_ingredients_token = [nltk.word_tokenize(x) for x in data]
+     inds = []
+     for i, r in enumerate(recipe_ingredients_token):
+         out = nltk.pos_tag(r)
+         out = [x[1] for x in out]
+         # Keep multi-word ingredients as-is; keep single words only if tagged as a noun
+         if len(out) > 1:
+             inds.append(i)
+         elif 'NN' in out or 'NNS' in out:
+             inds.append(i)
+
+     return [data[i] for i in inds]
+
+ def plural_to_singular(lemma, recipe):
+     '''
+     :params
+         lemma -> nltk WordNetLemmatizer obj
+         recipe -> list of str
+
+     :returns
+         recipe -> lemmatized recipe
+     '''
+     return [lemma.lemmatize(r) for r in recipe]
+
+ def filter_lemma(data):
+     '''
+     Convert plurals to their root forms
+
+     :params
+         data -> list of lists (one ingredient list per recipe)
+
+     :returns
+         data -> lemmatized data
+     '''
+     # Initialize lemmatizer (to reduce plurals to stems)
+     lemma = nltk.wordnet.WordNetLemmatizer()
+
+     # NOTE: This uses all of your machine's cores
+     with ProcessPoolExecutor() as executor:
+         out = list(executor.map(plural_to_singular, itertools.repeat(lemma), data))
+
+     return out
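+
+ # Example (illustrative recipes; returns the same structure with plurals lemmatized):
+ #   filter_lemma([['tomatoes', 'onions'], ['eggs', 'butter']])
+ #   # -> [['tomato', 'onion'], ['egg', 'butter']]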
+
+
+ def train_model(data):
+     '''
+     Train the FastText model
+     NOTE: gensim==4.1.2
+
+     :params
+         data -> list of lists of all recipes
+
+     :returns
+         model -> FastText model obj
+     '''
+     # window=99 so a whole recipe acts as one context; sg=1 selects skip-gram
+     model = FastText(data, vector_size=32, window=99, min_count=5, workers=40, sg=1)
+
+     return model
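+
+ # Example (hypothetical paths; retraining is optional since a pretrained model is downloaded via gdown):
+ #   data = load_data('data/all_recipes_ingredients_lemma.pkl')
+ #   model = train_model(data)
+ #   model.save('models/new_model.model')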
+
+ @st.cache(allow_output_mutation=True)
+ def load_model(filename='models/fastfood_orig_4.model'):
+     '''
+     Load the FastText model
+
+     :params
+         filename -> path to the model
+
+     :returns
+         model -> the full FastText obj
+         yum -> the FastText Word2Vec (KeyedVectors) obj
+     '''
+     model = FastText.load(filename)
+     yum = model.wv
+
+     return model, yum
+
+ @st.cache(allow_output_mutation=True)
+ def load_data(filename='data/all_recipes_ingredients_lemma.pkl'):
+     '''
+     Load data
+
+     :params
+         filename -> path to the dataset
+
+     :returns
+         data -> list of all recipes
+     '''
+     return pickle.load(open(filename, 'rb'))
+
+ def plot_results(names, probs, n=5):
+     '''
+     Plots a bar chart of item names vs. similarity score
+
+     :params
+         names -> list of str
+         probs -> list of float values
+         n -> int, how many bars to show (NOTE: max = 100)
+
+     :returns
+         fig -> figure for plotting
+     '''
+     plt.bar(range(len(names)), probs, align='center')
+     ax = plt.gca()
+
+     ax.xaxis.set_major_locator(plt.FixedLocator(range(len(names))))
+     ax.xaxis.set_major_formatter(plt.FixedFormatter(names))
+     ax.set_ylabel('Probability', fontsize='large', fontweight='bold')
+     ax.set_xlabel('Ingredients', fontsize='large', fontweight='bold')
+     ax.xaxis.labelpad = 10
+     ax.set_title(f'FastFood Top {n} Predictions for Leftovers = {st.session_state.leftovers}')
+     fig = plt.gcf()
+
+     return fig
+
+
+ if __name__ == "__main__":
+     # Argument parsing is currently disabled; see the "Args" section of the README.
+     # parser = argparse.ArgumentParser()
+
+     # Defaults
+     # data_path = 'data/all_recipes_ingredients_lemma.pkl'
+     # model_path = 'models/fastfood_lemma_4.model'
+
+     # Arguments
+     # parser.add_argument('-d', '--dataset', default=data_path, type=str, help="the filepath of the dataset")
+     # parser.add_argument('-t', '--train', default=False, type=bool, help="whether to train a new model")
+     # parser.add_argument('-m', '--model', default=model_path, type=str, help="the filepath of the model")
+     # args = parser.parse_args()
+
+     ## Train or Test ##
+     # if args.train:
+     #     data = load_data(args.dataset)
+     #     model = train_model(data)
+     #     model_path = input("Model filename and directory [e.g. models/new_model.model]: ")
+     #     model.save(model_path)
+     # else:
+     model, yum = load_model('fastfood.model')
+
+     ##### UI/UX #####
+     ## Sidebar ##
+     add_selectbox = st.sidebar.selectbox(
+         "Food Utilization App",
+         ("FastFood Recommendation Model", "Food Donation Resources", "Contact Team")
+     )
+
+     ## Selection tool ##
+     st.multiselect("Select leftovers", list(yum.key_to_index.keys()), default=['bread', 'lettuce'], key="leftovers")
+
+     ## Slider ##
+     st.slider("Number of Recommendations", min_value=1, max_value=100, value=5, step=1, key='top_n')
+
+     ## Get food recommendations ##
+     out = recommend_ingredients(yum, st.session_state.leftovers, n=st.session_state.top_n)
+     names = [o[0] for o in out]
+     probs = [o[1] for o in out]
+
+     st.checkbox(label="Show model score", value=False, key="probs")
+     if st.session_state.probs:
+         st.table(data=out)
+     else:
+         st.table(data=names)
+
+     ## Plot results ##
+     st.checkbox(label="Show model bar chart", value=False, key="plot")
+     if st.session_state.plot:
+         fig = plot_results(names, probs, st.session_state.top_n)
+         st.pyplot(fig)  # render inside the branch: fig is undefined when the checkbox is off
+
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ gdown==4.2.0
+ gensim==4.1.2
+ matplotlib==3.4.3
+ nltk==3.6.5
+ numpy==1.21.2
+ pandas==1.3.4
+ pickleshare==0.7.5
+ scipy==1.7.1
+ streamlit==1.2.0
+