cheftransformer committed
Commit b7c891d
1 Parent(s): 1f90b63

Merge pull request #7 from m3hrdadfi/update-evaluation

Files changed (1): README.md +17 -14
README.md CHANGED
@@ -75,22 +75,23 @@ model = FlaxAutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME_OR_PATH)
 
 prefix = "items: "
 # generation_kwargs = {
-#     "max_length": 1024,
-#     "min_length": 128,
+#     "max_length": 512,
+#     "min_length": 64,
 #     "no_repeat_ngram_size": 3,
-#     "do_sample": True,
-#     "top_k": 60,
-#     "top_p": 0.95
+#     "early_stopping": True,
+#     "num_beams": 5,
+#     "length_penalty": 1.5,
 # }
 generation_kwargs = {
     "max_length": 512,
     "min_length": 64,
     "no_repeat_ngram_size": 3,
-    "early_stopping": True,
-    "num_beams": 5,
-    "length_penalty": 1.5,
+    "do_sample": True,
+    "top_k": 60,
+    "top_p": 0.95
 }
 
+
 special_tokens = tokenizer.all_special_tokens
 tokens_map = {
     "<sep>": "--",
@@ -214,14 +215,16 @@ Output:
 
 ## Evaluation
 
-The following table summarizes the scores obtained by the **Chef Transformer**. Those marked as (*) are the baseline models.
+Since the original test set is not available, we evaluate the model on a shared test set consisting of 5% of the whole test data (*= 5,000 records*),
+and we generate five recipes for each input (*= 25,000 records in total*).
+The following table summarizes the scores obtained by the **Chef Transformer** and **RecipeNLG** as our baseline.
 
-| Model | WER | COSIM | ROUGE-2 |
-| :-------------: | :---: | :---: | :-----: |
-| Recipe1M+ * | 0.786 | 0.589 | - |
-| RecipeNLG * | 0.751 | 0.666 | - |
-| ChefTransformer | 0.709 | 0.714 | 0.290 |
+| Model | COSIM | WER | ROUGE-2 | BLEU | GLEU | METEOR |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| [RecipeNLG](https://huggingface.co/mbien/recipenlg) | 0.5723 | 1.2125 | 0.1354 | 0.1164 | 0.1503 | 0.2309 |
+| [Chef Transformer](https://huggingface.co/flax-community/t5-recipe-generation) * | **0.7282** | **0.7613** | **0.2470** | **0.3245** | **0.2624** | **0.4150** |
 
+*From the five recipes generated for each NER (food items) input, only the best score is taken into account for the WER, COSIM, and ROUGE metrics, while BLEU, GLEU, and METEOR are designed to handle multiple references.*
 
 ## Streamlit demo
 
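For context (not part of this commit): a hypothetical sketch of the best-of-five selection described in the note above, where each gold recipe is compared against its five generated candidates with a single-reference metric and only the best candidate's score is kept. The function name and the `score_fn` callback are assumptions for illustration; BLEU, GLEU, and METEOR can instead consume multiple references directly, so no such selection is needed for them.

```python
from statistics import mean

def best_of_n_score(references, candidates_per_ref, score_fn, higher_is_better=True):
    """Average over the test set of the best score among the candidates for each reference.

    references: list of gold recipe strings (one per input).
    candidates_per_ref: list of lists, e.g. the 5 generated recipes per input.
    score_fn(reference, candidate) -> float, e.g. ROUGE-2 F1 or cosine similarity.
    For WER (lower is better), pass higher_is_better=False so the minimum is kept.
    """
    pick = max if higher_is_better else min
    per_example = [
        pick(score_fn(ref, cand) for cand in candidates)
        for ref, candidates in zip(references, candidates_per_ref)
    ]
    return mean(per_example)
```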
 
 