patricia-rocha committed on
Commit
c351234
1 Parent(s): ee30b9f

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -109,16 +109,16 @@ This fine-tuning approach allowed us to significantly reduce memory usage and co
 ## Evaluation results
 
 To evaluate the performance of our model, we translated [70 questions](https://github.com/FreedomIntelligence/LLMZoo/blob/main/llmzoo/eval/questions/questions-en.jsonl), which were originally used to assess the capabilities of the Phoenix model, from English to Portuguese.
-We then conducted their [automatic evaluation](https://github.com/FreedomIntelligence/LLMZoo) using GPT-3.5 as an evaluator and the general prompt as the metric evaluation prompt.
+We then conducted their [automatic evaluation](https://github.com/FreedomIntelligence/LLMZoo) using GPT-3.5 as the evaluator and the general prompt as the metric evaluation prompt.
 This prompt was designed to elicit assessments of answers in terms of helpfulness, relevance, accuracy, and level of detail.
 [Additional prompts](https://github.com/FreedomIntelligence/LLMZoo/blob/main/llmzoo/eval/prompts/order/prompt_all.json) are provided for assessing overall performance on different perspectives.
 
-Follows the results against GPT-3.5 and our base model, Phoenix:
+The results against GPT-3.5 and Falcon, currently one of the highest-performing open-source models, are as follows:
 
 |                        | **Lose** | **Tie** | **Win** |
 |------------------------|----------|---------|---------|
 | QUOKKA vs. **GPT-3.5** | 63.8%    | 10.1%   | 26.1%   |
-| QUOKKA vs. **Phoenix** |          |         |         |
+| QUOKKA vs. **Falcon**  | 17.4%    | 1.4%    | 81.2%   |
 
 ## Environmental impact
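The table above reports, for each pairwise comparison, the share of questions the GPT-3.5 judge scored as a loss, tie, or win for QUOKKA. The aggregation step can be sketched as follows; this is an illustrative sketch, not LLMZoo's actual evaluation code, and the JSONL layout, field name, and function name are assumptions:

```python
import json
from collections import Counter

def aggregate_verdicts(path):
    """Tally per-question judge verdicts ("win", "tie", "lose") from a
    JSONL file and return each outcome's share as a percentage."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)  # e.g. {"question_id": 1, "verdict": "win"}
            counts[record["verdict"]] += 1
    total = sum(counts.values())
    return {k: round(100 * counts[k] / total, 1) for k in ("lose", "tie", "win")}
```

Running this over a file of judge verdicts yields the percentage breakdown shown in each row of the results table.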