patricia-rocha committed 81a87ac (parent: 6e46ae4): Update README.md

README.md CHANGED
@@ -113,13 +113,14 @@ We then conducted their [automatic evaluation](https://github.com/FreedomIntelli
 This prompt was designed to elicit assessments of answers in terms of helpfulness, relevance, accuracy, and level of detail.
 [Additional prompts](https://github.com/FreedomIntelligence/LLMZoo/blob/main/llmzoo/eval/prompts/order/prompt_all.json) are provided for assessing overall performance on different perspectives.

-Follows the results against GPT-3.5
+Follows the results against GPT-3.5, two of the highest performing open-source models at the moment, Vicuna (13B) and Falcon (7B):

 * Automatic Evaluation **in Portuguese**:

 | | **Lose** | **Tie** | **Win** |
 |------------------------|----------|---------|---------|
 | QUOKKA vs. **GPT-3.5** | 63.8% | 10.1% | 26.1% |
+| QUOKKA vs. **Vicuna** | 66.2% | 8.8% | 25.0% |
 | QUOKKA vs. **Falcon** | 17.4% | 1.4% | 81.2% |

 ## Environmental impact
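The Lose/Tie/Win percentages in the table come from aggregating per-question pairwise judge verdicts. A minimal sketch of that aggregation, assuming a simple list of verdict strings (the function name and the illustrative counts are hypothetical, not part of the LLMZoo evaluation API; the counts here are merely chosen so the output matches the GPT-3.5 row above):

```python
from collections import Counter

def win_tie_lose_rates(verdicts):
    """Aggregate pairwise verdicts ('lose', 'tie', 'win' from QUOKKA's
    perspective) into percentages rounded to one decimal place."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {outcome: round(100 * counts[outcome] / total, 1)
            for outcome in ("lose", "tie", "win")}

# Illustrative: 69 pairwise comparisons reproducing the GPT-3.5 row.
verdicts = ["lose"] * 44 + ["tie"] * 7 + ["win"] * 18
print(win_tie_lose_rates(verdicts))  # {'lose': 63.8, 'tie': 10.1, 'win': 26.1}
```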