patricia-rocha committed on
Commit 57da56b · 1 Parent(s): 8b56166

Update README.md
README.md CHANGED

```diff
@@ -130,11 +130,13 @@ Follows the results against GPT-3.5 and two of the highest performing open-sourc
 
 * Automatic Evaluation **in Portuguese**:
 
-
-
-| Quokka vs. **GPT-3.5**
-| Quokka vs. **Vicuna** | 66.2% | 8.8% | 25.0% |
-| Quokka vs. **Falcon** | 17.4% | 1.4% | 81.2% |
+| | **Lose** | **Tie** | **Win** |
+|----------------------------|----------|---------|---------|
+| Quokka vs. **GPT-3.5** | 63.8% | 10.1% | 26.1% |
+| Quokka vs. **Vicuna-13B** | 66.2% | 8.8% | 25.0% |
+| Quokka vs. **Falcon-40B** | 17.4% | 1.4% | 81.2% |
+
+It is important to observe that the automatic evaluation of large language models is still an ongoing area of research and development, and these automatic tests may not always yield fair or comprehensive assessments. Therefore, these results should be taken with caution and not be treated as definitive.
 
 ## Environmental impact
 
```
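Each row of the updated table seems to give Quokka's outcome against the named model, so its three columns should partition all comparisons and add up to 100%. The commit does not say how the percentages were produced, so the following is only a minimal Python sketch under that reading. It copies the values from the new table, checks that each row sums to 100% (with a small tolerance for rounding to one decimal place), and computes the share of comparisons Quokka did not lose. The names `RESULTS` and `win_or_tie` are hypothetical and not part of the Quokka repository.

```python
import math

# Lose / Tie / Win percentages for Quokka, copied from the updated README table.
RESULTS = {
    "GPT-3.5": (63.8, 10.1, 26.1),
    "Vicuna-13B": (66.2, 8.8, 25.0),
    "Falcon-40B": (17.4, 1.4, 81.2),
}


def win_or_tie(lose: float, tie: float, win: float) -> float:
    """Return the share of comparisons Quokka did not lose, in percent (one decimal)."""
    # Assumption: the three columns partition all comparisons, so they should sum
    # to 100%; allow up to 0.15 of drift from rounding each value to one decimal.
    if not math.isclose(lose + tie + win, 100.0, abs_tol=0.15):
        raise ValueError("row does not sum to 100%")
    return round(tie + win, 1)


for opponent, row in RESULTS.items():
    print(f"Quokka vs. {opponent}: win or tie {win_or_tie(*row)}%")
```

Under this reading, Quokka avoids a loss in about a third of comparisons against GPT-3.5 and Vicuna-13B, and in most comparisons against Falcon-40B.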