patricia-rocha committed on
Commit
57da56b
1 Parent(s): 8b56166

Update README.md

Files changed (1)
  1. README.md +7 -5
README.md CHANGED
@@ -130,11 +130,13 @@ Follows the results against GPT-3.5 and two of the highest performing open-sourc

  * Automatic Evaluation **in Portuguese**:

- | | **Lose** | **Tie** | **Win** |
- |------------------------|----------|---------|---------|
- | Quokka vs. **GPT-3.5** | 63.8% | 10.1% | 26.1% |
- | Quokka vs. **Vicuna** | 66.2% | 8.8% | 25.0% |
- | Quokka vs. **Falcon** | 17.4% | 1.4% | 81.2% |
+ | | **Lose** | **Tie** | **Win** |
+ |----------------------------|----------|---------|---------|
+ | Quokka vs. **GPT-3.5** | 63.8% | 10.1% | 26.1% |
+ | Quokka vs. **Vicuna-13B** | 66.2% | 8.8% | 25.0% |
+ | Quokka vs. **Falcon-40B** | 17.4% | 1.4% | 81.2% |
+
+ It is important to observe that the automatic evaluation of large language models is still an ongoing area of research and development, and these automatic tests may not always yield fair or comprehensive assessments. Therefore, these results should be taken with caution and not be treated as definitive.

  ## Environmental impact