automaise/quokka-7b

patricia-rocha committed on Jun 28, 2023

Commit 57da56b · 1 Parent(s): 8b56166

Update README.md

Files changed (1)

README.md +7 -5

@@ -130,11 +130,13 @@ Follows the results against GPT-3.5 and two of the highest performing open-sourc
 * Automatic Evaluation **in Portuguese**:
-|                        | **Lose** | **Tie** | **Win** |
-|------------------------|----------|---------|---------|
-| Quokka vs. **GPT-3.5** | 63.8%    | 10.1%   | 26.1%   |
-| Quokka vs. **Vicuna**  | 66.2%    | 8.8%    | 25.0%   |
-| Quokka vs. **Falcon**  | 17.4%    | 1.4%    | 81.2%   |
+|                            | **Lose** | **Tie** | **Win** |
+|----------------------------|----------|---------|---------|
+| Quokka vs. **GPT-3.5**     | 63.8%    | 10.1%   | 26.1%   |
+| Quokka vs. **Vicuna-13B**  | 66.2%    | 8.8%    | 25.0%   |
+| Quokka vs. **Falcon-40B**  | 17.4%    | 1.4%    | 81.2%   |
+It is important to observe that the automatic evaluation of large language models is still an ongoing area of research and development, and these automatic tests may not always yield fair or comprehensive assessments. Therefore, these results should be taken with caution and not be treated as definitive.
 ## Environmental impact