Commit c351234 (parent: ee30b9f)
Update README.md

README.md CHANGED
@@ -109,16 +109,16 @@ This fine-tuning approach allowed us to significantly reduce memory usage and co
 ## Evaluation results
 
 To evaluate the performance of our model, we translated [70 questions](https://github.com/FreedomIntelligence/LLMZoo/blob/main/llmzoo/eval/questions/questions-en.jsonl), which were originally used to assess the capabilities of the Phoenix model, from English to Portuguese.
-We then conducted their [automatic evaluation](https://github.com/FreedomIntelligence/LLMZoo) using GTP-3.5 as
+We then conducted their [automatic evaluation](https://github.com/FreedomIntelligence/LLMZoo) using GPT-3.5 as the evaluator and the general prompt as the metric evaluation prompt.
 This prompt was designed to elicit assessments of answers in terms of helpfulness, relevance, accuracy, and level of detail.
 [Additional prompts](https://github.com/FreedomIntelligence/LLMZoo/blob/main/llmzoo/eval/prompts/order/prompt_all.json) are provided for assessing overall performance on different perspectives.
 
-Follows the results against GPT-3.5 and
+Below are the results against GPT-3.5 and Falcon, one of the highest-performing open-source models at the moment:
 
 |                        | **Lose** | **Tie** | **Win** |
 |------------------------|----------|---------|---------|
 | QUOKKA vs. **GPT-3.5** | 63.8%    | 10.1%   | 26.1%   |
-| QUOKKA vs. **
+| QUOKKA vs. **Falcon**  | 17.4%    | 1.4%    | 81.2%   |
 
 ## Environmental impact
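The new table reports only aggregate Lose/Tie/Win percentages. As an illustrative sketch of how per-question GPT-3.5 judgments could be rolled up into such a table (the `winrate_table` helper and the sample verdicts below are hypothetical, not part of the LLMZoo tooling):

```python
from collections import Counter

def winrate_table(verdicts):
    """Aggregate per-question judge verdicts ('win' | 'tie' | 'lose',
    taken from QUOKKA's perspective) into percentage shares."""
    counts = Counter(verdicts)
    total = len(verdicts)
    # One percentage per outcome, rounded to one decimal as in the README table.
    return {k: round(100 * counts[k] / total, 1) for k in ("lose", "tie", "win")}

# Hypothetical verdicts over the 70 translated questions:
sample = ["lose"] * 45 + ["tie"] * 7 + ["win"] * 18
print(winrate_table(sample))
```

Rounding each share independently means the three percentages may not sum to exactly 100, which likely explains small discrepancies in reported tables of this kind.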