chrisociepa committed

Commit b0f51a9 · verified · 1 Parent(s): 835a197

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -221,7 +221,7 @@ Bielik-11B-v2.2-Instruct shows impressive performance on English language tasks:
 These results demonstrate Bielik-11B-v2.2-Instruct's versatility in both Polish and English, highlighting the effectiveness of its instruction tuning process.
 
 ### Polish MT-Bench
-The Bielik-11B-v.2.2-Instruct (16 bit) model was also evaluated using the MT-Bench benchmark. The quality of the model was evaluated using the English version (original version without modifications) and the Polish version created by Speakleash (tasks and evaluation in Polish, the content of the tasks was also changed to take into account the context of the Polish language).
+The Bielik-11B-v2.2-Instruct (16 bit) model was also evaluated using the MT-Bench benchmark. The quality of the model was evaluated using the English version (original version without modifications) and the Polish version created by Speakleash (tasks and evaluation in Polish, the content of the tasks was also changed to take into account the context of the Polish language).
 
 #### MT-Bench English
 | Model | Score |
@@ -255,7 +255,7 @@ The Bielik-11B-v.2.2-Instruct (16 bit) model was also evaluated using the MT-Ben
 
 Key observations on Bielik-11B-v2.2 performance:
 
-1. Strong performance among mid-sized models: Bielik-11B-v2.2-Instruct scored **8.115625**, placing it ahead of several well-known models like GPT-3.5-turbo (7.868750) and Mixtral-8x7b (7.637500). This indicates that Bielik-11B-v2.2 is competitive among mid-sized models, particularly those in the 11B-70B parameter range.
+1. Strong performance among mid-sized models: Bielik-11B-v2.2-Instruct scored **8.115625**, placing it ahead of several well-known models like GPT-3.5-turbo (7.868750) and Mixtral-8x7b (7.637500). This indicates that Bielik-11B-v2.2-Instruct is competitive among mid-sized models, particularly those in the 11B-70B parameter range.
 
 2. Competitive against larger models: Bielik-11B-v2.2-Instruct performs close to Meta-Llama-3.1-70B-Instruct (8.150000), Meta-Llama-3.1-405B-Instruct (8.168750) and even Mixtral-8x22b (8.231250), which have significantly more parameters. This efficiency relative to size could make it an attractive option for tasks where resource constraints are a consideration. Bielik generated 100% of its answers in Polish, while other models (not typically trained on Polish) may answer Polish questions in English.
 
@@ -309,7 +309,7 @@ This benchmark provides a robust and time-efficient method for assessing LLM per
 | Model | MixEval | MixEval-Hard |
 |-------------------------------|---------|--------------|
 | Bielik-11B-v2.1-Instruct | 74.55 | 45.00 |
-| **Bielik-11B-v2.2-Instruct** | 72.35 | 39.65 |
+| **Bielik-11B-v2.2-Instruct** | **72.35** | **39.65** |
 | Bielik-11B-v2.0-Instruct | 72.10 | 40.20 |
 | Mistral-7B-Instruct-v0.2 | 70.00 | 36.20 |
 
 
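For the single-number MT-Bench scores quoted above (e.g. **8.115625**), the standard protocol averages 1-10 ratings from a GPT-4 judge over 80 questions × 2 turns. Below is a minimal aggregation sketch, assuming FastChat-style single-judgment JSONL records with `model` and `score` fields; the filename and model ID are hypothetical.

```python
# Sketch: average per-turn MT-Bench judge ratings into one score.
# Assumes a FastChat-style JSONL file where each record has "model" and a
# numeric "score" (one record per question turn); -1 marks failed judgments.
import json

def mt_bench_score(path: str, model: str) -> float:
    scores = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record.get("model") == model and record.get("score", -1) >= 0:
                scores.append(record["score"])
    return sum(scores) / len(scores)

# Hypothetical judgment file and model ID for illustration.
print(mt_bench_score("gpt-4_single.jsonl", "bielik-11b-v2.2-instruct"))
```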