AudreyVM committed on
Commit c5123e7 (verified)
1 parent: a84de90

Update README.md

Files changed (1):
README.md (+51 −5)
@@ -725,14 +725,58 @@ NLLB-3.3 ([Costa-jussà et al., 2022](https://arxiv.org/abs/2207.04672)) and [Sa
 </details>
 
 ## Ethical Considerations and Limitations
 
 Detailed information on the work done to examine the presence of unwanted social and cognitive biases in the base model can be found
 at [Salamandra-7B model card](https://huggingface.co/BSC-LT/salamandra-7b).
- With regard to MT models, no specific analysis has yet been carried out in order to evaluate potential biases or limitations in translation
 accuracy across different languages, dialects, or domains. However, we recognize the importance of identifying and addressing any harmful stereotypes,
- cultural inaccuracies, or systematic performance discrepancies that may arise in Machine Translation. As such, we plan to perform more analyses as soon
- as we have implemented the necessary metrics and methods within our evaluation framework [MT Lens](https://github.com/langtech-bsc/mt-evaluation).
 Note that the model has only undergone preliminary instruction tuning.
 We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
 
@@ -755,9 +799,11 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
 
 ### Acknowledgements
 
- The success of this project has been made possible thanks to the invaluable contributions of our partners in the [ILENIA Project](https://proyectoilenia.es/):
- [HiTZ](http://hitz.ehu.eus/es), and [CiTIUS](https://citius.gal/es/).
 Their efforts have been instrumental in advancing our work, and we sincerely appreciate their support.
 
 ### Disclaimer
 
 </details>
 
+ <details>
+ ### Gender-Aware Translation
+
+ Below are the evaluation results for gender-aware translation, evaluated on the [MT-GenEval](https://github.com/amazon-science/machine-translation-gender-eval?tab=readme-ov-file#mt-geneval) dataset ([Currey et al., 2022](https://github.com/amazon-science/machine-translation-gender-eval?tab=readme-ov-file#mt-geneval)).
+ These scores cover translation from English into German, Spanish, French, Italian, Portuguese and Russian, and are compared against MADLAD400-7B, TowerInstruct-7B-v0.2 and the SalamandraTA-7b-base model.
+ Evaluation was conducted using MT-Lens and is reported as accuracy, computed with the metric provided with MT-GenEval.
+
+ | Model | Source | Target | Masc | Fem | Pair |
+ |:---------------------------------|:---------|:---------|-------:|-------:|-------:|
+ | SalamandraTA-7b-instruct | en | de | **0.883** | **0.883** | **0.773** |
+ | SalamandraTA-7b-base | en | de | 0.857 | 0.770 | 0.660 |
+ | MADLAD400-7B | en | de | 0.877 | 0.823 | 0.713 |
+ | TowerInstruct-7B-v0.2 | en | de | 0.863 | 0.840 | 0.727 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | es | 0.867 | **0.850** | **0.737** |
+ | SalamandraTA-7b-base | en | es | **0.890** | 0.733 | 0.643 |
+ | MADLAD400-7B | en | es | 0.887 | 0.780 | 0.687 |
+ | TowerInstruct-7B-v0.2 | en | es | 0.850 | 0.823 | 0.693 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | fr | **0.900** | 0.820 | **0.737** |
+ | SalamandraTA-7b-base | en | fr | 0.887 | 0.710 | 0.617 |
+ | MADLAD400-7B | en | fr | 0.873 | 0.777 | 0.663 |
+ | TowerInstruct-7B-v0.2 | en | fr | 0.880 | **0.823** | 0.717 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | it | 0.900 | **0.763** | 0.683 |
+ | SalamandraTA-7b-base | en | it | 0.893 | 0.593 | 0.513 |
+ | MADLAD400-7B | en | it | 0.907 | 0.663 | 0.597 |
+ | TowerInstruct-7B-v0.2 | en | it | **0.947** | 0.747 | **0.713** |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | pt | 0.920 | **0.770** | **0.707** |
+ | SalamandraTA-7b-base | en | pt | **0.923** | 0.650 | 0.597 |
+ | MADLAD400-7B | en | pt | **0.923** | 0.687 | 0.627 |
+ | TowerInstruct-7B-v0.2 | en | pt | 0.907 | 0.730 | 0.670 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | ru | **0.950** | **0.837** | **0.793** |
+ | SalamandraTA-7b-base | en | ru | 0.933 | 0.713 | 0.653 |
+ | MADLAD400-7B | en | ru | 0.940 | 0.797 | 0.740 |
+ | TowerInstruct-7B-v0.2 | en | ru | 0.933 | 0.797 | 0.733 |
+
+ <img src="./images/geneval.png"/>
+
+ </details>
+
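The Masc, Fem and Pair accuracies reported in the table above can be sketched roughly as follows. This is a minimal illustration, not MT-GenEval's actual implementation: the `is_correct` and `geneval_accuracy` names and the simplified word-set annotation format are hypothetical, and the real metric operates on the dataset's annotated references.

```python
# Hypothetical sketch of an MT-GenEval-style gender accuracy computation.
# Each example has a masculine and a feminine variant of the same sentence;
# a hypothesis counts as gender-correct when it contains the expected
# gendered words and none of the words unique to the opposite gender.
# "Pair" accuracy requires BOTH variants of a sentence to be correct,
# which is why it is always the lowest of the three columns.

def is_correct(hypothesis: str, gendered_words: set[str], opposite_words: set[str]) -> bool:
    """True if the hypothesis uses every expected gendered word and
    no word that is unique to the opposite gender."""
    tokens = set(hypothesis.lower().split())
    return gendered_words <= tokens and not (opposite_words - gendered_words) & tokens

def geneval_accuracy(examples: list[dict]) -> tuple[float, float, float]:
    """examples: dicts with 'masc_hyp', 'fem_hyp', 'masc_words', 'fem_words'.
    Returns (masc_accuracy, fem_accuracy, pair_accuracy)."""
    masc_ok = fem_ok = pair_ok = 0
    for ex in examples:
        m = is_correct(ex["masc_hyp"], ex["masc_words"], ex["fem_words"])
        f = is_correct(ex["fem_hyp"], ex["fem_words"], ex["masc_words"])
        masc_ok += m
        fem_ok += f
        pair_ok += m and f
    n = len(examples)
    return masc_ok / n, fem_ok / n, pair_ok / n
```

For instance, a system that always produces the masculine form would score well on Masc, poorly on Fem, and therefore poorly on Pair.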
 ## Ethical Considerations and Limitations
 
 Detailed information on the work done to examine the presence of unwanted social and cognitive biases in the base model can be found
 at [Salamandra-7B model card](https://huggingface.co/BSC-LT/salamandra-7b).
+ With regard to MT models, the only bias-related analysis we have conducted so far is the MT-GenEval evaluation.
+ Beyond that, no specific analysis has yet been carried out to evaluate potential biases or limitations in translation
 accuracy across different languages, dialects, or domains. However, we recognize the importance of identifying and addressing any harmful stereotypes,
+ cultural inaccuracies, or systematic performance discrepancies that may arise in Machine Translation. As such, we plan to perform further analyses as
+ we implement the necessary metrics and methods within our evaluation framework [MT Lens](https://github.com/langtech-bsc/mt-evaluation).
 Note that the model has only undergone preliminary instruction tuning.
 We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
 
@@ -755,9 +799,11 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
 
 ### Acknowledgements
 
+ The success of this project has been made possible thanks to the invaluable contributions of numerous research centers, teams, and projects that provided access to their data.
 Their efforts have been instrumental in advancing our work, and we sincerely appreciate their support.
+ We would like to thank, among others:
+ [CENID](https://cenid.es/), [CiTIUS](https://citius.gal/es/), [Gaitu proiektua](https://gaitu.eus/), [Helsinki NLP](https://github.com/Helsinki-NLP), [HiTZ](http://hitz.ehu.eus/es), [Institut d'Estudis Aranesi](http://www.institutestudisaranesi.cat/), [MaCoCu Project](https://macocu.eu/), [Machine Translate Foundation](https://machinetranslate.org/about), [NTEU Project](https://nteu.eu/), [Orai NLP technologies](https://huggingface.co/orai-nlp), [Proxecto Nós](https://nos.gal/es/proxecto-nos), [Softcatalà](https://www.softcatala.org/), [Tatoeba Project](https://tatoeba.org/), [TILDE Project](https://tilde.ai/tildelm/), [Transducens - Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant](https://transducens.dlsi.ua.es/), [Unbabel](https://huggingface.co/Unbabel).
+
 
 ### Disclaimer