AudreyVM committed on
Commit c5123e7 (verified)
1 parent: a84de90

Update README.md

Files changed (1):
README.md (+51 −5)
@@ -725,14 +725,58 @@ NLLB-3.3 ([Costa-jussà et al., 2022](https://arxiv.org/abs/2207.04672)) and [Sa
 </details>
 
 ## Ethical Considerations and Limitations
 
 Detailed information on the work done to examine the presence of unwanted social and cognitive biases in the base model can be found
 at [Salamandra-7B model card](https://huggingface.co/BSC-LT/salamandra-7b).
- With regard to MT models, no specific analysis has yet been carried out in order to evaluate potential biases or limitations in translation
 accuracy across different languages, dialects, or domains. However, we recognize the importance of identifying and addressing any harmful stereotypes,
- cultural inaccuracies, or systematic performance discrepancies that may arise in Machine Translation. As such, we plan to perform more analyses as soon
- as we have implemented the necessary metrics and methods within our evaluation framework [MT Lens](https://github.com/langtech-bsc/mt-evaluation).
 Note that the model has only undergone preliminary instruction tuning.
 We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
 
@@ -755,9 +799,11 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
 
 ### Acknowledgements
 
- The success of this project has been made possible thanks to the invaluable contributions of our partners in the [ILENIA Project](https://proyectoilenia.es/):
- [HiTZ](http://hitz.ehu.eus/es), and [CiTIUS](https://citius.gal/es/).
 Their efforts have been instrumental in advancing our work, and we sincerely appreciate their support.
 
 ### Disclaimer
 
 </details>
 
+ <details>
+ ### Gender-Aware Translation
+
+ Below are the evaluation results for gender-aware translation, evaluated on the [MT-GenEval](https://github.com/amazon-science/machine-translation-gender-eval?tab=readme-ov-file#mt-geneval) dataset ([Currey et al., 2022](https://github.com/amazon-science/machine-translation-gender-eval?tab=readme-ov-file#mt-geneval)).
+ These scores cover translation from English into German, Spanish, French, Italian, Portuguese and Russian, and are compared against MADLAD400-7B, TowerInstruct-7B-v0.2 and the SalamandraTA-7b-base model.
+ Evaluation was conducted using MT-Lens and is reported as accuracy, computed with the metric provided with MT-GenEval.
+
+ | Model | Source | Target | Masc | Fem | Pair |
+ |:---------------------------------|:---------|:---------|-------:|-------:|-------:|
+ | SalamandraTA-7b-instruct | en | de | **0.883** | **0.883** | **0.773** |
+ | SalamandraTA-7b-base | en | de | 0.857 | 0.770 | 0.660 |
+ | MADLAD400-7B | en | de | 0.877 | 0.823 | 0.713 |
+ | TowerInstruct-7B-v0.2 | en | de | 0.863 | 0.840 | 0.727 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | es | 0.867 | **0.850** | **0.737** |
+ | SalamandraTA-7b-base | en | es | **0.890** | 0.733 | 0.643 |
+ | MADLAD400-7B | en | es | 0.887 | 0.780 | 0.687 |
+ | TowerInstruct-7B-v0.2 | en | es | 0.850 | 0.823 | 0.693 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | fr | **0.900** | 0.820 | **0.737** |
+ | SalamandraTA-7b-base | en | fr | 0.887 | 0.710 | 0.617 |
+ | MADLAD400-7B | en | fr | 0.873 | 0.777 | 0.663 |
+ | TowerInstruct-7B-v0.2 | en | fr | 0.880 | **0.823** | 0.717 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | it | 0.900 | **0.763** | 0.683 |
+ | SalamandraTA-7b-base | en | it | 0.893 | 0.593 | 0.513 |
+ | MADLAD400-7B | en | it | 0.907 | 0.663 | 0.597 |
+ | TowerInstruct-7B-v0.2 | en | it | **0.947** | 0.747 | **0.713** |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | pt | 0.920 | **0.770** | **0.707** |
+ | SalamandraTA-7b-base | en | pt | **0.923** | 0.650 | 0.597 |
+ | MADLAD400-7B | en | pt | **0.923** | 0.687 | 0.627 |
+ | TowerInstruct-7B-v0.2 | en | pt | 0.907 | 0.730 | 0.670 |
+ | | | | | | |
+ | SalamandraTA-7b-instruct | en | ru | **0.950** | **0.837** | **0.793** |
+ | SalamandraTA-7b-base | en | ru | 0.933 | 0.713 | 0.653 |
+ | MADLAD400-7B | en | ru | 0.940 | 0.797 | 0.740 |
+ | TowerInstruct-7B-v0.2 | en | ru | 0.933 | 0.797 | 0.733 |
+
+ <img src="./images/geneval.png"/>
+
+ </details>
+
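The Masc, Fem and Pair accuracies reported in the table above can be sketched roughly as follows. This is a minimal illustration, not MT-GenEval's actual implementation: the `is_correct` and `geneval_accuracy` names and the simplified word-set annotation format are hypothetical, and the real metric operates on the dataset's annotated references.

```python
# Hypothetical sketch of an MT-GenEval-style gender accuracy computation.
# Each example has a masculine and a feminine variant of the same sentence;
# a hypothesis counts as gender-correct when it contains the expected
# gendered words and none of the words unique to the opposite gender.
# "Pair" accuracy requires BOTH variants of a sentence to be correct,
# which is why it is always the lowest of the three columns.

def is_correct(hypothesis: str, gendered_words: set[str], opposite_words: set[str]) -> bool:
    """True if the hypothesis uses every expected gendered word and
    no word that is unique to the opposite gender."""
    tokens = set(hypothesis.lower().split())
    return gendered_words <= tokens and not (opposite_words - gendered_words) & tokens

def geneval_accuracy(examples: list[dict]) -> tuple[float, float, float]:
    """examples: dicts with 'masc_hyp', 'fem_hyp', 'masc_words', 'fem_words'.
    Returns (masc_accuracy, fem_accuracy, pair_accuracy)."""
    masc_ok = fem_ok = pair_ok = 0
    for ex in examples:
        m = is_correct(ex["masc_hyp"], ex["masc_words"], ex["fem_words"])
        f = is_correct(ex["fem_hyp"], ex["fem_words"], ex["masc_words"])
        masc_ok += m
        fem_ok += f
        pair_ok += m and f
    n = len(examples)
    return masc_ok / n, fem_ok / n, pair_ok / n
```

For instance, a system that always produces the masculine form would score well on Masc, poorly on Fem, and therefore poorly on Pair.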
 ## Ethical Considerations and Limitations
 
 Detailed information on the work done to examine the presence of unwanted social and cognitive biases in the base model can be found
 at [Salamandra-7B model card](https://huggingface.co/BSC-LT/salamandra-7b).
+ With regard to MT models, the only bias-related analysis we have conducted so far is the MT-GenEval evaluation.
+ Beyond that, no specific analysis has yet been carried out to evaluate potential biases or limitations in translation
 accuracy across different languages, dialects, or domains. However, we recognize the importance of identifying and addressing any harmful stereotypes,
+ cultural inaccuracies, or systematic performance discrepancies that may arise in Machine Translation. As such, we plan to perform further analyses as
+ we implement the necessary metrics and methods within our evaluation framework [MT Lens](https://github.com/langtech-bsc/mt-evaluation).
 Note that the model has only undergone preliminary instruction tuning.
 We urge developers to consider potential limitations and conduct safety testing and tuning tailored to their specific applications.
 
@@ -755,9 +799,11 @@ within the framework of [ILENIA Project](https://proyectoilenia.es/) with refere
 
 ### Acknowledgements
 
+ The success of this project has been made possible thanks to the invaluable contributions of numerous research centers, teams, and projects that provided access to their data.
 Their efforts have been instrumental in advancing our work, and we sincerely appreciate their support.
+ We would like to thank, among others:
+ [CENID](https://cenid.es/), [CiTIUS](https://citius.gal/es/), [Gaitu proiektua](https://gaitu.eus/), [Helsinki NLP](https://github.com/Helsinki-NLP), [HiTZ](http://hitz.ehu.eus/es), [Institut d'Estudis Aranesi](http://www.institutestudisaranesi.cat/), [MaCoCu Project](https://macocu.eu/), [Machine Translate Foundation](https://machinetranslate.org/about), [NTEU Project](https://nteu.eu/), [Orai NLP technologies](https://huggingface.co/orai-nlp), [Proxecto Nós](https://nos.gal/es/proxecto-nos), [Softcatalà](https://www.softcatala.org/), [Tatoeba Project](https://tatoeba.org/), [TILDE Project](https://tilde.ai/tildelm/), [Transducens - Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant](https://transducens.dlsi.ua.es/), [Unbabel](https://huggingface.co/Unbabel).
+
 
 ### Disclaimer