Contributing to multilingual evaluations
Contributing a small translation
We define 19 literals: basic keywords or punctuation signs used when creating evaluation prompts in an automatic manner, such as `yes`, `no`, `because`, etc.
We welcome translations in your language!
To contribute, you’ll need to:
- Open the translation_literals file
- Edit the file to add or expand the literals for your language of interest:
```python
Language.ENGLISH: TranslationLiterals(
    language=Language.ENGLISH,
    question_word="question",  # Usage: "Question: How are you?"
    answer="answer",  # Usage: "Answer: I am fine"
    confirmation_word="right",  # Usage: "He is smart, right?"
    yes="yes",  # Usage: "Yes, he is"
    no="no",  # Usage: "No, he is not"
    also="also",  # Usage: "Also, she is smart."
    cause_word="because",  # Usage: "She is smart, because she is tall"
    effect_word="therefore",  # Usage: "He is tall, therefore he is smart"
    or_word="or",  # Usage: "He is tall or small"
    true="true",  # Usage: "He is smart, true, false or neither?"
    false="false",  # Usage: "He is smart, true, false or neither?"
    neither="neither",  # Usage: "He is smart, true, false or neither?"
    # Punctuation and spacing: only adjust these if your language uses something different from English
    full_stop=".",
    comma=",",
    question_mark="?",
    exclamation_mark="!",
    word_space=" ",
    sentence_space=" ",
    colon=":",
    # The first characters of your alphabet used in enumerations, if different from English
    indices=["A", "B", "C", ...],
)
```
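For instance, a French entry might look like the sketch below. The values are illustrative, not authoritative (check the actual file, and ideally a native speaker, before relying on them); the main point to notice is that punctuation can genuinely differ, since French typography puts a space before two-part punctuation marks:

```python
Language.FRENCH: TranslationLiterals(
    language=Language.FRENCH,
    question_word="question",  # "Question : Comment vas-tu ?"
    answer="réponse",  # "Réponse : Je vais bien"
    confirmation_word="n'est-ce pas",  # "Il est intelligent, n'est-ce pas ?"
    yes="oui",
    no="non",
    also="de plus",
    cause_word="parce que",
    effect_word="donc",
    or_word="ou",
    true="vrai",
    false="faux",
    neither="aucun des deux",
    # French typography: space before "?", "!" and ":"
    question_mark=" ?",
    exclamation_mark=" !",
    colon=" :",
    full_stop=".",
    comma=",",
    word_space=" ",
    sentence_space=" ",
    indices=["A", "B", "C", ...],
)
```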
- Open a PR with your modifications! And voilà!
Contributing a new multilingual task
You should first read our guide on adding a custom task, to better understand the different parameters we use.
Then, take a look at the current multilingual tasks file to understand how they are defined. For multilingual evaluations, the prompt_function should be implemented with a language-adapted template. The template takes care of correct formatting, as well as correct and consistent use of language-adjusted prompt anchors (e.g. Question/Answer) and punctuation.
Browse the list of all templates here to see which are best suited to your task.
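Most templates can render the same sample under several formulations (MCF, CF, Hybrid), chosen via the formulation parameter you will see below. As a rough illustration of the difference (the exact rendering comes from the template and the language’s translation literals, so treat this as a sketch):

```
# MCF (multiple-choice formulation): options are shown, the letter is scored
Question: How are you?
 A. Fine
 B. Bad
Answer: A

# CF (cloze formulation): options are not shown, each full answer is scored as a continuation
Question: How are you?
Answer: Fine

# Hybrid formulation: options are shown, but the full answer text is scored
Question: How are you?
 A. Fine
 B. Bad
Answer: Fine
```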
Then, when ready, to define your own task, you should:
- create a Python file as indicated in the above guide
- import the relevant templates for your task type (XNLI, COPA, multiple choice, question answering, etc.)
- define one task, or a list of tasks, for each relevant language and evaluation formulation (for multiple choice), using our parametrizable LightevalTaskConfig class
```python
your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_template_prompt_function(
            language=language,
            # Then use the adapter to define the mapping between the
            # keys of the template (left) and the keys of your dataset (right).
            # To know which template keys are required and available,
            # consult the appropriate adapter type and its docstring.
            adapter=lambda line: {
                "key": line["relevant_key"],
                ...
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your Hugging Face dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE, ...
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
```
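For completeness, the helpers used above come from the lighteval codebase; a plausible set of imports is sketched below. The module paths are assumptions based on the current source layout, and get_template_prompt_function stands in for whichever concrete template function you pick (e.g. get_mcq_prompt_function for multiple choice), so double-check them against the multilingual tasks file:

```python
from lighteval.metrics.dynamic_metrics import loglikelihood_acc_metric
from lighteval.metrics.normalizations import LogProbCharNorm, LogProbTokenNorm
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.multilingual.utils.task_utils import get_metrics_for_formulation
from lighteval.tasks.templates.multichoice import get_mcq_prompt_function
from lighteval.tasks.templates.utils.formulation import (
    CFFormulation,
    HybridFormulation,
    MCFFormulation,
)
from lighteval.utils.language import Language
```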
- then, you can go back to the guide to test if your task is correctly implemented!
All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE’s functionality to make it easier to correctly fill these parameters.
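For example, the multiple-choice template’s adapter returns a typed dict whose keys are defined by the template’s input type. Assuming a dataset with separate option columns and an integer label (hypothetical column names; verify the expected keys in the template’s docstring), the mapping could look like:

```python
# Hypothetical dataset columns: "question", "option_a"/"option_b"/"option_c", "label"
adapter=lambda line: {
    "question": line["question"],
    "choices": [line["option_a"], line["option_b"], line["option_c"]],
    "gold_idx": int(line["label"]),
}
```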
Once everything is good, open a PR, and we’ll be happy to review it!