Adding a Custom Task

To add a new task, first open an issue to determine whether it should be integrated into the core evaluations of lighteval, the extended tasks, or the community tasks, and add its dataset to the Hugging Face Hub.

  • Core evaluations only require standard logic in their metrics and processing, and are added to our test suite to ensure non-regression over time. They already see high usage in the community.
  • Extended evaluations require custom logic in their metrics (complex normalisation, an LLM as a judge, …) and were added to make users' lives easier.
  • Community evaluations are new tasks submitted by the community.

A popular community evaluation can move to become an extended or core evaluation over time.

You can find examples of custom tasks in the community_tasks directory.

Step-by-step creation of a custom task

To contribute your custom task to the lighteval repo, first install the required dev dependencies by running pip install -e .[dev], then run pre-commit install to set up the pre-commit hooks.
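From the root of your lighteval clone, that is:

pip install -e .[dev]   # install lighteval with the dev extras
pre-commit install      # set up the pre-commit hooks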

First, create a Python file under the community_tasks directory.

You need to define a prompt function that will convert a line from your dataset to a document to be used for evaluation.

from lighteval.tasks.requests import Doc


# Define as many prompt functions as you need for your different tasks
def prompt_fn(line, task_name: str = None):
    """Defines how to go from a dataset line to a Doc object.
    Follow the examples in src/lighteval/tasks/default_prompts.py, or see the
    README for more info about what this function should do.
    """
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )
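For example, given a hypothetical dataset line like the one below, prompt_fn produces a Doc whose choices are prefixed with a space and whose gold_index points at the correct answer:

# Hypothetical dataset line, for illustration only
line = {
    "question": "What is the capital of France?",
    "choices": ["London", "Paris", "Berlin"],
    "gold": 1,
}

doc = prompt_fn(line, task_name="community|mytask")
# doc.query == "What is the capital of France?"
# doc.choices == [" London", " Paris", " Berlin"]
# doc.gold_index == 1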

Then, you need to choose a metric: you can either use an existing one (defined in lighteval.metrics.metrics.Metrics) or create a custom one.

import numpy as np

# SampleLevelMetric and SamplingMethod are provided by lighteval; see
# src/lighteval/metrics for the exact import path in your version.
custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category=SamplingMethod.GENERATIVE,  # or SamplingMethod.LOGPROBS, depending on your task
    sample_level_fn=lambda x: x,  # how to compute the score for one sample
    corpus_level_fn=np.mean,  # how to aggregate the sample-level scores
)
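If an existing metric fits your task, you can reference it directly instead of defining your own; a minimal sketch (exact_match is used here purely as an illustrative member of Metrics; substitute whichever metric your task needs):

from lighteval.metrics.metrics import Metrics

# Use a built-in metric instead of a custom one; exact_match is only an example
chosen_metric = Metrics.exact_match
# later, pass it to your task config: metrics=[chosen_metric]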

Then, you need to define your task using LightevalTaskConfig. You can define a task with or without subsets. To define a task with no subsets:

# This is how you create a simple task (like hellaswag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,  # must be defined in the file or imported from src/lighteval/tasks/default_prompts.py
    suite=["community"],
    hf_repo="",
    hf_subset="default",
    hf_avail_splits=[],
    evaluation_splits=[],
    few_shots_split=None,
    few_shots_select=None,
    metrics=[],  # select your metric in Metrics
)
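As a purely illustrative, filled-in version (every value below, including the dataset repo and splits, is a hypothetical placeholder, not a real dataset):

# Hypothetical values throughout; replace with your own dataset and splits
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,
    suite=["community"],
    hf_repo="your-org/your-dataset",  # placeholder Hub repo
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select=None,
    metrics=[custom_metric],
)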

If you want to create a task with multiple subsets, add them to the SAMPLE_SUBSETS list and create a task for each subset.

SAMPLE_SUBSETS = []  # list of all the subsets to use for this eval


class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # must be defined in the file or imported from src/lighteval/tasks/default_prompts.py
            hf_repo="",
            metrics=[custom_metric],  # select your metric in Metrics or use your custom_metric
            hf_avail_splits=[],
            evaluation_splits=[],
            few_shots_split=None,
            few_shots_select=None,
            suite=["community"],
            generation_size=-1,
            stop_sequence=None,
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
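With SAMPLE_SUBSETS = ["subset_a", "subset_b"], for example (hypothetical subset names), this creates two tasks named mytask:subset_a and mytask:subset_b.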

Then you need to add your task to the TASKS_TABLE list.

# STORE YOUR EVALS

# tasks with subset:
TASKS_TABLE = SUBSET_TASKS

# tasks without subset:
# TASKS_TABLE = [task]
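If your file defines both standalone tasks and subset tasks, you can expose them all at once, since TASKS_TABLE is a plain list:

# Expose the standalone task and all per-subset tasks together
TASKS_TABLE = SUBSET_TASKS + [task]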

Once your file is created, you can run the evaluation with the following command:

lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
    --custom-tasks {path_to_your_custom_task_file}
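For example, assuming the file lives at community_tasks/my_task.py (a hypothetical path) and defines the mytask:subset_a task from above, a zero-shot run without few-shot truncation would look like:

lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "community|mytask:subset_a|0|0" \
    --custom-tasks community_tasks/my_task.py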