Adding a Custom Task

Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.

Step-by-Step Creation of a Task

To contribute your task to the Lighteval repository, first install the development dependencies with pip install -e .[dev], then run pre-commit install to set up the pre-commit hooks.

Step 1: Create the Task File

First, create a Python file or a directory under src/lighteval/tasks/tasks. A directory is helpful if you need to split your task across several files; in that case, make sure one of the files is named main.py.

Step 2: Define the Prompt Function

Define a prompt function that converts a line from your dataset into a Doc object, the document used for evaluation.

from lighteval.tasks.requests import Doc

# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
    """Defines how to go from a dataset line to a doc object.
    Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
    about what this function should do in the README.
    """
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )
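To make the mapping concrete, here is a minimal sketch that runs without Lighteval installed. DocSketch is a hypothetical stand-in that models only the four fields used above (it is not the library's Doc class), and the dataset row is invented for illustration; the column names "question", "choices", and "gold" are the ones the prompt function reads.

```python
from dataclasses import dataclass


@dataclass
class DocSketch:
    """Hypothetical stand-in for Doc, holding only the fields used above."""

    task_name: str
    query: str
    choices: list
    gold_index: int


def prompt_fn_sketch(line: dict, task_name: str) -> DocSketch:
    # Same mapping as prompt_fn above, applied to the stand-in class.
    return DocSketch(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )


# Hypothetical dataset row; real rows come from your dataset on the Hub.
example_line = {
    "question": "What is the capital of France?",
    "choices": ["Paris", "Rome", "Madrid"],
    "gold": 0,
}

doc = prompt_fn_sketch(example_line, task_name="myothertask")
print(doc.query)
print(doc.choices)
print(doc.gold_index)
```

Each choice gets a leading space, and gold_index points at the position of the correct choice in that list.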

Step 3: Choose or Create Metrics

You can either use an existing metric (defined in lighteval.metrics.metrics.Metrics) or create a custom one.

Using Existing Metrics

from lighteval.metrics.metrics import Metrics

# Use an existing metric
metric = Metrics.ACCURACY

Creating Custom Metrics

from lighteval.metrics.utils.metric_utils import SampleLevelMetric
import numpy as np

custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample metrics
)
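The split between the sample-level and corpus-level functions can be illustrated in plain Python. This is only a sketch: the function name exact_match_sketch and its (prediction, gold) arguments are assumptions made for illustration, and the arguments Lighteval actually passes to sample_level_fn may differ between versions, so check the metrics in the library before adapting it.

```python
from statistics import mean


def exact_match_sketch(prediction: str, gold: str) -> float:
    """Score one sample: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


# Hypothetical (prediction, gold) pairs standing in for model outputs.
samples = [("Paris", "paris"), ("Rome ", "Rome"), ("Berlin", "Madrid")]

# Sample level: one score per example.
sample_scores = [exact_match_sketch(pred, gold) for pred, gold in samples]

# Corpus level: aggregate the per-sample scores, as np.mean does above.
corpus_score = mean(sample_scores)
print(sample_scores, corpus_score)
```

The placeholder lambda x: x in the definition above would be replaced by a real per-sample scoring function of this kind.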

Step 4: Define Your Task

You can define a task with or without subsets using LightevalTaskConfig.

Simple Task (No Subsets)

from lighteval.tasks.lighteval_task import LightevalTaskConfig

# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,  # Must be defined in the file or imported
    hf_repo="your_dataset_repo_on_hf",
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random_sampling_from_train",
    metrics=[metric],  # Select your metric in Metrics
    generation_size=256,
    stop_sequence=["\n", "Question:"],
)

Task with Multiple Subsets

If you want to create a task with multiple subsets, add them to the SAMPLE_SUBSETS list and create a task for each subset.

SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]  # List of all the subsets to use for this eval

class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )

SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
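As a quick check of the naming pattern, the following sketch reproduces only the name generation from the list comprehension above, without instantiating any task configuration. The subset names are the placeholder values from this guide.

```python
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]

# Same naming pattern as the list comprehension above: "task:<subset>".
subset_task_names = [f"task:{subset}" for subset in SAMPLE_SUBSETS]

# Map each generated task name back to the subset it evaluates.
name_to_subset = dict(zip(subset_task_names, SAMPLE_SUBSETS))
print(subset_task_names)
print(name_to_subset)
```

Each generated name is what you would pass on the command line to run that subset.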

Step 5: Add Tasks to the Table

Next, add your task or tasks to the TASKS_TABLE list.

# STORE YOUR EVALS

# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS

# Tasks without subsets:
# TASKS_TABLE = [task]
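Since TASKS_TABLE is an ordinary Python list, both kinds of task can presumably be combined by concatenation. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for task configurations (only the name attribute is modeled) and adds a duplicate-name check. The source does not say whether Lighteval itself rejects duplicate names, so treat the check as a precaution rather than a documented requirement.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical stand-ins for task configs; only `name` is modeled here.
subset_tasks = [
    SimpleNamespace(name=f"task:{s}") for s in ["subset1", "subset2", "subset3"]
]
single_task = SimpleNamespace(name="myothertask")

# Plain list concatenation puts both kinds of task in one table.
tasks_table = subset_tasks + [single_task]

# Collect any name that appears more than once.
name_counts = Counter(t.name for t in tasks_table)
duplicates = sorted(name for name, count in name_counts.items() if count > 1)
if duplicates:
    raise ValueError(f"Duplicate task names: {duplicates}")

print([t.name for t in tasks_table])
```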

Step 6: Create a Requirement File

If your task has extra dependencies, create a requirement.txt file that lists only those dependencies, so that anyone can run your task.

Running Your Custom Task

Once your file is created, run the evaluation with the following command, replacing {task} with your task name and {path_to_your_custom_task_file} with the path to your file:

lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    {task} \
    --custom-tasks {path_to_your_custom_task_file}

Example Usage

# Run a custom task with 3-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
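If you launch evaluations from a script, the same command can be assembled programmatically. The helper below is a hypothetical convenience function, not part of Lighteval; it only rebuilds the arguments shown in the example above, including the "task|num_fewshot" form of the task argument, and uses shlex.join so that the pipe character is quoted safely for the shell.

```python
import shlex


def build_command_sketch(
    model: str, task: str, num_fewshot: int, custom_tasks_path: str
) -> str:
    """Assemble the CLI call shown above as a single shell-safe string."""
    args = [
        "lighteval",
        "accelerate",
        f"model_name={model}",
        f"{task}|{num_fewshot}",  # task name and few-shot count, as in "myothertask|3"
        "--custom-tasks",
        custom_tasks_path,
    ]
    return shlex.join(args)


command = build_command_sketch(
    model="openai-community/gpt2",
    task="myothertask",
    num_fewshot=3,
    custom_tasks_path="community_tasks/my_custom_task.py",
)
print(command)
```

The resulting string can be pasted into a terminal, or the argument list can be passed to subprocess.run directly instead of being joined.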
