Adding a Custom Task
Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.
Task Categories
Before creating a custom task, consider which category it belongs to:
Core Evaluations
Core evaluations require only standard logic in their metrics and processing; we add them to our test suite to ensure non-regression over time. They already see high usage in the community.
Extended Evaluations
Extended evaluations require custom logic in their metrics (complex normalization, an LLM used as a judge, etc.); we added them to make users' lives easier. They already see high usage in the community.
Community Evaluations
Community evaluations are new tasks submitted by the community.
A popular community evaluation can move to become an extended or core evaluation over time.
You can find examples of custom tasks in the community_tasks directory.
Step-by-Step Creation of a Custom Task
To contribute your custom task to the Lighteval repository, first install the required development dependencies by running pip install -e .[dev], then run pre-commit install to set up the pre-commit hooks.
Step 1: Create the Task File
First, create a Python file under the community_tasks directory.
Step 2: Define the Prompt Function
You need to define a prompt function that will convert a line from your dataset to a document to be used for evaluation.
from lighteval.tasks.requests import Doc


# Define as many as you need for your different tasks
def prompt_fn(line: dict, task_name: str):
    """Defines how to go from a dataset line to a doc object.
    Follow examples in src/lighteval/tasks/default_prompts.py, or get more info
    about what this function should do in the README.
    """
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[f" {c}" for c in line["choices"]],
        gold_index=line["gold"],
    )
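For instance, assuming a dataset whose rows contain question, choices, and gold columns (the column names above are an assumption; adapt them to your own dataset), the conversion would produce a Doc like this:

# Hypothetical example row; the keys match the columns assumed by prompt_fn above
sample_line = {
    "question": "What is the capital of France?",
    "choices": ["Berlin", "Paris", "Rome", "Madrid"],
    "gold": 1,  # index of the correct choice
}

doc = prompt_fn(sample_line, task_name="community|myothertask")
# doc.query      -> "What is the capital of France?"
# doc.choices    -> [" Berlin", " Paris", " Rome", " Madrid"]
# doc.gold_index -> 1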
Step 3: Choose or Create Metrics
You can either use an existing metric (defined in lighteval.metrics.metrics.Metrics) or create a custom one.
Using Existing Metrics
from lighteval.metrics.metrics import Metrics
# Use an existing metric
metric = Metrics.ACCURACY
Creating Custom Metrics
import numpy as np

from lighteval.metrics.utils.metric_utils import SampleLevelMetric

custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric_name",
    higher_is_better=True,
    category="accuracy",
    sample_level_fn=lambda x: x,  # How to compute score for one sample
    corpus_level_fn=np.mean,  # How to aggregate the sample metrics
)
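Conceptually, the sample-level function produces one score per document and the corpus-level function reduces those scores to a single number. A minimal illustration of that aggregation, assuming binary per-sample scores and np.mean as the corpus-level function:

import numpy as np

# Illustration only: hypothetical per-sample scores (1.0 = correct, 0.0 = incorrect)
sample_scores = [1.0, 0.0, 1.0, 1.0]

# With corpus_level_fn=np.mean, the corpus-level score is their average
corpus_score = np.mean(sample_scores)  # 0.75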
Step 4: Define Your Task
You can define a task with or without subsets using LightevalTaskConfig.
Simple Task (No Subsets)
from lighteval.tasks.lighteval_task import LightevalTaskConfig

# This is how you create a simple task (like HellaSwag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,  # Must be defined in the file or imported
    suite=["community"],
    hf_repo="your_dataset_repo_on_hf",
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random_sampling_from_train",
    metrics=[metric],  # Select your metric in Metrics
    generation_size=256,
    stop_sequence=["\n", "Question:"],
)
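With this configuration, the task is referenced on the command line as community|myothertask|{fewshots}, for example community|myothertask|0 for zero-shot evaluation (see the commands at the end of this guide).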
Task with Multiple Subsets
If you want to create a task with multiple subsets, add them to the SAMPLE_SUBSETS list and create a task for each subset.
SAMPLE_SUBSETS = ["subset1", "subset2", "subset3"]  # List of all the subsets to use for this eval


class CustomSubsetTask(LightevalTaskConfig):
    def __init__(
        self,
        name,
        hf_subset,
    ):
        super().__init__(
            name=name,
            hf_subset=hf_subset,
            prompt_function=prompt_fn,  # Must be defined in the file or imported
            hf_repo="your_dataset_name",
            metrics=[custom_metric],  # Select your metric in Metrics or use your custom_metric
            hf_avail_splits=["train", "test"],
            evaluation_splits=["test"],
            few_shots_split="train",
            few_shots_select="random_sampling_from_train",
            suite=["community"],
            generation_size=256,
            stop_sequence=["\n", "Question:"],
        )


SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
Step 5: Add Tasks to the Table
Then you need to add your task to the TASKS_TABLE list.
# STORE YOUR EVALS
# Tasks with subsets:
TASKS_TABLE = SUBSET_TASKS
# Tasks without subsets:
# TASKS_TABLE = [task]
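If your file defines both standalone tasks and subset tasks, the two lists can simply be concatenated; a sketch under that assumption:

# Combining both styles in a single file (sketch)
TASKS_TABLE = SUBSET_TASKS + [task]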
Step 6: Create a Requirements File
If your task has extra requirements, create a requirements.txt file listing only the dependencies it needs, so that anyone can run your task.
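For example, if your prompt function or metric pulled in an extra library, the file would simply list that dependency, one package per line (the package name below is purely a placeholder):

# requirements.txt (placeholder contents; list only what your task needs)
some-extra-package==1.2.3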
Running Your Custom Task
Once your file is created, you can run the evaluation with the following command:
lighteval accelerate \
    "model_name=HuggingFaceH4/zephyr-7b-beta" \
    "community|{custom_task}|{fewshots}" \
    --custom-tasks {path_to_your_custom_task_file}
Example Usage
# Run a custom task with zero-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|0" \
    --custom-tasks community_tasks/my_custom_task.py

# Run a custom task with few-shot evaluation
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|myothertask|3" \
    --custom-tasks community_tasks/my_custom_task.py
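In these examples, the final number in the task specification (0 or 3) is the number of few-shot examples, drawn from the split configured as few_shots_split in your task definition.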