Using the Python API
Lighteval can be used from a custom Python script. To evaluate a model, you will need to set up an EvaluationTracker, PipelineParameters, a model or a model_config, and a Pipeline. After that, simply run the pipeline and save the results.
import lighteval
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.vllm.vllm_model import VLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters
from lighteval.utils.imports import is_package_available

if is_package_available("accelerate"):
    from datetime import timedelta

    from accelerate import Accelerator, InitProcessGroupKwargs

    accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
else:
    accelerator = None


def main():
    evaluation_tracker = EvaluationTracker(
        output_dir="./results",
        save_details=True,
        push_to_hub=True,
        hub_results_org="your_username",  # Replace with your actual username
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        custom_tasks_directory=None,  # Set to a path if using custom tasks
        # Remove the parameter below once your configuration is tested
        max_samples=10,
    )

    model_config = VLLMModelConfig(
        model_name="HuggingFaceH4/zephyr-7b-beta",
        dtype="float16",
    )

    task = "lighteval|gsm8k|5"

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()
    pipeline.save_and_push_results()
    pipeline.show_results()


if __name__ == "__main__":
    main()
Key Components
EvaluationTracker
The EvaluationTracker handles logging and saving evaluation results. It can save results locally and optionally push them to the Hugging Face Hub.
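For example, a tracker that only stores results locally can be configured by disabling the Hub push (a minimal sketch reusing the parameters from the example above):

from lighteval.logging.evaluation_tracker import EvaluationTracker

# Save results and per-sample details locally, without pushing to the Hub
evaluation_tracker = EvaluationTracker(
    output_dir="./results",
    save_details=True,
    push_to_hub=False,
)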
PipelineParameters
PipelineParameters configures how the evaluation pipeline runs, including parallelism settings and task configuration.
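Taken in isolation, the pipeline parameters from the example above look like this (a sketch limited to the fields shown there; remove max_samples for a full run):

from lighteval.pipeline import ParallelismManager, PipelineParameters

pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.ACCELERATE,  # parallelism backend used to launch the run
    custom_tasks_directory=None,                  # path to custom task definitions, if any
    max_samples=10,                               # cap samples per task while testing
)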
Model Configuration
Model configurations define the model to be evaluated, including the model name, data type, and other model-specific parameters. Different backends (VLLM, Transformers, etc.) have their own configuration classes.
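For instance, to run the same model through the Transformers backend instead of vLLM, you would swap in that backend's configuration class. This is a sketch that assumes a TransformersModelConfig class importable from lighteval.models.transformers.transformers_model; check your installed version for the exact path and available fields:

from lighteval.models.transformers.transformers_model import TransformersModelConfig

# Same model, loaded through the Transformers backend instead of vLLM (assumed import path)
model_config = TransformersModelConfig(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    dtype="float16",
)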
Pipeline
The Pipeline orchestrates the entire evaluation process, taking the tasks, model configuration, and pipeline parameters to run the evaluation.
Running Multiple Tasks
You can evaluate on multiple tasks by providing a comma-separated list or a file path:
# Multiple tasks as comma-separated string
tasks = "lighteval|aime24|0,lighteval|aime25|0"
# Or load from a file
tasks = "./path/to/tasks.txt"
pipeline = Pipeline(
    tasks=tasks,
    # ... other parameters
)
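When loading from a file, the tasks file is expected to list one task specification per line, for example (hypothetical tasks.txt):

lighteval|aime24|0
lighteval|aime25|0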
Custom Tasks
To use custom tasks, set the custom_tasks_directory parameter to the path containing your custom task definitions:
pipeline_params = PipelineParameters(
    custom_tasks_directory="./path/to/custom/tasks",
    # ... other parameters
)
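Once the directory is set, a custom task is referenced with the usual suite|task|num_fewshot string, using the suite and task names declared in your definitions. In the sketch below, community and mytask are placeholders, not real task names:

tasks = "community|mytask|0"  # "community" and "mytask" are hypothetical placeholders

pipeline = Pipeline(
    tasks=tasks,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)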
For more information on creating custom tasks, see the Adding a Custom Task guide.