# Use VLLM as backend
Lighteval allows you to use `vllm` as a backend, providing significant speedups.
To use it, simply change the `model_args` to reflect the arguments you want to pass to vLLM.
```bash
lighteval vllm \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
    "leaderboard|truthfulqa:mc|0|0"
```
`vllm` is able to distribute the model across multiple GPUs using data parallelism, pipeline parallelism, or tensor parallelism.
You can choose the parallelism method by setting it in the `model_args`.

For example, if you have 4 GPUs, you can split the model across them using `tensor_parallelism`:
```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval vllm \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
    "leaderboard|truthfulqa:mc|0|0"
```
Or, if your model fits on a single GPU, you can use `data_parallelism` to speed up the evaluation:
```bash
lighteval vllm \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
    "leaderboard|truthfulqa:mc|0|0"
```
## Use a config file
For more advanced configurations, you can use a config file for the model.
An example of a config file is shown below and can be found at `examples/model_configs/vllm_model_config.yaml`.
```bash
lighteval vllm \
    "examples/model_configs/vllm_model_config.yaml" \
    "leaderboard|truthfulqa:mc|0|0"
```
```yaml
model: # Model specific parameters
  base_params:
    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main,dtype=bfloat16" # Model args that you would pass in the command line
  generation: # Generation specific parameters
    temperature: 0.3
    repetition_penalty: 1.0
    frequency_penalty: 0.0
    presence_penalty: 0.0
    seed: 42
    top_k: 0
    min_p: 0.0
    top_p: 0.9
```
In case of OOM issues, you might need to reduce the context size of the model as well as the `gpu_memory_utilization` parameter.
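As a minimal sketch, assuming your version exposes `max_model_length` and `gpu_memory_utilization` as model arguments (the exact names may differ), both can be lowered directly in `model_args`:

```bash
lighteval vllm \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,max_model_length=2048,gpu_memory_utilization=0.8" \
    "leaderboard|truthfulqa:mc|0|0"
```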
## Dynamically changing the metric configuration
For special kinds of metrics like `Pass@K` or LiveCodeBench’s `codegen` metric, you may need to pass specific values like the number of generations. This can be done in the `yaml` file in the following way:
```yaml
model: # Model specific parameters
  base_params:
    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main,dtype=bfloat16" # Model args that you would pass in the command line
  generation: # Generation specific parameters
    temperature: 0.3
    repetition_penalty: 1.0
    frequency_penalty: 0.0
    presence_penalty: 0.0
    seed: 42
    top_k: 0
    min_p: 0.0
    top_p: 0.9
  metric_options: # Optional metric arguments
    codegen_pass@1:16:
      num_samples: 16
```
An optional key `metric_options` can be passed in the yaml file, using the name of the metric (or metrics) as defined in `Metric.metric_name`.
In this case, the `codegen_pass@1:16` metric defined in our tasks will have `num_samples` updated to 16, independently of the number defined by default.
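For example, you would then launch the evaluation with the modified config file against a task that defines this metric; the task name below is only an illustration and may differ in your installation:

```bash
lighteval vllm \
    "examples/model_configs/vllm_model_config.yaml" \
    "extended|lcb:codegeneration|0|0"
```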