Models

class lighteval.models.abstract_model.LightevalModel

( )

cleanup

( )

Clean up operations if needed, such as closing an endpoint.

greedy_until

( requests: list ) → list[GenerativeResponse]

Parameters

requests (list[Request]) — list of requests containing the context and ending conditions.
disable_tqdm (bool, optional) — Whether to disable the progress bar. Defaults to False.
override_bs (int, optional) — Override the batch size for generation. Defaults to None.

Returns

list[GenerativeResponse]

list of generated responses.

Generates responses using a greedy decoding strategy until certain ending conditions are met.
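To make the ending conditions concrete, here is a minimal sketch of greedy decoding. It is not lighteval's implementation: `toy_next_token_scores`, `greedy_generate`, the character-level "tokens", and the choice to cut the output at the stop sequence are all hypothetical stand-ins for a real model and request.

```python
SCRIPT = "Paris.\nBerlin."  # text the toy "model" wants to produce (hypothetical)


def toy_next_token_scores(generated: str) -> dict[str, float]:
    """Hypothetical model: gives the highest score to the next character of SCRIPT."""
    next_char = SCRIPT[len(generated)] if len(generated) < len(SCRIPT) else "."
    scores = {ch: 0.0 for ch in set(SCRIPT)}
    scores[next_char] = 1.0
    return scores


def greedy_generate(stop_sequences: list[str], max_new_tokens: int) -> str:
    """Append the best-scoring token until a stop sequence appears or the budget runs out."""
    generated = ""
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(generated)
        # Greedy decoding: always take the single highest-scoring token.
        generated += max(scores, key=scores.get)
        for stop in stop_sequences:
            if stop in generated:
                # Assumption of this sketch: the stop sequence itself is dropped.
                return generated[: generated.index(stop)]
    return generated


first_line = greedy_generate(stop_sequences=["\n"], max_new_tokens=50)
capped = greedy_generate(stop_sequences=[], max_new_tokens=3)
```

With a newline as the stop sequence the loop ends after the first line; with no stop sequence, only the token budget ends it.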

greedy_until_multi_turn

( requests: list )

Generates responses using a greedy decoding strategy until certain ending conditions are met.

loglikelihood

( requests: list )

Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.

loglikelihood_rolling

( requests: list )

Computes the log likelihood of the context, for use in perplexity metrics.
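As a reminder of how a log likelihood relates to perplexity, the sketch below applies the textbook definition, the exponential of the negative mean token log probability. The `perplexity` helper and the sample log probabilities are hypothetical, and lighteval's own perplexity metrics may normalise differently (for example per word or per byte).

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Textbook perplexity: exp of the negative mean log probability per token."""
    total_loglikelihood = sum(token_logprobs)
    return math.exp(-total_loglikelihood / len(token_logprobs))


# Hypothetical per-token log probabilities for a three-token context.
sample_logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
sample_ppl = perplexity(sample_logprobs)

# A model that spreads probability uniformly over 4 tokens has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```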

loglikelihood_single_token

( requests: list )

Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.

tok_encode_pair

( context continuation pairwise: bool = False ) → Tuple[TokenSequence, TokenSequence]

Parameters

context (str) — The context string to be encoded.
continuation (str) — The continuation string to be encoded.
pairwise (bool) — If True, encode context and continuation separately. If False, encode them together and then split.

Returns

Tuple[TokenSequence, TokenSequence]

A tuple containing the encoded context and continuation.

Encodes a context, continuation pair by taking care of the spaces in between.

Pairwise tokenization has two advantages: 1) it aligns better with how an LLM predicts tokens, and 2) it still works when len(tok(context,cont)) != len(tok(context)) + len(tok(continuation)). This can happen, for example, in Chinese when no space is used between the context and the continuation.
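The difference between the two modes can be illustrated with a toy tokenizer. This is a sketch under stated assumptions, not the library's code: `VOCAB`, `toy_tokenize`, and `encode_pair` are hypothetical, and moving trailing context spaces onto the continuation is one plausible reading of "taking care of the spaces in between".

```python
VOCAB = ["abc", "a", "b", "c", " "]  # toy vocabulary (hypothetical)


def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization over VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def encode_pair(context: str, continuation: str, pairwise: bool = False):
    # Assumption: trailing spaces of the context are moved to the continuation.
    n_spaces = len(context) - len(context.rstrip())
    if n_spaces > 0:
        continuation = context[-n_spaces:] + continuation
        context = context[:-n_spaces]
    context_enc = toy_tokenize(context)
    if pairwise:
        # Encode the two strings separately.
        return context_enc, toy_tokenize(continuation)
    # Encode together, then split at the length of the encoded context.
    whole_enc = toy_tokenize(context + continuation)
    return context_enc, whole_enc[len(context_enc):]


separate = encode_pair("a", "bc", pairwise=True)
joint = encode_pair("a", "bc", pairwise=False)
spaced = encode_pair("a ", "b", pairwise=True)
```

Here "abc" is a single token, so the joint encoding has length 1 while the separate encodings have lengths 1 and 2. Splitting the joint encoding leaves an empty continuation, whereas the pairwise mode keeps both continuation tokens.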

class lighteval.models.transformers.transformers_model.TransformersModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str tokenizer: str | None = None subfolder: str | None = None revision: str = 'main' batch_size: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None generation_size: typing.Annotated[int, Gt(gt=0)] = 256 max_length: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None add_special_tokens: bool = True model_parallel: bool | None = None dtype: str | None = None device: typing.Union[int, str] = 'cuda' trust_remote_code: bool = False use_chat_template: bool = False compile: bool = False multichoice_continuations_start_space: bool | None = None pairwise_tokenization: bool = False )

Parameters

model_name (str) — HuggingFace Hub model ID name or the path to a pre-trained model to load. This is effectively the pretrained_model_name_or_path argument of from_pretrained in the HuggingFace transformers API.
accelerator (Accelerator) — accelerator to use for model training.
tokenizer (Optional[str]) — HuggingFace Hub tokenizer ID that will be used for tokenization.
multichoice_continuations_start_space (Optional[bool]) — Whether to add a space at the start of each continuation in multichoice generation. For example, context: “What is the capital of France?” and choices: “Paris”, “London”. Will be tokenized as: “What is the capital of France? Paris” and “What is the capital of France? London”. True adds a space, False strips a space, None does nothing
pairwise_tokenization (bool) — Whether to tokenize the context and continuation separately or together.
subfolder (Optional[str]) — The subfolder within the model repository.
revision (str) — The revision of the model.
batch_size (int) — The batch size for model evaluation.
max_gen_toks (Optional[int]) — The maximum number of tokens to generate.
max_length (Optional[int]) — The maximum length of the generated output.
add_special_tokens (bool, optional, defaults to True) — Whether to add special tokens to the input sequences. If None, the default value will be set to True for seq2seq models (e.g. T5) and False for causal models.
model_parallel (bool, optional, defaults to None) — True or False forces the use (or not) of the accelerate library to load a large model across multiple devices. The default, None, compares the number of processes with the number of GPUs: if there are fewer processes than GPUs, model parallelism is used; otherwise it is not.
dtype (Union[str, torch.dtype], optional, defaults to None) — Converts the model weights to dtype, if specified. Strings get converted to torch.dtype objects (e.g. float16 -> torch.float16). Use dtype="auto" to derive the type from the model’s weights.
device (Union[int, str]) — device to use for model evaluation.
quantization_config (Optional[BitsAndBytesConfig]) — quantization configuration for the model, manually provided to load a normally floating point model at a quantized precision. Needed for 4-bit and 8-bit precision.
trust_remote_code (bool) — Whether to trust remote code during model loading.
generation_parameters (GenerationParameters) — Range of parameters which will affect the generation.
generation_config (GenerationConfig) — GenerationConfig object (only passed during manual creation)
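The three states of multichoice_continuations_start_space described above can be sketched as a small helper. `apply_start_space` is a hypothetical name used for illustration, not a lighteval function.

```python
from typing import Optional


def apply_start_space(continuation: str, start_space: Optional[bool]) -> str:
    """True adds a leading space, False strips it, None leaves the text untouched."""
    if start_space is True and not continuation.startswith(" "):
        return " " + continuation
    if start_space is False and continuation.startswith(" "):
        return continuation.lstrip()
    return continuation


context = "What is the capital of France?"
with_space = context + apply_start_space("Paris", True)
without_space = context + apply_start_space(" Paris", False)
unchanged = context + apply_start_space("Paris", None)
```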

Base configuration class for models.

Methods:

post_init(): Performs post-initialization checks on the configuration.
_init_configs(model_name, env_config): Initializes the model configuration.
init_configs(env_config): Initializes the model configuration using the environment configuration.
get_model_sha(): Retrieves the SHA of the model.

class lighteval.models.transformers.transformers_model.TransformersModel

( config: TransformersModelConfig )

greedy_until

( requests: list ) → list[GenerativeResponse]

Parameters

requests (list[Request]) — list of requests containing the context and ending conditions.
override_bs (int, optional) — Override the batch size for generation. Defaults to None.

Returns

list[GenerativeResponse]

list of generated responses.

Generates responses using a greedy decoding strategy until certain ending conditions are met.

init_model_parallel

( model_parallel: bool | None = None )

Compute all the parameters related to model_parallel
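The decision behind the model_parallel setting, as documented for TransformersModelConfig, can be sketched as follows. `resolve_model_parallel` and its arguments are hypothetical; the real method likely also derives device placement and memory limits, which this sketch leaves out.

```python
from typing import Optional


def resolve_model_parallel(
    model_parallel: Optional[bool], num_processes: int, num_gpus: int
) -> bool:
    """Return the explicit setting if given, else compare processes with GPUs."""
    if model_parallel is not None:
        # True/False forces the behaviour.
        return model_parallel
    # Fewer processes than GPUs: each process can spread a model over several GPUs.
    return num_processes < num_gpus


auto_parallel = resolve_model_parallel(None, num_processes=1, num_gpus=4)
auto_data_parallel = resolve_model_parallel(None, num_processes=4, num_gpus=4)
forced_off = resolve_model_parallel(False, num_processes=1, num_gpus=4)
```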

loglikelihood

( requests: list ) → list[Tuple[float, bool]]

Parameters

requests (list[Tuple[str, dict]]) — description

Returns

list[Tuple[float, bool]]

description

Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
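The reference above leaves the returned Tuple[float, bool] undocumented. The sketch below shows one common reading, borrowed from evaluation-harness conventions and labelled here as an assumption: the float is the summed log probability of the continuation tokens, and the bool says whether each continuation token was also the most likely one. `continuation_loglikelihood` and the toy distributions are hypothetical.

```python
import math


def continuation_loglikelihood(
    step_distributions: list[dict[str, float]], continuation_tokens: list[str]
) -> tuple[float, bool]:
    """Sum log p(token) over the continuation and check whether it matches greedy decoding."""
    total = 0.0
    is_greedy = True
    for dist, token in zip(step_distributions, continuation_tokens):
        total += math.log(dist[token])
        if max(dist, key=dist.get) != token:
            is_greedy = False
    return total, is_greedy


# Hypothetical next-token distributions after "What is the capital of France?"
distributions = [{"Paris": 0.7, "London": 0.3}, {".": 0.9, "!": 0.1}]
paris_score = continuation_loglikelihood(distributions, ["Paris", "."])
london_score = continuation_loglikelihood(distributions, ["London", "."])
```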

loglikelihood_single_token

( requests: list ) → list[Tuple[float, bool]]

Parameters

requests (list[Tuple[str, dict]]) — description

Returns

list[Tuple[float, bool]]

description

Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.

pad_and_gather

( output_tensor: Tensor drop_last_samples: bool = True num_samples: int = None ) → torch.Tensor

Parameters

output_tensor (torch.Tensor) — The output tensor to be padded.
drop_last_samples (bool, optional) — Whether to drop the last samples during gathering. Last samples are dropped when the number of samples is not divisible by the number of processes. Defaults to True.

Returns

torch.Tensor

The padded output tensor and the gathered length tensor.

Pads the output_tensor to the maximum length and gathers the lengths across processes.
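The pad-then-gather idea can be shown without any distributed setup. In this sketch each list entry plays the role of one process's output; `pad_and_gather_sketch` and `pad_value` are hypothetical, and the real method operates on torch tensors gathered across processes rather than on Python lists.

```python
def pad_and_gather_sketch(per_process_outputs: list[list[int]], pad_value: int = 0):
    """Pad every sequence to the longest one and collect the original lengths."""
    lengths = [len(seq) for seq in per_process_outputs]
    max_length = max(lengths)
    padded = [seq + [pad_value] * (max_length - len(seq)) for seq in per_process_outputs]
    return padded, lengths


padded_outputs, original_lengths = pad_and_gather_sketch([[1, 2, 3], [4]])
```

Keeping the original lengths alongside the padded rows lets the caller strip the padding again after gathering.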

prepare_batch_logprob

( batch: list padding_length: int max_context: typing.Optional[int] = None single_token: bool = False )

Tokenize a batch of inputs and also return the lengths, truncations, and padding. This step is done manually because log probability inputs are tokenized together with their continuation, to manage possible extra spaces added at the start by tokenizers; see tok_encode_pair.
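A rough sketch of the bookkeeping described above, working on already-tokenized inputs. Everything here is an assumption made for illustration: the helper name `prepare_batch_sketch`, the `pad_id` argument, truncating from the left so the most recent tokens survive, and the dictionary layout of the result.

```python
from typing import Optional


def prepare_batch_sketch(
    token_sequences: list[list[int]],
    padding_length: int,
    max_context: Optional[int] = None,
    pad_id: int = 0,
) -> dict:
    """Truncate, pad, and record per-sequence lengths, truncations, and padding."""
    if max_context is None:
        max_context = padding_length
    rows, lengths, truncated, padded = [], [], [], []
    for tokens in token_sequences:
        n_truncated = max(len(tokens) - max_context, 0)
        tokens = tokens[-max_context:]  # assumption: keep the most recent tokens
        n_padded = max(padding_length - len(tokens), 0)
        rows.append(tokens + [pad_id] * n_padded)
        lengths.append(len(tokens))
        truncated.append(n_truncated)
        padded.append(n_padded)
    return {"input_ids": rows, "lengths": lengths, "truncated": truncated, "padded": padded}


batch = prepare_batch_sketch([[1, 2, 3, 4, 5], [6, 7]], padding_length=4, max_context=4)
```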

class lighteval.models.transformers.adapter_model.AdapterModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str tokenizer: str | None = None subfolder: str | None = None revision: str = 'main' batch_size: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None generation_size: typing.Annotated[int, Gt(gt=0)] = 256 max_length: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None add_special_tokens: bool = True model_parallel: bool | None = None dtype: str | None = None device: typing.Union[int, str] = 'cuda' trust_remote_code: bool = False use_chat_template: bool = False compile: bool = False multichoice_continuations_start_space: bool | None = None pairwise_tokenization: bool = False base_model: str adapter_weights: bool )

class lighteval.models.transformers.adapter_model.AdapterModel

( config: TransformersModelConfig )

class lighteval.models.transformers.delta_model.DeltaModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str tokenizer: str | None = None subfolder: str | None = None revision: str = 'main' batch_size: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None generation_size: typing.Annotated[int, Gt(gt=0)] = 256 max_length: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None add_special_tokens: bool = True model_parallel: bool | None = None dtype: str | None = None device: typing.Union[int, str] = 'cuda' trust_remote_code: bool = False use_chat_template: bool = False compile: bool = False multichoice_continuations_start_space: bool | None = None pairwise_tokenization: bool = False base_model: str delta_weights: bool )

class lighteval.models.transformers.delta_model.DeltaModel

( config: TransformersModelConfig )

class lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) endpoint_name: str | None = None model_name: str | None = None reuse_existing: bool = False accelerator: str = 'gpu' dtype: str | None = None vendor: str = 'aws' region: str = 'us-east-1' instance_size: str | None = None instance_type: str | None = None framework: str = 'pytorch' endpoint_type: str = 'protected' add_special_tokens: bool = True revision: str = 'main' namespace: str | None = None image_url: str | None = None env_vars: dict | None = None )

class lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str add_special_tokens: bool = True )

class lighteval.models.endpoints.endpoint_model.InferenceEndpointModel

( config: typing.Union[lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig, lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig] )

InferenceEndpointModels can be used either with the free inference client or with inference endpoints, which use text-generation-inference to deploy your model for the duration of the evaluation.

class lighteval.models.endpoints.tgi_model.TGIModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) inference_server_address: str | None inference_server_auth: str | None model_name: str | None )

class lighteval.models.endpoints.tgi_model.ModelClient

( config: TGIModelConfig )

class lighteval.models.custom.custom_model.CustomModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str model_definition_file_path: str )

Parameters

model_name (str) — An identifier for the model. This can be used to track which model was evaluated in the results and logs.
model_definition_file_path (str) — Path to a Python file containing the custom model implementation. This file must define exactly one class that inherits from LightevalModel. The class should implement all required methods from the LightevalModel interface.

Configuration class for loading custom model implementations in Lighteval.

This config allows users to define and load their own model implementations by specifying a Python file containing a custom model class that inherits from LightevalModel.

The custom model file should contain exactly one class that inherits from LightevalModel. This class will be automatically detected and instantiated when loading the model.

Example usage:

# Define config
from lighteval.models.custom.custom_model import CustomModelConfig

config = CustomModelConfig(
    model_name="my-custom-model",
    model_definition_file_path="path/to/my_model.py"
)

# Example custom model file (my_model.py):
from lighteval.models.abstract_model import LightevalModel

class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Custom initialization...

    def greedy_until(self, *args, **kwargs):
        # Custom generation logic...
        pass

An example of a custom model can be found in examples/custom_models/google_translate_model.py.

Notes:

The custom model class must inherit from LightevalModel and implement all required methods
Only one class inheriting from LightevalModel should be defined in the file
The model file is dynamically loaded at runtime, so ensure all dependencies are available
Exercise caution when loading custom model files as they can execute arbitrary code
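The "exactly one class, detected automatically" rule can be sketched with the standard library. This is an illustration of the general technique, not lighteval's loader: `BaseModel` stands in for LightevalModel so the example runs on its own, and `load_single_subclass` is a hypothetical helper.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


class BaseModel:
    """Stand-in for LightevalModel so the sketch runs without lighteval installed."""


def load_single_subclass(file_path: str, base: type) -> type:
    """Load a Python file and return the one class in it that inherits from base."""
    spec = importlib.util.spec_from_file_location("custom_model_module", file_path)
    module = importlib.util.module_from_spec(spec)
    # Sketch-only shortcut: expose the stand-in base class to the loaded file.
    # A real model file would import LightevalModel itself.
    module.BaseModel = base
    spec.loader.exec_module(module)  # executes the file, hence the caution above
    candidates = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"Expected exactly one subclass of {base.__name__}, found {len(candidates)}"
        )
    return candidates[0]


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "my_model.py"
    model_file.write_text("class MyCustomModel(BaseModel):\n    pass\n")
    model_class = load_single_subclass(str(model_file), BaseModel)
```

Because `exec_module` runs whatever the file contains, a file with zero or several matching classes is rejected only after its top-level code has already executed.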

class lighteval.models.endpoints.openai_model.OpenAIClient

( config: OpenAIModelConfig env_config )

greedy_until

( requests: list override_bs: typing.Optional[int] = None ) → list[GenerativeResponse]

Parameters

requests (list[Request]) — list of requests containing the context and ending conditions.
override_bs (int, optional) — Override the batch size for generation. Defaults to None.

Returns

list[GenerativeResponse]

list of generated responses.

Generates responses using a greedy decoding strategy until certain ending conditions are met.

class lighteval.models.vllm.vllm_model.VLLMModelConfig

( generation_parameters: GenerationParameters = GenerationParameters(early_stopping=None, repetition_penalty=None, frequency_penalty=None, length_penalty=None, presence_penalty=None, max_new_tokens=None, min_new_tokens=None, seed=None, stop_tokens=None, temperature=None, top_k=None, min_p=None, top_p=None, truncate_prompt=None, response_format=None) model_name: str revision: str = 'main' dtype: str = 'bfloat16' tensor_parallel_size: typing.Annotated[int, Gt(gt=0)] = 1 data_parallel_size: typing.Annotated[int, Gt(gt=0)] = 1 pipeline_parallel_size: typing.Annotated[int, Gt(gt=0)] = 1 gpu_memory_utilization: typing.Annotated[float, Ge(ge=0)] = 0.9 max_model_length: typing.Optional[typing.Annotated[int, Gt(gt=0)]] = None swap_space: typing.Annotated[int, Gt(gt=0)] = 4 seed: typing.Annotated[int, Ge(ge=0)] = 1234 trust_remote_code: bool = False use_chat_template: bool = False add_special_tokens: bool = True multichoice_continuations_start_space: bool = True pairwise_tokenization: bool = False max_num_seqs: typing.Annotated[int, Gt(gt=0)] = 128 max_num_batched_tokens: typing.Annotated[int, Gt(gt=0)] = 2048 subfolder: str | None = None )

class lighteval.models.vllm.vllm_model.VLLMModel

( config: VLLMModelConfig )

greedy_until

( requests: list override_bs: typing.Optional[int] = None ) → list[GenerateReturn]

Parameters

requests (list[Request]) — list of requests containing the context and ending conditions.
override_bs (int, optional) — Override the batch size for generation. Defaults to None.

Returns

list[GenerateReturn]

list of generated responses.

Generates responses using a greedy decoding strategy until certain ending conditions are met.
