Evaluate documentation

Main classes

You are viewing version v0.1.2. A newer version, v0.4.0, is available.

class evaluate.EvaluationModuleInfo

( description: str, citation: str, features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]], inputs_description: str = <factory>, homepage: str = <factory>, license: str = <factory>, codebase_urls: typing.List[str] = <factory>, reference_urls: typing.List[str] = <factory>, streamable: bool = False, format: typing.Optional[str] = None, module_type: str = 'metric', metric_name: typing.Optional[str] = None, config_name: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None )

Information about an evaluation module.

EvaluationModuleInfo documents an evaluation module (a metric by default, per module_type), including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known at construction time; some may be updated later.


from_directory

( metric_info_dir )

Create EvaluationModuleInfo from the JSON file in metric_info_dir.


write_to_directory

( metric_info_dir )

Write EvaluationModuleInfo as JSON to metric_info_dir. The license is also saved separately in a LICENSE file.
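The two directory methods above form a round trip: one serializes the info record to JSON in a directory (with the license stored on its own), the other rebuilds the record from that JSON. The following is a minimal sketch of that pattern using only the standard library. ToyModuleInfo, its method names, and both file names are placeholders invented for illustration; this is not the library's implementation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List

# Hypothetical file names chosen for this sketch only.
INFO_FILENAME = "toy_module_info.json"
LICENSE_FILENAME = "LICENSE"


@dataclass
class ToyModuleInfo:
    """Toy stand-in for a module-info record; not the library class."""

    description: str
    citation: str
    license: str = ""
    reference_urls: List[str] = field(default_factory=list)

    def write_to_directory(self, info_dir: str) -> None:
        # Serialize every field as JSON, then store the license text separately.
        directory = Path(info_dir)
        (directory / INFO_FILENAME).write_text(json.dumps(asdict(self)), encoding="utf-8")
        (directory / LICENSE_FILENAME).write_text(self.license, encoding="utf-8")

    @classmethod
    def from_directory(cls, info_dir: str) -> "ToyModuleInfo":
        # Rebuild the record from the JSON file written by write_to_directory.
        data = json.loads((Path(info_dir) / INFO_FILENAME).read_text(encoding="utf-8"))
        return cls(**data)


with tempfile.TemporaryDirectory() as tmp_dir:
    original = ToyModuleInfo(description="Toy accuracy", citation="n/a", license="MIT")
    original.write_to_directory(tmp_dir)
    restored = ToyModuleInfo.from_directory(tmp_dir)
    license_text = (Path(tmp_dir) / LICENSE_FILENAME).read_text(encoding="utf-8")
```

Because the record is a plain dataclass, comparing `restored` with `original` checks the round trip field by field.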


The base class EvaluationModule implements an evaluation module backed by one or several Dataset objects.

class evaluate.EvaluationModule

( config_name: typing.Optional[str] = None, keep_in_memory: bool = False, cache_dir: typing.Optional[str] = None, num_process: int = 1, process_id: int = 0, seed: typing.Optional[int] = None, experiment_id: typing.Optional[str] = None, max_concurrent_cache_files: int = 10000, timeout: typing.Union[int, float] = 100, **kwargs )


  • config_name (str) — Used to define a hash specific to a module computation script; it prevents the module’s data from being overridden when the module loading script is modified.
  • keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
  • cache_dir (str) — Path to a directory in which temporary prediction/reference data will be stored. In distributed setups, the data directory should be located on a shared file system.
  • num_process (int) — Total number of nodes in a distributed setting. Useful for computing modules in distributed setups (in particular non-additive modules like F1).
  • process_id (int) — ID of the current process in a distributed setup (between 0 and num_process-1). Useful for computing modules in distributed setups (in particular non-additive modules like F1).
  • seed (int, optional) — If specified, this will temporarily set numpy’s random seed when evaluate.EvaluationModule.compute() is run.
  • experiment_id (str) — A specific experiment ID, used when several distributed evaluations share the same file system. Useful for computing modules in distributed setups (in particular non-additive modules like F1).
  • max_concurrent_cache_files (int) — Maximum number of concurrent module cache files (default 10000).
  • timeout (Union[int, float]) — Timeout in seconds for synchronization in distributed settings.
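Several parameters above mention non-additive modules like F1. The sketch below illustrates, with made-up data and a hand-written binary F1, why such a score cannot be obtained by averaging per-process results: the predictions and references from all processes have to be pooled before scoring. The shard contents and the f1_score helper are assumptions for illustration only and do not show how the library gathers data.

```python
from typing import Sequence


def f1_score(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Binary F1 for label 1, computed from raw counts."""
    tp = sum(1 for p, r in zip(predictions, references) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predictions, references) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predictions, references) if p == 0 and r == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Two hypothetical processes, each holding a shard of (predictions, references).
shards = [
    ([1, 1, 1, 1], [1, 1, 1, 0]),  # process 0
    ([1, 0, 0, 0], [0, 1, 1, 0]),  # process 1
]

# Naive approach: score each shard, then average the scores.
per_shard = [f1_score(p, r) for p, r in shards]
averaged_f1 = sum(per_shard) / len(per_shard)

# Correct approach: pool all examples first, then score once.
all_predictions = [x for p, _ in shards for x in p]
all_references = [x for _, r in shards for x in r]
pooled_f1 = f1_score(all_predictions, all_references)

print(f"averaged per-shard F1: {averaged_f1:.4f}")
print(f"pooled F1:             {pooled_f1:.4f}")
```

The two numbers differ, which is why a shared cache directory and per-process IDs matter when every process contributes part of the data.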

An EvaluationModule is the base class and common API for metrics, comparisons, and measurements.


add

( prediction = None, reference = None, **kwargs )


  • prediction (list/array/tensor, optional) — Predictions.
  • reference (list/array/tensor, optional) — References.

Add one prediction and reference to the evaluation module’s stack.


add_batch

( predictions = None, references = None, **kwargs )


  • predictions (list/array/tensor, optional) — Predictions.
  • references (list/array/tensor, optional) — References.

Add a batch of predictions and references to the evaluation module’s stack.
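The single-example and batch methods above both feed the same internal stack, which is scored later in one pass. A minimal sketch of that accumulate-then-compute pattern follows. ToyAccuracy is a hypothetical stand-in written for illustration; in particular, clearing the stack after scoring is an assumption of this sketch, not a statement about the library's internals.

```python
from typing import Any, Dict, List


class ToyAccuracy:
    """Toy stand-in that mimics the accumulate-then-compute pattern."""

    def __init__(self) -> None:
        self._predictions: List[Any] = []
        self._references: List[Any] = []

    def add(self, *, prediction: Any, reference: Any) -> None:
        # Push a single example onto the internal stack.
        self._predictions.append(prediction)
        self._references.append(reference)

    def add_batch(self, *, predictions: List[Any], references: List[Any]) -> None:
        # Push a whole batch; both lists must line up one-to-one.
        if len(predictions) != len(references):
            raise ValueError("predictions and references differ in length")
        self._predictions.extend(predictions)
        self._references.extend(references)

    def compute(self) -> Dict[str, float]:
        # Score everything accumulated so far, then clear the stack.
        matches = sum(p == r for p, r in zip(self._predictions, self._references))
        total = len(self._predictions)
        self._predictions, self._references = [], []
        return {"accuracy": matches / total if total else 0.0}


module = ToyAccuracy()
module.add(prediction=1, reference=1)
module.add_batch(predictions=[0, 1, 1], references=[0, 0, 1])
result = module.compute()
print(result)
```

Mixing single additions and batches is fine in this sketch because both paths append to the same pair of lists.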


compute

( predictions = None, references = None, **kwargs )


  • predictions (list/array/tensor, optional) — Predictions.
  • references (list/array/tensor, optional) — References.
  • **kwargs (optional) — Keyword arguments that will be forwarded to the evaluation module _compute method (see details in the docstring).

Compute the evaluation module.

Positional arguments are not allowed, to prevent mistakes; pass predictions and references as keyword arguments.
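The calling convention described above (keyword arguments only, with extra keyword arguments forwarded to the _compute method) can be sketched in plain Python with a bare `*` in the signature. ToyModule and its normalize option are hypothetical names used only to show the mechanics; they are not taken from the library.

```python
from typing import Any, Dict, List, Optional


class ToyModule:
    """Toy stand-in; only the calling convention mirrors the text above."""

    def compute(
        self,
        *,  # everything after the bare star must be passed by keyword
        predictions: Optional[List[Any]] = None,
        references: Optional[List[Any]] = None,
        **kwargs: Any,
    ) -> Dict[str, float]:
        # Extra keyword arguments are forwarded unchanged to _compute.
        return self._compute(
            predictions=predictions or [], references=references or [], **kwargs
        )

    def _compute(
        self, predictions: List[Any], references: List[Any], normalize: bool = True
    ) -> Dict[str, float]:
        # `normalize` is a hypothetical option used only to show forwarding.
        matches = sum(p == r for p, r in zip(predictions, references))
        score = matches / len(predictions) if normalize and predictions else float(matches)
        return {"accuracy": score}


module = ToyModule()
fraction = module.compute(predictions=[1, 0, 1], references=[1, 1, 1])
count = module.compute(predictions=[1, 0, 1], references=[1, 1, 1], normalize=False)

try:
    module.compute([1, 0, 1], [1, 1, 1])  # positional call
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Rejecting positional calls removes the risk of silently swapping predictions and references, which would otherwise go unnoticed for symmetric-looking inputs.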


download_and_prepare

( download_config: typing.Optional[evaluate.utils.file_utils.DownloadConfig] = None, dl_manager: typing.Optional[datasets.download.download_manager.DownloadManager] = None )


  • download_config (DownloadConfig, optional) — Specific download configuration parameters.
  • dl_manager (DownloadManager, optional) — Specific download manager to use.

Downloads and prepares the dataset for reading.