Main classes

EvaluationModuleInfo

class evaluate.EvaluationModuleInfo

( description: str, citation: str, features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]], inputs_description: str = <factory>, homepage: str = <factory>, license: str = <factory>, codebase_urls: typing.List[str] = <factory>, reference_urls: typing.List[str] = <factory>, streamable: bool = False, format: typing.Optional[str] = None, module_type: str = 'metric', metric_name: typing.Optional[str] = None, config_name: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None )

Information about a metric.

EvaluationModuleInfo documents a metric, including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known at construction time; some may be updated later.

from_directory


( metric_info_dir )

Create EvaluationModuleInfo from the JSON file in metric_info_dir.

write_to_directory


( metric_info_dir )

Write EvaluationModuleInfo as JSON to metric_info_dir. Also save the license separately in a LICENSE file.

EvaluationModule

The base class EvaluationModule implements an evaluation module backed by one or several Dataset objects.

class evaluate.EvaluationModule


( config_name: typing.Optional[str] = None, keep_in_memory: bool = False, cache_dir: typing.Optional[str] = None, num_process: int = 1, process_id: int = 0, seed: typing.Optional[int] = None, experiment_id: typing.Optional[str] = None, max_concurrent_cache_files: int = 10000, timeout: typing.Union[int, float] = 100, **kwargs )

Parameters

config_name (str) — Used to define a hash specific to a module computation script. It prevents the module’s data from being overridden when the module loading script is modified.
keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
cache_dir (str) — Path to a directory in which temporary prediction/reference data will be stored. In distributed setups, the data directory should be located on a shared file system.
num_process (int) — Total number of nodes in a distributed setting. Useful for computing modules in distributed setups (in particular non-additive modules like F1).
process_id (int) — ID of the current process in a distributed setup (between 0 and num_process-1). Useful for computing modules in distributed setups (in particular non-additive modules like F1).
seed (int, optional) — If specified, temporarily sets numpy’s random seed while evaluate.EvaluationModule.compute() runs.
experiment_id (str) — A specific experiment ID. Used when several distributed evaluations share the same file system. Useful for computing modules in distributed setups (in particular non-additive modules like F1).
max_concurrent_cache_files (int) — Maximum number of concurrent module cache files (default 10000).
timeout (Union[int, float]) — Timeout in seconds for synchronization in distributed settings.

An EvaluationModule is the base class and common API for metrics, comparisons, and measurements.

add


( prediction = None, reference = None, **kwargs )

Parameters

prediction (list/array/tensor, optional) — Predictions.
reference (list/array/tensor, optional) — References.

Add one prediction and its reference to the evaluation module’s stack.

add_batch


( predictions = None, references = None, **kwargs )

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.

Add a batch of predictions and references to the evaluation module’s stack.

compute


( predictions = None, references = None, **kwargs )

Parameters

predictions (list/array/tensor, optional) — Predictions.
references (list/array/tensor, optional) — References.
**kwargs (optional) — Keyword arguments that will be forwarded to the evaluation module’s _compute method (see details in the docstring).

Compute the evaluation module.

Positional arguments are not allowed, to prevent mistakes such as swapping predictions and references.

download_and_prepare


( download_config: typing.Optional[evaluate.utils.file_utils.DownloadConfig] = None, dl_manager: typing.Optional[datasets.utils.download_manager.DownloadManager] = None )

Parameters

download_config (DownloadConfig, optional) — Specific download configuration parameters.
dl_manager (DownloadManager, optional) — Specific download manager to use.

Downloads and prepares dataset for reading.
