Loading methods

Methods for listing and loading evaluation modules:

List

evaluate.list_evaluation_modules

( module_type = None, include_community = True, with_details = False )

Parameters

  • module_type (str, optional, defaults to None) — Type of evaluation modules to list. Has to be one of 'metric', 'comparison', or 'measurement'. If None, all types are listed.
  • include_community (bool, optional, defaults to True) — Include community modules in the list.
  • with_details (bool, optional, defaults to False) — Return the full details on the metrics instead of only the ID.

List all evaluation modules available on the Hugging Face Hub.

Example:

>>> from evaluate import list_evaluation_modules
>>> list_evaluation_modules(module_type="metric")
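A minimal sketch of the other listing options documented above (the structure of the detailed entries may vary between library versions):

>>> from evaluate import list_evaluation_modules
>>> # List only canonical metrics, excluding community-contributed modules
>>> list_evaluation_modules(module_type="metric", include_community=False)
>>> # Return full metadata for each module instead of only its ID
>>> list_evaluation_modules(module_type="metric", with_details=True)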

Load

evaluate.load

( path: str, config_name: typing.Optional[str] = None, module_type: typing.Optional[str] = None, process_id: int = 0, num_process: int = 1, cache_dir: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None, keep_in_memory: bool = False, download_config: typing.Optional[datasets.download.download_config.DownloadConfig] = None, download_mode: typing.Optional[datasets.download.download_manager.DownloadMode] = None, revision: typing.Union[str, datasets.utils.version.Version, NoneType] = None, **init_kwargs )

Parameters

  • path (str) — Path to the evaluation processing script with the evaluation builder. Can be either:
    • a local path to a processing script or to the directory containing the script (if the script has the same name as the directory), e.g. './metrics/rouge' or './metrics/rouge/rouge.py'
    • an evaluation module identifier on the Hugging Face evaluate repo, e.g. 'rouge' or 'bleu', located in 'metrics/', 'comparisons/', or 'measurements/' depending on the provided module_type
  • config_name (str, optional) — Selects a configuration for the metric (e.g. the GLUE metric has a configuration for each subset).
  • module_type (str, defaults to 'metric') — Type of evaluation module, can be one of 'metric', 'comparison', or 'measurement'.
  • process_id (int, optional) — For distributed evaluation: id of the process.
  • num_process (int, optional) — For distributed evaluation: total number of processes.
  • cache_dir (str, optional) — Path to store the temporary predictions and references (defaults to ~/.cache/huggingface/evaluate/).
  • experiment_id (str) — A specific experiment id. This is used if several distributed evaluations share the same file system. This is useful to compute metrics in distributed setups (in particular non-additive metrics like F1).
  • keep_in_memory (bool) — Whether to store the temporary results in memory (defaults to False).
  • download_config (DownloadConfig, optional) — Specific download configuration parameters.
  • download_mode (DownloadMode, defaults to REUSE_DATASET_IF_EXISTS) — Download/generate mode.
  • revision (Union[str, evaluate.Version], optional) — If specified, the module will be loaded from the datasets repository at this version. By default it is set to the local version of the lib. Specifying a version that is different from your local version of the lib might cause compatibility issues.

Load an EvaluationModule.

Example:

>>> from evaluate import load
>>> accuracy = load("accuracy")
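A slightly fuller sketch, assuming the standard 'glue' and 'accuracy' modules on the Hub (values such as the experiment id are illustrative):

>>> import evaluate
>>> # Select the MRPC subset of the GLUE metric via config_name
>>> glue_metric = evaluate.load("glue", "mrpc")
>>> # Compute a score from predictions and references
>>> accuracy = evaluate.load("accuracy")
>>> accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0])
>>> # In a distributed setup, give each process its id and share an experiment_id
>>> accuracy = evaluate.load("accuracy", num_process=2, process_id=0, experiment_id="my_experiment")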