Main classes
EvaluationModuleInfo
class evaluate.EvaluationModuleInfo
( description: str, citation: str, features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]], inputs_description: str = <factory>, homepage: str = <factory>, license: str = <factory>, codebase_urls: typing.List[str] = <factory>, reference_urls: typing.List[str] = <factory>, streamable: bool = False, format: typing.Optional[str] = None, module_type: str = 'metric', metric_name: typing.Optional[str] = None, config_name: typing.Optional[str] = None, experiment_id: typing.Optional[str] = None )
Information about a metric. EvaluationModuleInfo documents a metric, including its name, version, and features. See the constructor arguments and properties for a full list.
Note: not all fields are known at construction time and may be updated later.
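In practice, the info of a loaded module can be inspected directly; a minimal sketch using the "accuracy" module as an arbitrary example:

```python
import evaluate

# Every loaded module exposes its EvaluationModuleInfo as `.info`.
accuracy = evaluate.load("accuracy")
info = accuracy.info

print(info.description)  # human-readable summary of the metric
print(info.citation)     # BibTeX citation string
print(info.features)     # expected schema for predictions/references
```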
from_directory
( metric_info_dir )
Create EvaluationModuleInfo from the JSON file in metric_info_dir.
write_to_directory
( metric_info_dir )
Write EvaluationModuleInfo as JSON to metric_info_dir. Also save the license separately in LICENCE.
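A minimal round-trip sketch of these two methods, assuming a writable directory (the path here is arbitrary):

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Serialize the module's info as JSON (plus the license file).
accuracy.info.write_to_directory("/tmp/accuracy_info")

# Reconstruct the info object from the saved JSON.
restored = evaluate.EvaluationModuleInfo.from_directory("/tmp/accuracy_info")
print(restored.module_type)  # 'metric'
```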
EvaluationModule
The base class EvaluationModule implements a module backed by one or several Dataset objects.
class evaluate.EvaluationModule
( config_name: typing.Optional[str] = None, keep_in_memory: bool = False, cache_dir: typing.Optional[str] = None, num_process: int = 1, process_id: int = 0, seed: typing.Optional[int] = None, experiment_id: typing.Optional[str] = None, max_concurrent_cache_files: int = 10000, timeout: typing.Union[int, float] = 100, **kwargs )
Parameters
- config_name (str) — Used to define a hash specific to a module computation script and prevent the module's data from being overridden when the module loading script is modified.
- keep_in_memory (bool) — Keep all predictions and references in memory. Not possible in distributed settings.
- cache_dir (str) — Path to a directory in which temporary prediction/reference data will be stored. The data directory should be located on a shared file system in distributed setups.
- num_process (int) — Total number of nodes in a distributed setting. This is useful to compute the module in distributed setups (in particular non-additive modules like F1).
- process_id (int) — ID of the current process in a distributed setup (between 0 and num_process - 1). This is useful to compute the module in distributed setups (in particular non-additive metrics like F1).
- seed (int, optional) — If specified, this temporarily sets numpy's random seed when evaluate.EvaluationModule.compute() is run.
- experiment_id (str) — A specific experiment ID, used if several distributed evaluations share the same file system. This is useful to compute the module in distributed setups (in particular non-additive metrics like F1).
- max_concurrent_cache_files (int) — Maximum number of concurrent module cache files (default 10000).
- timeout (Union[int, float]) — Timeout in seconds for distributed-setting synchronization.
An EvaluationModule is the base class and common API for metrics, comparisons, and measurements.
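For instance, in a two-process distributed evaluation each process loads the same module with its own rank; a sketch (the "f1" module and the experiment ID are arbitrary choices):

```python
import evaluate

# Rank 0 of a 2-process job; the other process would pass process_id=1.
# In a real setup, the cache lives on a file system shared by both nodes.
metric = evaluate.load(
    "f1",
    num_process=2,
    process_id=0,
    experiment_id="distributed_run_1",
)
```

Only the process with process_id 0 returns the aggregated result from compute(); the other processes receive None.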
add
( prediction = None, reference = None, **kwargs )
Add one prediction and reference to the evaluation module's stack.
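A minimal sketch of incremental evaluation with add(), using the "accuracy" module and toy labels:

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Feed one (prediction, reference) pair at a time, e.g. inside an eval loop.
for pred, ref in zip([0, 1, 1], [0, 1, 0]):
    accuracy.add(prediction=pred, reference=ref)

print(accuracy.compute())  # {'accuracy': 0.666...}
```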
add_batch
( predictions = None, references = None, **kwargs )
Add a batch of predictions and references to the evaluation module's stack.
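The batched variant, sketched with toy batches where per-batch model outputs (e.g. from a DataLoader) would normally go:

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Accumulate per-batch outputs, then aggregate once at the end.
for preds, refs in [([0, 1], [0, 1]), ([1, 0], [1, 1])]:
    accuracy.add_batch(predictions=preds, references=refs)

print(accuracy.compute())  # {'accuracy': 0.75}
```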
compute
( predictions = None, references = None, **kwargs )
Compute the evaluation module. Positional arguments are not allowed, to prevent mistakes.
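When all predictions are available up front, compute() can be called in one shot; note that the arguments must be passed by keyword:

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Keyword arguments are required; a positional call raises an error.
results = accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0])
print(results)  # {'accuracy': 0.666...}
```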
download_and_prepare
( download_config: typing.Optional[evaluate.utils.file_utils.DownloadConfig] = None, dl_manager: typing.Optional[datasets.utils.download_manager.DownloadManager] = None )
Downloads and prepares the evaluation module for reading.
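evaluate.load() normally calls this for you, so a manual call is rarely needed; a sketch, assuming DownloadConfig is importable from evaluate.utils.file_utils as the signature above suggests:

```python
import evaluate
from evaluate.utils.file_utils import DownloadConfig  # path per signature above

accuracy = evaluate.load("accuracy")

# Re-run preparation, forcing a fresh download of any external resources.
accuracy.download_and_prepare(download_config=DownloadConfig(force_download=True))
```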