"""**Evaluation** chains for grading LLM and Chain outputs. | |
This module contains off-the-shelf evaluation chains for grading the output of | |
LangChain primitives such as language models and chains. | |
**Loading an evaluator**

To load an evaluator, you can use the :func:`load_evaluators <langchain.evaluation.loading.load_evaluators>` or
:func:`load_evaluator <langchain.evaluation.loading.load_evaluator>` functions with the
names of the evaluators to load.

.. code-block:: python

    from langchain.evaluation import load_evaluator

    evaluator = load_evaluator("qa")
    evaluator.evaluate_strings(
        prediction="We sold more than 40,000 units last week",
        input="How many units did we sell last week?",
        reference="We sold 32,378 units",
    )

The evaluator must be one of :class:`EvaluatorType <langchain.evaluation.schema.EvaluatorType>`.
**Datasets**

To load one of the LangChain HuggingFace datasets, you can use the :func:`load_dataset <langchain.evaluation.loading.load_dataset>` function with the
name of the dataset to load.

.. code-block:: python

    from langchain.evaluation import load_dataset

    ds = load_dataset("llm-math")
**Some common use cases for evaluation include:**

- Grading the accuracy of a response against ground truth answers: :class:`QAEvalChain <langchain.evaluation.qa.eval_chain.QAEvalChain>`
- Comparing the output of two models: :class:`PairwiseStringEvalChain <langchain.evaluation.comparison.eval_chain.PairwiseStringEvalChain>`, or :class:`LabeledPairwiseStringEvalChain <langchain.evaluation.comparison.eval_chain.LabeledPairwiseStringEvalChain>` when there is additionally a reference label.
- Judging the efficacy of an agent's tool usage: :class:`TrajectoryEvalChain <langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain>`
- Checking whether an output complies with a set of criteria: :class:`CriteriaEvalChain <langchain.evaluation.criteria.eval_chain.CriteriaEvalChain>`, or :class:`LabeledCriteriaEvalChain <langchain.evaluation.criteria.eval_chain.LabeledCriteriaEvalChain>` when there is additionally a reference label.
- Computing the semantic difference between a prediction and a reference: :class:`EmbeddingDistanceEvalChain <langchain.evaluation.embedding_distance.base.EmbeddingDistanceEvalChain>`, or between two predictions: :class:`PairwiseEmbeddingDistanceEvalChain <langchain.evaluation.embedding_distance.base.PairwiseEmbeddingDistanceEvalChain>`
- Measuring the string distance between a prediction and a reference: :class:`StringDistanceEvalChain <langchain.evaluation.string_distance.base.StringDistanceEvalChain>`, or between two predictions: :class:`PairwiseStringDistanceEvalChain <langchain.evaluation.string_distance.base.PairwiseStringDistanceEvalChain>`
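The string-distance evaluators in the list above reduce to normalized edit-distance scoring. As a rough illustration of that idea (a pure-Python sketch, not part of the langchain API; `string_distance_score` is a hypothetical helper name):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def string_distance_score(prediction: str, reference: str) -> float:
    # Normalize to [0, 1]; 0.0 means the strings are identical.
    denom = max(len(prediction), len(reference)) or 1
    return levenshtein(prediction, reference) / denom
```

The langchain chains additionally support other distance metrics (see :class:`StringDistance <langchain.evaluation.string_distance.StringDistance>`), but the scoring shape is the same: lower is closer.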
**Low-level API**

These evaluators implement one of the following interfaces:

- :class:`StringEvaluator <langchain.evaluation.schema.StringEvaluator>`: Evaluate a prediction string against a reference label and/or input context.
- :class:`PairwiseStringEvaluator <langchain.evaluation.schema.PairwiseStringEvaluator>`: Evaluate two prediction strings against each other. Useful for scoring preferences, measuring the similarity between the outputs of two chains or LLMs, or comparing outputs on similar inputs.
- :class:`AgentTrajectoryEvaluator <langchain.evaluation.schema.AgentTrajectoryEvaluator>`: Evaluate the full sequence of actions taken by an agent.

These interfaces enable easier composability and usage within a higher-level evaluation framework.
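To make the `StringEvaluator` calling convention concrete, here is a minimal toy class following the same shape (a stand-alone sketch that does not import langchain; `ExactMatchEvaluator` is an illustrative name, not part of this module): `evaluate_strings` takes keyword arguments `prediction`, `reference`, and `input` and returns a dict with a `score` key.

```python
class ExactMatchEvaluator:
    """Toy evaluator mirroring the StringEvaluator calling convention."""

    def evaluate_strings(self, *, prediction, reference=None, input=None):
        # Score 1 when the prediction exactly matches the reference,
        # ignoring surrounding whitespace; 0 otherwise.
        score = int(prediction.strip() == (reference or "").strip())
        return {"score": score}
```

The real interface in :mod:`langchain.evaluation.schema` layers validation and async variants on top of this contract.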
""" # noqa: E501 | |
from langchain.evaluation.agents import TrajectoryEvalChain
from langchain.evaluation.comparison import (
    LabeledPairwiseStringEvalChain,
    PairwiseStringEvalChain,
)
from langchain.evaluation.criteria import (
    Criteria,
    CriteriaEvalChain,
    LabeledCriteriaEvalChain,
)
from langchain.evaluation.embedding_distance import (
    EmbeddingDistance,
    EmbeddingDistanceEvalChain,
    PairwiseEmbeddingDistanceEvalChain,
)
from langchain.evaluation.exact_match.base import ExactMatchStringEvaluator
from langchain.evaluation.loading import load_dataset, load_evaluator, load_evaluators
from langchain.evaluation.parsing.base import (
    JsonEqualityEvaluator,
    JsonValidityEvaluator,
)
from langchain.evaluation.parsing.json_distance import JsonEditDistanceEvaluator
from langchain.evaluation.parsing.json_schema import JsonSchemaEvaluator
from langchain.evaluation.qa import ContextQAEvalChain, CotQAEvalChain, QAEvalChain
from langchain.evaluation.regex_match.base import RegexMatchStringEvaluator
from langchain.evaluation.schema import (
    AgentTrajectoryEvaluator,
    EvaluatorType,
    PairwiseStringEvaluator,
    StringEvaluator,
)
from langchain.evaluation.scoring import (
    LabeledScoreStringEvalChain,
    ScoreStringEvalChain,
)
from langchain.evaluation.string_distance import (
    PairwiseStringDistanceEvalChain,
    StringDistance,
    StringDistanceEvalChain,
)
__all__ = [
    "EvaluatorType",
    "ExactMatchStringEvaluator",
    "RegexMatchStringEvaluator",
    "PairwiseStringEvalChain",
    "LabeledPairwiseStringEvalChain",
    "QAEvalChain",
    "CotQAEvalChain",
    "ContextQAEvalChain",
    "StringEvaluator",
    "PairwiseStringEvaluator",
    "TrajectoryEvalChain",
    "CriteriaEvalChain",
    "Criteria",
    "EmbeddingDistance",
    "EmbeddingDistanceEvalChain",
    "PairwiseEmbeddingDistanceEvalChain",
    "StringDistance",
    "StringDistanceEvalChain",
    "PairwiseStringDistanceEvalChain",
    "LabeledCriteriaEvalChain",
    "load_evaluators",
    "load_evaluator",
    "load_dataset",
    "AgentTrajectoryEvaluator",
    "ScoreStringEvalChain",
    "LabeledScoreStringEvalChain",
    "JsonValidityEvaluator",
    "JsonEqualityEvaluator",
    "JsonEditDistanceEvaluator",
    "JsonSchemaEvaluator",
]