metadata
title: Auto Evaluator
emoji: ':brain'
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.19.0
app_file: app.py
pinned: false
license: mit

Auto-evaluator :brain: :memo:

This is a lightweight evaluation tool for question-answering that uses LangChain to:

  • Ask the user to input a set of documents of interest

  • Apply an LLM (GPT-3.5-turbo) to auto-generate question-answer pairs from these docs

  • Generate a question-answering chain with a specified set of UI-chosen configurations

  • Use the chain to generate a response to each question

  • Use an LLM (GPT-3.5-turbo) to score the response relative to the answer

  • Explore scoring across various chain configurations
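The steps above can be sketched as a small evaluation loop. This is a minimal illustration, not the app's actual code: `QAPair`, `evaluate_chain`, `accuracy`, `toy_chain`, and `toy_grader` are hypothetical names, and the toy functions stand in for the LLM-backed question-answering chain and the LLM grader so the example runs without any API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAPair:
    question: str
    answer: str  # reference answer that each response is graded against


def evaluate_chain(
    qa_pairs: List[QAPair],
    answer_fn: Callable[[str], str],
    grade_fn: Callable[[str, str, str], bool],
) -> List[Dict[str, object]]:
    """Run every question through the chain and grade each response."""
    results = []
    for pair in qa_pairs:
        response = answer_fn(pair.question)
        correct = grade_fn(pair.question, pair.answer, response)
        results.append(
            {
                "question": pair.question,
                "answer": pair.answer,
                "response": response,
                "correct": correct,
            }
        )
    return results


def accuracy(results: List[Dict[str, object]]) -> float:
    """Fraction of responses the grader marked as correct."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["correct"]) / len(results)


# Stand-in for a question-answering chain (hypothetical; the app uses an LLM).
def toy_chain(question: str) -> str:
    lookup = {"What is 2 + 2?": "4"}
    return lookup.get(question, "I don't know")


# Stand-in for the LLM grader: exact match after trimming and lowercasing.
def toy_grader(question: str, answer: str, response: str) -> bool:
    return answer.strip().lower() == response.strip().lower()


eval_set = [
    QAPair("What is 2 + 2?", "4"),
    QAPair("What is the capital of France?", "Paris"),
]
results = evaluate_chain(eval_set, toy_chain, toy_grader)
print(accuracy(results))  # 0.5
```

Swapping in a different `answer_fn` for each chain configuration and comparing the resulting accuracy values corresponds to the last step in the list.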

Run as Streamlit app

pip install -r requirements.txt

streamlit run app.py

Inputs

num_eval_questions - Number of questions to auto-generate (if the user does not supply an eval set)

split_method - Method for text splitting

chunk_chars - Chunk size for text splitting

overlap - Chunk overlap for text splitting

embeddings - Embedding method for chunks

retriever_type - Chunk retrieval method

num_neighbors - Neighbors for retrieval

model - LLM for summarization of retrieved chunks

grade_prompt - Prompt choice for model self-grading
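To make the inputs concrete, here is a rough sketch that groups them into one configuration object and shows how `chunk_chars` and `overlap` interact during splitting. The field names come from the list above; every default value, the `EvalConfig` class, and the `split_text` helper are assumptions made for illustration and may not match the app's real defaults or its splitter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalConfig:
    # Field names mirror the inputs listed above. All default values are
    # placeholders for illustration, not the app's actual defaults.
    num_eval_questions: int = 5
    split_method: str = "character"
    chunk_chars: int = 1000
    overlap: int = 100
    embeddings: str = "openai"
    retriever_type: str = "similarity-search"
    num_neighbors: int = 3
    model: str = "gpt-3.5-turbo"
    grade_prompt: str = "default"


def split_text(text: str, chunk_chars: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must be in [0, chunk_chars)")
    step = chunk_chars - overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


config = EvalConfig(chunk_chars=4, overlap=2)
chunks = split_text("abcdefghij", config.chunk_chars, config.overlap)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger `overlap` repeats more text between neighboring chunks, which can help retrieval when an answer spans a chunk boundary, at the cost of more chunks to embed.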

Blog

https://blog.langchain.dev/auto-eval-of-question-answering-tasks/

UI

image

Hosted app

See: https://github.com/langchain-ai/auto-evaluator

And: https://autoevaluator.langchain.com/

Disclaimer

To use all of the default dashboard model settings, you will need an OpenAI API key with access to `GPT-4` and an Anthropic API key. However, additional models (e.g., from Hugging Face) can be easily added to the app.
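A quick way to check which keys are available before launching is sketched below. It assumes the keys are supplied through the environment variables `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`, which are the conventional names read by the OpenAI and Anthropic Python SDKs; the app itself may collect keys differently (for example, through the UI). `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names.

```python
import os
from typing import List, Mapping

# Assumption: conventional environment variable names for each provider.
REQUIRED_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key is unset or empty in `env`."""
    return [provider for provider, var in REQUIRED_KEYS.items() if not env.get(var)]


missing = missing_api_keys()
if missing:
    print("Missing API keys for: " + ", ".join(missing))
else:
    print("All API keys found.")
```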