---
title: Auto Evaluator
emoji: 🧠
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.19.0
app_file: app.py
pinned: false
license: mit
---
# Auto-evaluator :brain: :memo:
This is a lightweight evaluation tool for question-answering using LangChain to:

- Ask the user to input a set of documents of interest
- Apply an LLM (`GPT-3.5-turbo`) to auto-generate question-answer pairs from these docs
- Generate a question-answering chain with a specified set of UI-chosen configurations
- Use the chain to generate a response to each question
- Use an LLM (`GPT-3.5-turbo`) to score the response relative to the answer
- Explore scoring across various chain configurations
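The workflow above can be sketched as a simple loop. The helper names below (`generate_qa_pair`, `answer_question`, `grade_answer`) are hypothetical stand-ins for the LLM-backed steps, stubbed out so the control flow is runnable without API keys:

```python
def generate_qa_pair(chunk):
    # In the real app an LLM (e.g. GPT-3.5-turbo) produces this pair.
    return {"question": f"What does this passage say? {chunk}", "answer": chunk}

def answer_question(question, docs):
    # Stand-in for the retrieval-augmented QA chain.
    return docs[0]

def grade_answer(predicted, expected):
    # Stand-in for LLM self-grading; here a trivial exact match.
    return "CORRECT" if predicted == expected else "INCORRECT"

def evaluate(docs, num_eval_questions=2):
    """Generate QA pairs from doc chunks, answer them, and grade the answers."""
    results = []
    for chunk in docs[:num_eval_questions]:
        qa = generate_qa_pair(chunk)
        prediction = answer_question(qa["question"], docs)
        grade = grade_answer(prediction, qa["answer"])
        results.append({"question": qa["question"], "grade": grade})
    return results

scores = evaluate(["LangChain is a framework for LLM apps."])
print(scores[0]["grade"])  # CORRECT for this toy stub
```

In the actual app, each stub is replaced by an LLM call, and the grade for each question is collected across chain configurations for comparison.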
## Run as Streamlit app

```
pip install -r requirements.txt
streamlit run auto-evaluator.py
```
## Inputs

- `num_eval_questions` - Number of questions to auto-generate (if the user does not supply an eval set)
- `split_method` - Method for text splitting
- `chunk_chars` - Chunk size for text splitting
- `overlap` - Chunk overlap for text splitting
- `embeddings` - Embedding method for chunks
- `retriever_type` - Chunk retrieval method
- `num_neighbors` - Neighbors for retrieval
- `model` - LLM for summarization of retrieved chunks
- `grade_prompt` - Prompt choice for model self-grading
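Taken together, these inputs define a single run configuration. A hypothetical sketch of how they might be bundled (the specific values shown are illustrative assumptions, not the app's defaults):

```python
# Hypothetical run configuration covering the inputs listed above.
# The specific values are illustrative assumptions, not app defaults.
config = {
    "num_eval_questions": 5,                  # questions to auto-generate
    "split_method": "RecursiveTextSplitter",  # text splitting method
    "chunk_chars": 1000,                      # chunk size for splitting
    "overlap": 100,                           # chunk overlap for splitting
    "embeddings": "OpenAI",                   # embedding method for chunks
    "retriever_type": "similarity-search",    # chunk retrieval method
    "num_neighbors": 3,                       # neighbors for retrieval
    "model": "gpt-3.5-turbo",                 # LLM for the QA chain
    "grade_prompt": "Fast",                   # self-grading prompt choice
}

# Overlap should always be smaller than the chunk size.
assert config["overlap"] < config["chunk_chars"]
```

Sweeping several such configurations and comparing grades is what the "Explore scoring across various chain configurations" step refers to.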
## Blog

https://blog.langchain.dev/auto-eval-of-question-answering-tasks/
## UI

Hosted app:

- See: https://github.com/langchain-ai/auto-evaluator
- And: https://autoevaluator.langchain.com/
## Disclaimer
You will need an OpenAI API key with access to `GPT-4` and an Anthropic API key to take advantage of all of the default dashboard model settings. However, additional models (e.g., from Hugging Face) can be easily added to the app.
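A minimal preflight check for the keys mentioned above; the environment variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the standard ones read by each provider's SDK:

```python
import os

def missing_keys(env, required=("OPENAI_API_KEY", "ANTHROPIC_API_KEY")):
    """Return the names of required API keys absent from `env`."""
    return [k for k in required if not env.get(k)]

# Check the real environment before launching the dashboard.
print(missing_keys(os.environ))
```

If either key is missing, the corresponding default models in the dashboard will fail; models that do not need these keys (e.g., from Hugging Face) can still be used once added to the app.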