
title: Ragmint MCP Server
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
license: apache-2.0
pinned: true
short_description: MCP server for Ragmint with RAG pipeline optimization
tags:
  - building-mcp-track-enterprise
  - mcp
  - rag
  - llm
  - gradio
  - bayesian-optimization
  - embeddings
  - vector-search
  - gemini
  - retrievers
  - python-library

Ragmint MCP Server

Gradio-based MCP server for Ragmint, enabling Retrieval-Augmented Generation (RAG) pipeline optimization and tuning via an MCP interface.

🧩 Overview

Ragmint MCP Server exposes Ragmint, a modular Python library for evaluating, optimizing, and tuning RAG pipelines, through the Model Context Protocol (MCP). External MCP clients (such as Claude Desktop or Cursor) can then run experiments and tune RAG parameters programmatically.

Ragmint

Ragmint (Retrieval-Augmented Generation Model Inspection & Tuning) is a modular Python library for evaluating, optimizing, and tuning RAG pipelines. It’s designed for developers and researchers who want automated hyperparameter optimization, retriever selection, embedding tuning, explainability, and reproducible experiment tracking.

Features exposed via MCP:

✅ Automated hyperparameter optimization (Grid, Random, Bayesian via Optuna).
🤖 Auto-RAG Tuner for dynamic retriever–embedding recommendations.
🧮 Validation QA generation for corpora without labeled data.
📦 Configuration of chunking, embeddings, retrievers, and rerankers.
⚙️ Full programmatic control of the RAG pipeline.

🚀 Quick Start

Installation

pip install -r requirements.txt

Running the MCP Server

python app.py

The server will expose MCP-compatible endpoints, allowing clients to:

Run optimization experiments.
Autotune pipelines automatically.
Generate validation QA sets with an LLM.

Environment Variables

Set API keys for LLMs used in explainability and QA generation:

export GOOGLE_API_KEY="your_gemini_key"
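
A missing key tends to surface only when a QA-generation call fails, so it can help to check for it before starting the server. The following is a minimal sketch; `require_env` is a hypothetical helper and is not part of Ragmint.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; Ragmint may handle this differently.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` at the top of a launch script would stop early with a readable message instead of failing mid-run.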

🧠 MCP Usage

Ragmint MCP Server provides Python-callable interfaces for programmatic control. You can find an example of MCP usage in the Ragmint MCP Server Space on Hugging Face.

🔤 Supported Embeddings

sentence-transformers/all-MiniLM-L6-v2
sentence-transformers/all-mpnet-base-v2
BAAI/bge-base-en-v1.5
intfloat/multilingual-e5-base

Configuration Example

embedding_model: sentence-transformers/all-MiniLM-L6-v2

🔍 Supported Retrievers

Retriever	Description
FAISS	Fast vector similarity search and indexing.
Chroma	Persistent vector database with embeddings.
bm25	Classical lexical search based on term relevance (TF-IDF-style).
numpy	Brute-force similarity search using raw vectors and matrix ops.

Configuration Example

retriever: faiss
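
The two configuration keys above can be checked against the supported lists before a run is submitted. This is a sketch under assumptions: `validate_config` is a hypothetical helper, the case-insensitive retriever match is a guess based on the mixed casing in this README, and the allow-list would be too strict if Ragmint also accepts other model names or local paths.

```python
# Values copied from the "Supported Embeddings" and "Supported Retrievers" sections.
SUPPORTED_EMBEDDINGS = {
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    "BAAI/bge-base-en-v1.5",
    "intfloat/multilingual-e5-base",
}
SUPPORTED_RETRIEVERS = {"faiss", "chroma", "bm25", "numpy"}


def validate_config(config: dict) -> dict:
    """Return a normalized config or raise ValueError listing every problem.

    Hypothetical helper; retriever names are lowercased before comparison (assumption).
    """
    retriever = str(config.get("retriever", "")).lower()
    embedding = config.get("embedding_model", "")
    errors = []
    if retriever not in SUPPORTED_RETRIEVERS:
        errors.append(f"unsupported retriever: {retriever!r}")
    if embedding not in SUPPORTED_EMBEDDINGS:
        errors.append(f"unsupported embedding_model: {embedding!r}")
    if errors:
        raise ValueError("; ".join(errors))
    return {"retriever": retriever, "embedding_model": embedding}
```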

🧮 Dataset Options

Mode	Example	Description
Default	validation_set=None	Uses built-in validation_qa.json.
Custom File	validation_set="data/my_eval.json"	Your QA dataset.
Hugging Face Dataset	validation_set="squad"	Downloads benchmark dataset.
Generate	validation_set="generate"	Generates the QA dataset with LLM.
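
The four modes in the table can be told apart from the `validation_set` value alone. The sketch below shows one way to do that; `validation_mode` is a hypothetical helper, and treating any value ending in `.json` as a custom file is an assumption rather than documented Ragmint behavior.

```python
from typing import Optional


def validation_mode(validation_set: Optional[str]) -> str:
    """Classify a validation_set value into one of the four documented modes.

    Hypothetical helper; the ".json" suffix rule is an assumption.
    """
    if validation_set is None:
        return "default"          # built-in validation_qa.json
    if validation_set == "generate":
        return "generate"         # QA set produced by an LLM
    if validation_set.lower().endswith(".json"):
        return "custom_file"      # user-supplied QA dataset
    return "hf_dataset"           # anything else is treated as a dataset ID
```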

🧩 Folder Structure

ragmint_mcp_server/
├── app.py  # MCP server entrypoint
├── models.py
└── api.py

🔧 MCP Tools (app.py)

The app.py file provides the Gradio UI and also registers the functions exposed as MCP Tools, enabling external MCP clients (Claude Desktop, Cursor, VS Code MCP extension, etc.) to call Ragmint programmatically.

app.py launches the FastAPI backend (api.py) in a background thread and exposes the following MCP tools:

MCP Tool	Python Function	Description
upload_docs	upload_docs_tool()	Uploads `.txt` files or remote URLs into the configured `docs_path`.
upload_urls	upload_urls_tool()	Downloads remote files from external URLs and stores them inside `docs_path`.
optimize_rag	optimize_rag_tool()	Runs explicit hyperparameter optimization for a RAG pipeline.
autotune	autotune_tool()	Automatically recommends best chunking + embedding configuration.
generate_qa	generate_qa_tool()	Generates synthetic QA validation dataset for evaluation.
clear_cache	clear_cache_tool()	Deletes all docs inside `data/docs` to reset the workspace.
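
The pattern of starting a backend in a background thread while the main thread serves the UI can be sketched with the standard library alone. This is not the project's code: `http.server` stands in for the FastAPI backend, and `HealthHandler`, `start_backend`, and `ping` are hypothetical names.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in backend: answers every GET with a small JSON status body."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_backend():
    """Start the backend on a free local port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def ping(server):
    """Call the backend from the main thread, bypassing any configured proxy."""
    host, port = server.server_address[:2]
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the thread is a daemon, it does not keep the process alive once the main thread (the Gradio app, in this project's case) exits. Call `server.shutdown()` for a clean stop.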

🎬 Demo

YouTube: https://www.youtube.com/watch?v=DKtHBI3jYgQ

📥 Inputs

The Ragmint MCP Server exposes six tools with the following inputs:

1. Upload Documents (`upload_docs`)

Input: .txt files or file-like objects to upload to the documents directory (docs_path).

View Input Model

Field	Type	Description	Example
files	File[]	Local `.txt` files selected or passed from MCP client	["sample.txt"]
docs_path	str	Directory where files are stored	data/docs

2. Upload URLs (`upload_urls`)

Input: List of URLs referencing .txt files to download and store in docs_path.

View Input Model

Field	Type	Description	Example
urls	List[str]	List of URLs pointing to remote documents	["https://example.com/doc.txt"]
docs_path	str	Directory where downloaded files are saved	data/docs
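
The example URL `https://example.com/doc.txt` ends up as `doc.txt` in the response shown later. One way to derive a safe local file name from a URL is sketched below. `filename_from_url` and `target_path` are hypothetical helpers; how the server actually names downloaded files is not documented here.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = unquote(urlparse(url).path)
    name = posixpath.basename(path)  # drops any directory part, including ".."
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    return name


def target_path(url: str, docs_path: str = "data/docs") -> str:
    """Join the derived file name onto docs_path."""
    return posixpath.join(docs_path, filename_from_url(url))
```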

3. Optimize RAG (`optimize_rag`)

Input: JSON object following the OptimizeRequest model.

View Input Model

Field	Type	Description	Example
docs_path	str	Folder containing documents	data/docs
retriever	List[str]	Retriever type	["faiss"]
embedding_model	List[str]	Embedding model name or path	["sentence-transformers/all-MiniLM-L6-v2"]
strategy	List[str]	RAG strategy	["fixed"]
chunk_sizes	List[int]	Chunk sizes to evaluate	[200]
overlaps	List[int]	Overlap values to test	[50]
rerankers	List[str]	Rerankers to apply after retrieval	["mmr"]
search_type	str	Parameter search method (grid, random, bayesian)	"grid"
trials	int	Number of optimization trials	2
metric	str	Evaluation metric for optimization	"faithfulness"
validation_choice	str	Validation data source (generate, local JSON path, HF dataset ID, etc.)	"generate"
llm_model	str	LLM used to generate QA dataset when validation_choice=generate	"gemini-2.5-flash-lite"
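
As an illustration, the example values from the table can be assembled into a request body, and the list-valued fields expanded into the configurations a grid search would cover. This is a sketch: `grid_combinations` is a hypothetical helper, and the Cartesian product is how grid search is usually defined, not a confirmed description of Ragmint's internals.

```python
import json
from itertools import product

# Example values taken from the table above.
optimize_request = {
    "docs_path": "data/docs",
    "retriever": ["faiss"],
    "embedding_model": ["sentence-transformers/all-MiniLM-L6-v2"],
    "strategy": ["fixed"],
    "chunk_sizes": [200],
    "overlaps": [50],
    "rerankers": ["mmr"],
    "search_type": "grid",
    "trials": 2,
    "metric": "faithfulness",
    "validation_choice": "generate",
    "llm_model": "gemini-2.5-flash-lite",
}

# Request field -> per-configuration key (singular names as in the response).
LIST_FIELDS = {
    "retriever": "retriever",
    "embedding_model": "embedding_model",
    "strategy": "strategy",
    "chunk_sizes": "chunk_size",
    "overlaps": "overlap",
    "rerankers": "reranker",
}


def grid_combinations(request: dict) -> list:
    """Expand list-valued fields into one dict per combination (assumed grid semantics)."""
    value_lists = [request[field] for field in LIST_FIELDS]
    return [dict(zip(LIST_FIELDS.values(), combo)) for combo in product(*value_lists)]


payload = json.dumps(optimize_request, indent=2)  # JSON body for the optimize_rag tool
```

With the example values there is a single combination. Adding a second chunk size and a second overlap would give four, which is worth keeping in mind when choosing `trials`.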

4. Autotune RAG (`autotune`)

Input: JSON object following the AutotuneRequest model.

View Input Model

Field	Type	Description	Example
docs_path	str	Folder containing documents	data/docs
embedding_model	str	Embedding model name or path	"sentence-transformers/all-MiniLM-L6-v2"
num_chunk_pairs	int	Number of chunk pairs to analyze for tuning	2
metric	str	Evaluation metric for optimization	"faithfulness"
search_type	str	Search method (grid, random, bayesian)	"grid"
trials	int	Number of optimization trials	2
validation_choice	str	Validation data source (generate, local JSON, HF dataset)	"generate"
llm_model	str	LLM used for generating QA dataset	"gemini-2.5-flash-lite"

5. Generate QA (`generate_qa`)

Input: JSON object following the QARequest model.

View Input Model

Field	Type	Description	Example
docs_path	str	Folder containing documents for QA generation	data/docs
llm_model	str	LLM used for question generation	"gemini-2.5-flash-lite"
batch_size	int	Number of documents processed per batch	5
min_q	int	Minimum number of questions per document	3
max_q	int	Maximum number of questions per document	25
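
The `batch_size`, `min_q`, and `max_q` fields interact in a simple way that can be sketched as follows. Both functions are hypothetical and only illustrate the parameters' meaning; the server's own batching logic is not shown in this README.

```python
from typing import Iterator, List, Sequence


def check_question_bounds(min_q: int, max_q: int) -> None:
    """Reject bounds that could never be satisfied."""
    if min_q < 1 or max_q < min_q:
        raise ValueError(f"need 1 <= min_q <= max_q, got min_q={min_q}, max_q={max_q}")


def batched(docs: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(docs), batch_size):
        yield list(docs[start:start + batch_size])
```

With seven documents and `batch_size=5`, this yields one batch of five and one of two.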

6. Clear Cache (`clear_cache`)

Deletes all stored documents from data/docs.

View Input Model

Field	Type	Description	Example
docs_path	str	Folder to wipe clean	data/docs

📤 Outputs

The Ragmint MCP Server exposes six tools with the following example outputs:

1. Upload Documents Response (`upload_docs`)

View Response Example

{
  "status": "ok",
  "uploaded_files": ["sample.txt"],
  "docs_path": "data/docs"
}

status: "ok" → Indicates that the upload was successful.
uploaded_files: List of file names that were successfully uploaded.
docs_path: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

2. Upload URLs Response (`upload_urls`)

View Response Example

{
  "status": "ok",
  "uploaded_files": ["doc.txt"],
  "docs_path": "data/docs"
}

status: "ok" → Indicates that the upload was successful.
uploaded_files: List of file names that were successfully uploaded.
docs_path: The directory where the uploaded documents are stored.

✅ Confirms your documents are ready for RAG operations.

3. Optimize RAG Response (`optimize_rag`)

View Response Example

{
  "status": "finished",
  "run_id": "opt_1763222218",
  "elapsed_seconds": 0.937,
  "best_config": {
    "retriever": "faiss",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 200,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 0.8659,
    "latency": 0.0333
  },
  "results": [
    {
      "retriever": "faiss",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 200,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 0.8659,
      "latency": 0.0333
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}

status: "finished" → Optimization process completed.
run_id: Unique identifier for this optimization run.
elapsed_seconds: How long the optimization took.
best_config: Configuration that gave the best performance.
- retriever → The retrieval algorithm used (faiss).
- embedding_model → Embedding model applied.
- reranker → Reranking strategy after retrieval.
- chunk_size → Size of document chunks used in RAG.
- overlap → Overlap between consecutive chunks.
- strategy → RAG retrieval strategy.
- faithfulness → Evaluation score (higher = better).
- latency → Time per query in seconds.
results: List of all tested configurations and their scores.
corpus_stats: Statistics about the uploaded documents.
- num_docs → Number of documents in corpus.
- avg_len → Average document length.
- corpus_size → Total size in characters or tokens.
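
A client can check the response shape and pull out the headline numbers before acting on a run. The sketch below assumes the response arrives as a JSON string with the keys shown above; `summarize_run` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"status", "run_id", "elapsed_seconds", "best_config", "results", "corpus_stats"}


def summarize_run(raw: str) -> dict:
    """Validate an optimize_rag response and return a short summary."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    if data["status"] != "finished":
        raise RuntimeError(f"run {data['run_id']} ended with status {data['status']!r}")
    best = data["best_config"]
    return {
        "run_id": data["run_id"],
        "configs_tested": len(data["results"]),
        "best_in_results": best in data["results"],  # sanity check on the payload
        "faithfulness": best["faithfulness"],
        "latency": best["latency"],
    }
```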

4. Autotune RAG Response (`autotune`)

View Response Example

{
  "status": "finished",
  "run_id": "autotune_1763222228",
  "elapsed_seconds": 4.733,
  "recommendation": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "chunk_size": 100,
    "overlap": 30,
    "strategy": "fixed",
    "chunk_candidates": [[100, 30], [110, 30]]
  },
  "chunk_candidates": [[90, 50], [70, 50]],
  "best_config": {
    "retriever": "BM25",
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "reranker": "mmr",
    "chunk_size": 70,
    "overlap": 50,
    "strategy": "fixed",
    "faithfulness": 1.0,
    "latency": 0.0272
  },
  "results": [
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 70,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0272
    },
    {
      "retriever": "BM25",
      "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
      "reranker": "mmr",
      "chunk_size": 90,
      "overlap": 50,
      "strategy": "fixed",
      "faithfulness": 1.0,
      "latency": 0.0186
    }
  ],
  "corpus_stats": {
    "num_docs": 1,
    "avg_len": 8.0,
    "corpus_size": 61
  }
}

recommendation: The tuned configuration suggested by the autotuner.
chunk_candidates: List of possible chunk_size/overlap pairs analyzed.
best_config: Best-performing configuration with metrics.
results: All tested configurations and their performance.
corpus_stats: Same as in the Optimize response.
status, run_id, elapsed_seconds: Same meaning as in the Optimize response.

🧠 Difference from Optimize: Autotune automatically selects the best hyperparameters, rather than testing all user-specified combinations.
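
In the sample response above, both tested configurations reach a faithfulness of 1.0, and `best_config` matches the first entry in `results`. How the server breaks such ties is not documented here. A client that prefers the faster configuration among tied scores could re-rank on its side, as in this sketch (`first_best` and `fastest_best` are hypothetical helpers):

```python
# Abbreviated copies of the two entries in the sample "results" list.
results = [
    {"chunk_size": 70, "overlap": 50, "faithfulness": 1.0, "latency": 0.0272},
    {"chunk_size": 90, "overlap": 50, "faithfulness": 1.0, "latency": 0.0186},
]


def first_best(results: list, metric: str = "faithfulness") -> dict:
    """Highest score; on ties, max() keeps the earliest entry."""
    return max(results, key=lambda r: r[metric])


def fastest_best(results: list, metric: str = "faithfulness") -> dict:
    """Highest score; on ties, prefer the lowest latency."""
    top = max(r[metric] for r in results)
    tied = [r for r in results if r[metric] == top]
    return min(tied, key=lambda r: r["latency"])
```

Here `first_best` picks the `chunk_size` 70 entry, the same one the sample reports as `best_config`, while `fastest_best` picks the `chunk_size` 90 entry.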

5. Generate QA Response (`generate_qa`)

View Response Example

{
  "status": "finished",
  "output_path": "data/docs/validation_qa.json",
  "preview_count": 3,
  "sample": [
    {
      "query": "What capability does Artificial Intelligence provide to machines?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    },
    {
      "query": "What is the primary source of learning for machines with Artificial Intelligence?",
      "expected_answer": "Machines with Artificial Intelligence learn from data."
    },
    {
      "query": "How does Artificial Intelligence facilitate machine learning?",
      "expected_answer": "Artificial Intelligence enables machines to learn from data."
    }
  ]
}

output_path: Where the generated QA JSON file is saved.
preview_count: Number of QA pairs included in the response preview.
sample: Example QA pairs:
- query → The question generated from the document.
- expected_answer → The reference answer corresponding to that question.
status: "finished" → QA generation completed successfully.
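
Before reusing a generated file as a validation set, it can be worth checking that every entry has both fields. The sketch below assumes the file at `output_path` is a JSON list of objects shaped like the `sample` entries, which this README does not state explicitly. `parse_qa_pairs` is a hypothetical helper.

```python
import json
from typing import Dict, List


def parse_qa_pairs(text: str) -> List[Dict[str, str]]:
    """Parse QA JSON text and require non-empty 'query' and 'expected_answer'.

    Assumes a top-level JSON list (not confirmed by the README).
    """
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of QA objects")
    pairs = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        query = str(item.get("query", "")).strip()
        answer = str(item.get("expected_answer", "")).strip()
        if not query or not answer:
            raise ValueError(f"entry {index} needs non-empty 'query' and 'expected_answer'")
        pairs.append({"query": query, "expected_answer": answer})
    return pairs
```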

6. Clear Cache Response (`clear_cache`)

View Response Example

{
  "status": "ok",
  "deleted_files": 7,
  "docs_path": "data/docs"
}

deleted_files: Number of documents removed.
status: "ok" indicates successful workspace reset.
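
The reset behavior can be approximated as follows. This is a sketch, not the tool's source: `clear_docs` is a hypothetical function that removes only regular files at the top level of the folder, and whether the real tool recurses into subfolders or how it treats a missing folder is not stated here.

```python
import tempfile
from pathlib import Path


def clear_docs(docs_path: str) -> dict:
    """Delete top-level files in docs_path and report how many were removed."""
    folder = Path(docs_path)
    deleted = 0
    if folder.is_dir():
        for entry in folder.iterdir():
            if entry.is_file():
                entry.unlink()
                deleted += 1
    return {"status": "ok", "deleted_files": deleted, "docs_path": docs_path}


def demo() -> dict:
    """Run clear_docs against a throwaway folder holding two sample files."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt"):
            (Path(tmp) / name).write_text("sample", encoding="utf-8")
        response = clear_docs(tmp)
        response["remaining"] = len(list(Path(tmp).iterdir()))
        return response
```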

📘 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Built with ❤️ by André Oliveira | Apache 2.0 License
