Spaces: Nullpointer-KK / CryptoRAG

title: Crypto_RAG_ChatBot
emoji: 💡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
pinned: false

Crypto_RAG_ChatBot

This is a cryptocurrency-focused Retrieval-Augmented Generation (RAG) app. It retrieves from your uploaded documents and added URLs, reranks results, and generates answers with a chat LLM. It can also route "price" queries to a live price tool for major coins.

OPENAI API KEY (REQUIRED; PASTE-ONLY, NEVER SAVED)

The chat model requires an OpenAI API key.
Paste your key into the "OpenAI API Key" field in the UI.
The key is kept in memory for the current session only:
- It is not written to disk, not bundled in the repository, and not logged.
- Restarting or refreshing the Space clears it.
The key is used only to generate chat responses for this app.

QUICK COST ESTIMATE (GPT-4o-mini PRICING)

Pricing used (source: https://platform.openai.com/docs/pricing):

Input: $0.15 per 1,000,000 tokens
Output: $0.60 per 1,000,000 tokens

Assumptions:

200 input tokens per query
500 output tokens per query
20 queries total

Per-query cost:

Input: 200 / 1,000,000 * $0.15 = $0.00003
Output: 500 / 1,000,000 * $0.60 = $0.00030
Total per query = $0.00003 + $0.00030 = $0.00033

20-query session:

$0.00033 * 20 = $0.0066

Result: about $0.0066 (roughly two-thirds of a cent) for the full 20-query session, well under a $0.50 budget.
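
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the prices and token counts are the ones listed in this README, while the function name `session_cost` is only illustrative and is not part of the app.

```python
# Prices in USD per 1,000,000 tokens, as listed in this README.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def session_cost(input_tokens: int, output_tokens: int, queries: int) -> float:
    """Return the estimated USD cost of a session of identical queries."""
    per_query = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )
    return per_query * queries


# 200 input tokens, 500 output tokens, 20 queries (the assumptions above).
print(f"${session_cost(200, 500, 20):.4f}")  # prints $0.0066
```

Change the three arguments to estimate a longer session or longer prompts; real usage will vary with how much retrieved context is sent to the model.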

FREE OPEN-SOURCE RETRIEVAL MODELS

Embeddings: sentence-transformers/all-MiniLM-L6-v2
Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2

These run locally in the Space and are free (no API costs).

UPLOADING .PDF / .TXT / .MD FILES (FOR RAG)

Click "Add files" and select any combination of .pdf, .txt, or .md files. Sample PDF and TXT files are in /data/samples; download them to your drive, then multi-select them to upload.
The app extracts text (PDFs via pypdf), splits into chunks, and stores them with metadata for retrieval.
After adding files, click "Build Index" so they are included in search.
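
To illustrate the chunking step, here is a minimal sketch of splitting extracted text into overlapping chunks that carry metadata. The `Chunk` fields, the character-based windows, and the default `size` and `overlap` values are assumptions made for illustration; the app's actual splitter and settings may differ.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file name or URL the chunk came from (hypothetical field)
    index: int   # position of the window within its source (hypothetical field)


def split_into_chunks(
    text: str, source: str, size: int = 500, overlap: int = 50
) -> list[Chunk]:
    """Split text into overlapping character windows tagged with metadata."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + size]
        if piece.strip():  # skip windows that contain only whitespace
            chunks.append(Chunk(text=piece, source=source, index=i))
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which tends to help retrieval at the cost of some duplicated text.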

ADDING MULTIPLE URLS (FOR RAG)

Paste one URL per line in the "URLs" box and click "Add URLs". See /data/samples/Links%20sample.txt for example links.
The app fetches and parses the pages (static articles and PDFs work best).
After adding URLs, click "Build Index" to include them in retrieval.
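
The one-URL-per-line input can be cleaned before fetching. The sketch below is an assumption about how such a box could be parsed, not the app's actual code; the function name `parse_url_box` is hypothetical. It keeps only http(s) URLs and drops blank lines and duplicates.

```python
from urllib.parse import urlparse


def parse_url_box(raw: str) -> list[str]:
    """Return unique http(s) URLs from a one-URL-per-line text box, in order."""
    seen = set()
    urls = []
    for line in raw.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        is_web_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if is_web_url and candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```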

BUILD INDEX AND RETRIEVAL OPTIONS

After you add files and/or URLs, click "Build Index". You can tune:

Top-K retrieve (k): how many candidates to pull initially (e.g., 6 to 10).
Hybrid alpha (BM25 <-> Dense): blend between keyword BM25 and dense similarity. 0.0 = all dense, 1.0 = all BM25. A balanced default is 0.5.
Rerank Top-K: how many of the retrieved candidates the cross-encoder reranks (e.g., 3 to 8). The final answer uses the top reranked passages.

If results seem off:

Increase Top-K for better recall.
Adjust alpha (higher favors keywords; lower favors semantic similarity).
Increase Rerank Top-K for stronger final ordering (slightly more CPU).
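
The effect of the hybrid alpha setting can be sketched as a weighted blend of the two score lists. This is a minimal illustration, assuming both lists are min-max normalized before blending; the normalization step and the helper names (`min_max`, `hybrid_scores`, `top_k`) are assumptions, and the app may combine scores differently.

```python
def min_max(scores: list[float]) -> list[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    bm25: list[float], dense: list[float], alpha: float = 0.5
) -> list[float]:
    """Blend per-document scores: alpha=1.0 is all BM25, alpha=0.0 is all dense."""
    if len(bm25) != len(dense):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    b, d = min_max(bm25), min_max(dense)
    return [alpha * x + (1 - alpha) * y for x, y in zip(b, d)]


def top_k(scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

In a pipeline like the one described here, `top_k` over the blended scores would pick the Top-K candidates, and the cross-encoder would then rescore only the first Rerank Top-K of them.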

STREAMING RESPONSES (SELECTABLE)

Streaming ON: words appear live as the model generates.
Streaming OFF: you receive a single final answer after generation finishes.

Toggle this with the "Streaming" checkbox.
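
The difference between the two modes can be sketched with a plain Python generator. This is an illustration only: `respond` and its `token_chunks` argument are hypothetical stand-ins for the model's output stream, not the app's actual handler. Gradio accepts generator functions that yield successive partial outputs, which is the pattern this sketch assumes.

```python
from typing import Iterable, Iterator


def respond(token_chunks: Iterable[str], streaming: bool) -> Iterator[str]:
    """Yield growing partial answers when streaming, else one final answer."""
    answer = ""
    for chunk in token_chunks:
        answer += chunk
        if streaming:
            yield answer  # the UI can redraw with the text so far
    if not streaming:
        yield answer  # a single update once generation has finished


print(list(respond(["Bit", "coin"], streaming=True)))   # ['Bit', 'Bitcoin']
print(list(respond(["Bit", "coin"], streaming=False)))  # ['Bitcoin']
```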

CHAT MODEL CHOICE (FIXED FOR NOW)

The chat model is fixed in this version of the app for reliability and cost control. If you need a different model, it can be changed in code and redeployed; the UI currently does not expose a model selector.

LIVE PRICE SEARCH FOR MAJOR COINS + ROUTING

If your question looks like a price query (for example: "BTC price", "price of ETH", "SOL price in USD"), the app routes to a tool instead of RAG:
- It calls a public price API (for example, CoinGecko) to get the latest price for major coins such as BTC, ETH, SOL, and XRP.
- It can also show the Fear and Greed Index for market sentiment.
Routing logic: the pipeline checks your query for price-intent keywords ("price", "quote", "market cap", "ATH", and similar). If matched, it uses the tools route; otherwise it uses the RAG route (retrieve -> rerank -> answer).
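
A keyword router of the kind described above might look like the following minimal sketch. The keyword and ticker lists come from this README, but the function names (`route`, `find_ticker`) and the exact matching rules are assumptions rather than the app's real implementation.

```python
import re
from typing import Optional

# Price-intent keywords and major tickers named in this README.
PRICE_KEYWORDS = ("price", "quote", "market cap", "ath")
TICKERS = ("BTC", "ETH", "SOL", "XRP")


def route(query: str) -> str:
    """Return "tools" for price-intent queries, otherwise "rag"."""
    text = query.lower()
    for keyword in PRICE_KEYWORDS:
        # Word boundaries keep "ath" from matching inside words like "path";
        # the optional "s" also accepts simple plurals such as "prices".
        if re.search(rf"\b{re.escape(keyword)}s?\b", text):
            return "tools"
    return "rag"


def find_ticker(query: str) -> Optional[str]:
    """Return the first known ticker mentioned in the query, if any."""
    upper = query.upper()
    for ticker in TICKERS:
        if re.search(rf"\b{ticker}\b", upper):
            return ticker
    return None
```

Plain keyword matching is cheap and predictable, but it will send a question such as "why did the price of ETH fall in 2022?" to the tools route, so a production router may need extra rules.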

QUICK START

Initialize pipeline (if manual init is enabled).
Paste your OpenAI API key (not saved).
Add files and/or add URLs.
- Sample PDF and TXT files are in /data/samples; download them to your drive, then multi-select them to upload.
- See /data/samples/Links%20sample.txt for example links.
Click Build Index.
Ask questions, e.g. "How does Ethereum compare with Solana?" or "What are Bitcoin's strengths and weaknesses?"
For prices, try queries like "ETH price", "SOL quote", "XRP price in USD".

NOTES

This tool is for research and education only. It is not financial advice.
For best results, use focused, well-structured documents and reputable URLs.

INSTALLATION AND EXECUTION (FOR GRADIO UI)

Create a new Python environment:
```
python -m venv .venv
```
Activate the environment:

For macOS and Linux:
```
source .venv/bin/activate
```
For Windows:
```
.venv\Scripts\activate
```
Install the dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```