Smart Retrieval Backend

The backend is built with Python and FastAPI, bootstrapped with create-llama.

Requirements

Python >= 3.11
Miniconda (To manage Python versions)
- Windows
- Linux
- MacOS
- conda create -n SmartRetrieval python=3.11
Pipx (To manage Python packages)
- pip install pipx (If you already have pipx installed, you can skip this step)
CUDA > 12.1 (if you have an NVIDIA GPU)
- Windows
- Linux
- MacOS
Poetry (To manage dependencies)
- pipx install poetry

Getting Started

First, if you want to use the CUDA version of PyTorch, make sure a suitable CUDA version (> 12.1) is installed. You can check this by running nvcc --version or nvidia-smi in your terminal. If you do not have CUDA installed, you can install it from NVIDIA's CUDA Toolkit download page.
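The version check can also be scripted. The following is a minimal sketch and not part of this project: the helper names `parse_cuda_release`, `cuda_is_new_enough` and `detect_cuda_release` are hypothetical, and it assumes that `nvcc --version` prints a line containing `release <major>.<minor>`. The comparison is strict because the requirement is written as `> 12.1`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Hypothetical helpers for illustration; they are not part of this repository.
MIN_CUDA = (12, 1)


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_is_new_enough(version, minimum=MIN_CUDA) -> bool:
    """Return True when the version is strictly greater than the minimum."""
    return version is not None and tuple(version) > tuple(minimum)


def detect_cuda_release() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH and return the parsed release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)
```

Calling `cuda_is_new_enough(detect_cuda_release())` returns False when nvcc is missing or too old, which would suggest using the CPU install path instead.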

Ensure you have followed the steps in the requirements section above.

Then activate the conda environment:

conda activate SmartRetrieval

Second, set up the environment:

# Only run one of the following commands:
-----------------------------------------------
# Install dependencies and torch (cpu version)
# Windows: Set env for llama-cpp-python with openblas support on cpu
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
# Linux: Set env for llama-cpp-python with openblas support on cpu
export CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
# Then:
poetry install --without torch-cuda
-----------------------------------------------
# Install dependencies and torch (cuda version)
# Windows: Set env for llama-cpp-python with cuda support on gpu
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
# Linux: Set env for llama-cpp-python with cuda support on gpu
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# Then:
poetry install --without torch-cpu

# Enter poetry shell
poetry shell

Third, run the development server:

python run.py

Then call the API endpoint /api/chat to see the result:

curl --location 'localhost:8000/api/chat' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
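The same request can be made from Python. This is a minimal sketch using only the standard library; the function names `build_chat_request` and `send_chat` are hypothetical, and it assumes the development server is listening on localhost:8000 as in the curl command above. The response format is not specified here, so the sketch simply returns the raw body as text.

```python
import json
import urllib.request

# Assumption: the development server started with `python run.py`
# listens on localhost:8000, as in the curl example.
CHAT_URL = "http://localhost:8000/api/chat"


def build_chat_request(content: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build a POST request carrying a single user message."""
    payload = {"messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(content: str, url: str = CHAT_URL, timeout: float = 60.0) -> str:
    """Send the message and return the raw response body as text."""
    request = build_chat_request(content, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send_chat("Hello"))` should print whatever the endpoint returns for that message.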

You can start editing the API by modifying app/api/routers/chat.py. The endpoint auto-updates as you save the file.

Open http://localhost:8000/docs with your browser to see the Swagger UI of the API.

The API allows CORS for all origins to simplify development. You can change this behavior by setting the ENVIRONMENT environment variable to prod:

ENVIRONMENT=prod uvicorn main:app
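As a rough illustration of how such a switch might work, the sketch below maps the ENVIRONMENT variable to a list of allowed origins. It is an assumption about the general pattern, not a copy of this project's code: the function names are hypothetical, and the "dev" default is a guess. The actual logic lives in the application's entry point.

```python
import os


def current_environment(default: str = "dev") -> str:
    """Read ENVIRONMENT, falling back to an assumed default of "dev"."""
    return os.getenv("ENVIRONMENT", default)


def allowed_cors_origins(environment: str) -> list:
    """Allow every origin outside production; allow none by default in prod."""
    return [] if environment == "prod" else ["*"]


# In a FastAPI app, the result could be passed to the CORS middleware, e.g.:
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(
#       CORSMiddleware,
#       allow_origins=allowed_cors_origins(current_environment()),
#   )
```

In production you would typically replace the empty list with the specific origins of your frontend.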

Learn More

To learn more about LlamaIndex, take a look at the following resources:

LlamaIndex Documentation - learn about LlamaIndex.
LlamaIndexTS Documentation - learn about LlamaIndexTS (TypeScript features).
FastAPI Documentation - learn about FastAPI.