Commit e84d389 · 0 parent(s)
first commit
- README.md +108 -0
- app/__init__.py +0 -0
- app/__pycache__/__init__.cpython-312.pyc +0 -0
- app/__pycache__/config.cpython-312.pyc +0 -0
- app/__pycache__/dependencies.cpython-312.pyc +0 -0
- app/__pycache__/main.cpython-312.pyc +0 -0
- app/config.py +26 -0
- app/db/__init__.py +0 -0
- app/db/__pycache__/__init__.cpython-312.pyc +0 -0
- app/db/__pycache__/chat_manager.cpython-312.pyc +0 -0
- app/db/__pycache__/mongodb.cpython-312.pyc +0 -0
- app/db/chat_manager.py +64 -0
- app/db/mongodb.py +19 -0
- app/dependencies.py +27 -0
- app/main.py +46 -0
- app/models/__init__.py +0 -0
- app/models/__pycache__/__init__.cpython-312.pyc +0 -0
- app/models/__pycache__/transcription.cpython-312.pyc +0 -0
- app/models/__pycache__/user.cpython-312.pyc +0 -0
- app/models/transcription.py +25 -0
- app/models/user.py +20 -0
- app/routes/__init__.py +0 -0
- app/routes/__pycache__/__init__.cpython-312.pyc +0 -0
- app/routes/__pycache__/auth.cpython-312.pyc +0 -0
- app/routes/__pycache__/query.cpython-312.pyc +0 -0
- app/routes/__pycache__/sessions.cpython-312.pyc +0 -0
- app/routes/__pycache__/video.cpython-312.pyc +0 -0
- app/routes/auth.py +27 -0
- app/routes/query.py +61 -0
- app/routes/sessions.py +90 -0
- app/routes/video.py +131 -0
- app/services/__init__.py +0 -0
- app/services/__pycache__/__init__.cpython-312.pyc +0 -0
- app/services/__pycache__/auth.cpython-312.pyc +0 -0
- app/services/__pycache__/llm.cpython-312.pyc +0 -0
- app/services/__pycache__/transcription.cpython-312.pyc +0 -0
- app/services/auth.py +33 -0
- app/services/llm.py +68 -0
- app/services/transcription.py +73 -0
- app/utils/__init__.py +0 -0
- app/utils/helpers.py +6 -0
- asgi.py +2 -0
- requirements.txt +124 -0
- vercel.json +9 -0
README.md
ADDED
@@ -0,0 +1,108 @@
+# Video RAG System Project
+
+This FastAPI-based Video RAG (Retrieval-Augmented Generation) system provides endpoints to:
+
+1. **Register & Authenticate** users
+2. **Transcribe** YouTube or uploaded videos
+3. **Query** the RAG system
+4. **Manage** sessions (list, view, delete)
+
+---
+
+## Endpoint Flow
+
+```mermaid
+graph TD
+    A[POST /register] --> B[POST /token]
+    B --> C[POST /transcribe]
+    B --> D[POST /upload]
+    C --> E[Start RAG session]
+    D --> E
+    E --> F[POST /query]
+    E --> G[GET /sessions]
+    G --> H["GET /sessions/{session_id}"]
+    H --> F
+    G --> I["DELETE /sessions/{session_id}"]
+```
+
+1. **User Registration & Login**
+   - **POST /register**: Create a new user.
+   - **POST /token**: Obtain JWT access token.
+
+2. **Video Transcription**
+   - **POST /transcribe** (YouTube URL): Transcribe via Google GenAI → split & store chunks → initialize chat history → return `session_id`.
+   - **POST /upload** (Multipart Form Video): Upload & transcribe file → split & store chunks → initialize chat history → return `session_id`.
+
+3. **Query RAG System**
+   - **POST /query** with `{ session_id, query }`:
+     • Rebuild FAISS retriever from MongoDB chunks
+     • Invoke ConversationalRetrievalChain
+     • Append messages to chat history
+     • Return `{ answer, session_id, source_documents }`
+
+4. **Session Management**
+   - **GET /sessions**: List all sessions for current user.
+   - **GET /sessions/{session_id}**: Get full transcription & Q&A history.
+   - **DELETE /sessions/{session_id}**: Remove metadata, chunks, chat history, and video files.
+
+---
+
+## README.md
+
+```markdown
+# Video RAG System
+
+## Overview
+A FastAPI application that:
+
+- Authenticates users (JWT)
+- Transcribes videos (YouTube or upload) via Google GenAI
+- Stores transcription chunks in MongoDB
+- Builds a FAISS retriever on demand
+- Provides a conversational retrieval endpoint
+- Manages sessions and associated data
+
+## API Endpoints
+
+| Method | Path                       | Auth Required | Description                                   |
+|--------|----------------------------|---------------|-----------------------------------------------|
+| POST   | /register                  | No            | Create a new user                             |
+| POST   | /token                     | No            | Login and return JWT token                    |
+| POST   | /transcribe                | Yes           | Transcribe YouTube video and init session     |
+| POST   | /upload                    | Yes           | Upload & transcribe video file                |
+| POST   | /query                     | Yes           | Run Q&A against a session                     |
+| GET    | /sessions                  | Yes           | List all user sessions                        |
+| GET    | /sessions/{session_id}     | Yes           | Get session transcription & chat history      |
+| DELETE | /sessions/{session_id}     | Yes           | Delete session & all associated data          |
+
+## Usage
+1. Clone repo & install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+2. Create `.env` with your credentials (MongoDB, JWT secret, API keys).
+3. Run the app:
+   ```bash
+   uvicorn app.main:app --reload
+   ```
+4. Interact via HTTP clients (curl, Postman) following the flow above.
+
+## Folder Structure
+```
+rag_system/
+├── app/
+│   ├── main.py
+│   ├── config.py
+│   ├── dependencies.py
+│   ├── models/
+│   ├── db/
+│   ├── services/
+│   ├── routes/
+│   └── utils/
+├── temp_videos/
+├── .env
+├── requirements.txt
+└── README.md
+```
+```
+```
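The query flow described in the README (split the transcript into chunks, rebuild a retriever per request, return the best-matching chunks) can be sketched without external services. This is only an illustration, not the project's implementation: it swaps the FAISS embedding search for a plain bag-of-words cosine score, and `split_chunks`, `retrieve`, and the chunk sizes are hypothetical names and values.

```python
import math
import re
from collections import Counter


def split_chunks(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def _vector(text: str) -> Counter:
    # Lowercased token counts stand in for an embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = _vector(query)
    return sorted(chunks, key=lambda c: cosine(_vector(c), q), reverse=True)[:k]
```

In the real service the chunks would come from MongoDB and the ranking from FAISS; the overlap between windows is what keeps a sentence that straddles a boundary retrievable.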
app/__init__.py
ADDED
File without changes
app/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (160 Bytes)
app/__pycache__/config.cpython-312.pyc
ADDED
Binary file (1.18 kB)
app/__pycache__/dependencies.cpython-312.pyc
ADDED
Binary file (1.44 kB)
app/__pycache__/main.cpython-312.pyc
ADDED
Binary file (2.06 kB)
app/config.py
ADDED
@@ -0,0 +1,26 @@
+import os
+from dotenv import load_dotenv
+from urllib.parse import quote_plus
+
+load_dotenv()
+
+class Settings:
+    # MongoDB
+    MONGO_USERNAME = os.getenv("MONGO_USERNAME")
+    MONGO_PASSWORD = quote_plus(os.getenv("MONGO_PASSWORD"))  # Escape special characters
+    DATABASE_NAME = os.getenv("DATABASE_NAME")
+    COLLECTION_NAME = os.getenv("COLLECTION_NAME")
+    CONNECTION_STRING = os.getenv("CONNECTION_STRING_TEMPLATE").format(
+        username=MONGO_USERNAME,
+        password=MONGO_PASSWORD
+    )
+
+    # Security
+    SECRET_KEY = os.getenv("SECRET_KEY")
+    ALGORITHM = "HS256"
+    ACCESS_TOKEN_EXPIRE_MINUTES = 30
+
+    # Video storage
+    VIDEOS_DIR = "temp_videos"
+
+settings = Settings()
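As written, `Settings` is evaluated at import time, so an unset `MONGO_PASSWORD` or `CONNECTION_STRING_TEMPLATE` surfaces as an unhelpful exception from `quote_plus` or `.format`. A minimal sketch of a more explicit variant follows; `build_connection_string` is a hypothetical helper, not part of the commit, and it also escapes the username, which MongoDB URIs require when it contains reserved characters.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=os.environ) -> str:
    """Build the MongoDB URI from the template, failing with a clear message."""
    required = ("MONGO_USERNAME", "MONGO_PASSWORD", "CONNECTION_STRING_TEMPLATE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Both credentials are percent-escaped before being placed in the URI.
    return env["CONNECTION_STRING_TEMPLATE"].format(
        username=quote_plus(env["MONGO_USERNAME"]),
        password=quote_plus(env["MONGO_PASSWORD"]),
    )
```

Passing the environment as a parameter keeps the function testable with a plain dictionary.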
app/db/__init__.py
ADDED
File without changes
app/db/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (163 Bytes)
app/db/__pycache__/chat_manager.cpython-312.pyc
ADDED
Binary file (2.57 kB)
app/db/__pycache__/mongodb.cpython-312.pyc
ADDED
Binary file (1.53 kB)
app/db/chat_manager.py
ADDED
@@ -0,0 +1,64 @@
+# app/db/chat_manager.py
+import uuid
+from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
+from ..config import settings
+
+class ChatManagement:
+    def __init__(self, connection_string, database_name, collection_name):
+        self.connection_string = connection_string
+        self.database_name = database_name
+        self.collection_name = collection_name
+        # map session_id to MongoDBChatMessageHistory instances
+        self.chat_sessions = {}
+
+    def _create_history(self, session_id: str) -> MongoDBChatMessageHistory:
+        """
+        Internal: create a new MongoDBChatMessageHistory for a session_id.
+        """
+        history = MongoDBChatMessageHistory(
+            session_id=session_id,
+            connection_string=self.connection_string,
+            database_name=self.database_name,
+            collection_name=self.collection_name
+        )
+        # store in memory
+        self.chat_sessions[session_id] = history
+        return history
+
+    def get_chat_history(self, session_id: str) -> MongoDBChatMessageHistory | None:
+        """
+        Retrieve an existing chat history object from memory or database.
+        Returns None if no history found.
+        """
+        # in-memory
+        if session_id in self.chat_sessions:
+            return self.chat_sessions[session_id]
+        # instantiate from DB
+        history = MongoDBChatMessageHistory(
+            session_id=session_id,
+            connection_string=self.connection_string,
+            database_name=self.database_name,
+            collection_name=self.collection_name
+        )
+        if history.messages:
+            self.chat_sessions[session_id] = history
+            return history
+        return None
+
+    def initialize_chat_history(self, session_id: str) -> MongoDBChatMessageHistory:
+        """
+        Ensure a chat history exists for the session_id. Return the history instance.
+        """
+        history = self.get_chat_history(session_id)
+        if history:
+            return history
+        # no existing history, create new object (and DB entries)
+        return self._create_history(session_id)
+
+# create a global instance for use in routes
+from ..config import settings
+chat_manager = ChatManagement(
+    settings.CONNECTION_STRING,
+    settings.DATABASE_NAME,
+    settings.COLLECTION_NAME
+)
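The lookup order in `ChatManagement` (process cache first, then the database, then create) can be exercised in isolation. The sketch below uses a dictionary-backed `FakeHistory` as a stand-in for `MongoDBChatMessageHistory`; both class names and the `get_or_create` method are hypothetical and exist only to show the pattern.

```python
class FakeHistory:
    """Stand-in for a persistent history: messages live in a shared dict."""

    def __init__(self, session_id, store):
        self.session_id = session_id
        self._store = store

    @property
    def messages(self):
        return list(self._store.get(self.session_id, []))

    def add_message(self, text):
        self._store.setdefault(self.session_id, []).append(text)


class HistoryCache:
    def __init__(self, store):
        self._store = store   # plays the role of the database
        self._cache = {}      # plays the role of chat_sessions

    def get(self, session_id):
        # 1. Serve from the in-process cache when possible.
        if session_id in self._cache:
            return self._cache[session_id]
        # 2. Otherwise look in the store; cache only non-empty histories.
        history = FakeHistory(session_id, self._store)
        if history.messages:
            self._cache[session_id] = history
            return history
        return None

    def get_or_create(self, session_id):
        history = self.get(session_id)
        if history is None:
            # 3. Nothing stored yet: create and cache a fresh history.
            history = FakeHistory(session_id, self._store)
            self._cache[session_id] = history
        return history
```

Because the cache lives in process memory, it is only an optimization: with several workers or a serverless deployment, each process holds its own copy and the database remains the source of truth.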
app/db/mongodb.py
ADDED
@@ -0,0 +1,19 @@
+from pymongo import MongoClient
+from ..config import settings
+
+class MongoDB:
+    def __init__(self):
+        self.client = MongoClient(settings.CONNECTION_STRING)
+        self.db = self.client[settings.DATABASE_NAME]
+        self.users = self.db["users"]
+        self.videos = self.db[settings.COLLECTION_NAME]
+        # Indexes
+        self.users.create_index("username", unique=True)
+        self.users.create_index("email", unique=True)
+        self.videos.create_index("video_id", unique=True)
+        self.videos.create_index("user_id")
+
+    def close(self):
+        self.client.close()
+
+mongodb = MongoDB()
app/dependencies.py
ADDED
@@ -0,0 +1,27 @@
+from fastapi import Depends, HTTPException
+from fastapi.security import OAuth2PasswordBearer
+import jwt
+from .config import settings
+from .services.auth import get_user
+from .models.user import TokenData
+
+oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/token")
+
+async def get_current_user(token: str = Depends(oauth2_scheme)):
+    credentials_exception = HTTPException(
+        status_code=401,
+        detail="Could not validate credentials",
+        headers={"WWW-Authenticate": "Bearer"},
+    )
+    try:
+        payload = jwt.decode(token, settings.SECRET_KEY, algorithms=[settings.ALGORITHM])
+        username: str = payload.get("sub")
+        if username is None:
+            raise credentials_exception
+        token_data = TokenData(username=username)
+    except jwt.PyJWTError:
+        raise credentials_exception
+    user = get_user(token_data.username)
+    if user is None:
+        raise credentials_exception
+    return user
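To make explicit what the dependency relies on when it decodes an HS256 token, here is a standard-library sketch of signing and verifying such a token. It is an illustration of the scheme only, not a replacement for PyJWT, and `sign_hs256` and `verify_hs256` are hypothetical names that do not appear in the commit.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str, now=None) -> dict:
    """Return the payload, or raise ValueError if the token is invalid or expired."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError unless three parts
    header = json.loads(_b64url_decode(header_b64))
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The `sub` claim read by `get_current_user` is just a key in the verified payload; the signature check is what ties it to `SECRET_KEY`.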
app/main.py
ADDED
@@ -0,0 +1,46 @@
+import os
+import shutil
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from dotenv import load_dotenv
+from .config import settings
+from .db.mongodb import mongodb
+from .routes import auth, video, query, sessions
+
+load_dotenv()
+
+app = FastAPI(
+    title="RAG System API",
+    description="An API for question answering based on video content with user authentication"
+)
+
+# CORS
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# Include routers
+app.include_router(auth.router)
+app.include_router(video.router)
+app.include_router(query.router)
+app.include_router(sessions.router)
+
+@app.get("/")
+async def root():
+    return {"message": "Video Transcription and QA API"}
+
+@app.on_event("shutdown")
+def on_shutdown():
+    # Close DB
+    mongodb.close()
+    # Clean up temp videos
+    shutil.rmtree(settings.VIDEOS_DIR, ignore_errors=True)
+
+if __name__ == "__main__":
+    import uvicorn
+    os.environ["TOKENIZERS_PARALLELISM"] = "false"
+    uvicorn.run(app, host="0.0.0.0", port=8000)
app/models/__init__.py
ADDED
File without changes
app/models/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (167 Bytes)
app/models/__pycache__/transcription.cpython-312.pyc
ADDED
Binary file (1.56 kB)
app/models/__pycache__/user.cpython-312.pyc
ADDED
Binary file (1.32 kB)
app/models/transcription.py
ADDED
@@ -0,0 +1,25 @@
+from pydantic import BaseModel, Field
+from typing import List, Optional, Dict, Any
+from datetime import datetime
+
+class TranscriptionRequest(BaseModel):
+    youtube_url: str
+
+class QueryRequest(BaseModel):
+    query: str
+    session_id: str
+
+class QueryResponse(BaseModel):
+    answer: str
+    session_id: str
+    source_documents: Optional[List[str]] = None  # explicit default: Optional alone is required in Pydantic v2
+
+class VideoData(BaseModel):
+    video_id: str
+    user_id: str
+    title: str
+    source_type: str
+    source_url: Optional[str] = None  # not set for uploads
+    created_at: datetime = Field(default_factory=datetime.utcnow)
+    transcription: str
+    size: Optional[int] = None  # not set for YouTube sources
app/models/user.py
ADDED
@@ -0,0 +1,20 @@
+from pydantic import BaseModel, EmailStr
+from typing import Optional
+
+class User(BaseModel):
+    username: str
+    email: EmailStr
+    full_name: Optional[str] = None  # explicit default: Optional alone is required in Pydantic v2
+
+class UserInDB(User):
+    hashed_password: str
+
+class UserCreate(User):
+    password: str
+
+class Token(BaseModel):
+    access_token: str
+    token_type: str
+
+class TokenData(BaseModel):
+    username: Optional[str] = None
app/routes/__init__.py
ADDED
File without changes
app/routes/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (167 Bytes)
app/routes/__pycache__/auth.cpython-312.pyc
ADDED
Binary file (2.22 kB)
app/routes/__pycache__/query.cpython-312.pyc
ADDED
Binary file (3.04 kB)
app/routes/__pycache__/sessions.cpython-312.pyc
ADDED
Binary file (5.05 kB)
app/routes/__pycache__/video.cpython-312.pyc
ADDED
Binary file (7.32 kB)
app/routes/auth.py
ADDED
@@ -0,0 +1,27 @@
+from fastapi import APIRouter, HTTPException, Depends
+from fastapi.security import OAuth2PasswordRequestForm
+from ..models.user import UserCreate, User, Token
+from ..services.auth import get_password_hash, authenticate_user, create_access_token
+from ..db.mongodb import mongodb
+
+router = APIRouter()
+
+@router.post("/register", response_model=User)
+async def register(user: UserCreate):
+    if mongodb.users.find_one({"username": user.username}):
+        raise HTTPException(400, "Username already registered")
+    if mongodb.users.find_one({"email": user.email}):
+        raise HTTPException(400, "Email already registered")
+    hashed = get_password_hash(user.password)
+    user_dict = user.dict(exclude={"password"})
+    user_dict["hashed_password"] = hashed
+    mongodb.users.insert_one(user_dict)
+    return User(**user_dict)
+
+@router.post("/token", response_model=Token)
+async def login(form_data: OAuth2PasswordRequestForm = Depends()):
+    user = authenticate_user(form_data.username, form_data.password)
+    if not user:
+        raise HTTPException(401, "Incorrect username or password", headers={"WWW-Authenticate": "Bearer"})
+    token = create_access_token({"sub": user.username})
+    return {"access_token": token, "token_type": "bearer"}
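The two auth routes expect different request encodings: `/register` takes a JSON body matching `UserCreate`, while `/token` uses `OAuth2PasswordRequestForm` and therefore expects form-encoded `username` and `password` fields. A client-side sketch using only the standard library follows; it builds the requests without sending them, `BASE_URL` assumes the port used in `app/main.py`, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local run on the port from app/main.py


def register_request(username: str, email: str, password: str, full_name=None) -> Request:
    """POST /register with a JSON body."""
    body = json.dumps(
        {"username": username, "email": email, "full_name": full_name, "password": password}
    ).encode()
    return Request(
        f"{BASE_URL}/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_request(username: str, password: str) -> Request:
    """POST /token with a form-encoded body, as the OAuth2 password form expects."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        f"{BASE_URL}/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def authorized(request: Request, access_token: str) -> Request:
    """Attach the bearer token returned by /token to a later request."""
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

Each request could then be sent with `urllib.request.urlopen(request)` against a running instance; sending JSON to `/token` would be rejected because the form fields would be missing.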
app/routes/query.py
ADDED
@@ -0,0 +1,61 @@
+# app/routes/query.py
+from fastapi import APIRouter, Depends, HTTPException
+from ..models.transcription import QueryRequest, QueryResponse
+from ..dependencies import get_current_user
+from ..services.transcription import get_retriever
+from ..db.mongodb import mongodb
+from ..db.chat_manager import chat_manager
+from ..services.llm import create_chain
+
+router = APIRouter()
+
+@router.post("/query", response_model=QueryResponse)
+async def query_system(request: QueryRequest, current_user = Depends(get_current_user)):
+    """
+    Query the RAG system for a given session and question
+    """
+    # Verify metadata exists
+    video = mongodb.videos.find_one({"video_id": request.session_id})
+    if not video:
+        raise HTTPException(status_code=404, detail="Session not found. Please transcribe a video first.")
+    if video.get("user_id") != current_user.username:
+        raise HTTPException(status_code=403, detail="Not authorized to access this session.")
+
+    # Build retriever from MongoDB chunks
+    retriever = get_retriever(request.session_id)
+    chat_history = chat_manager.initialize_chat_history(request.session_id)
+    chain = create_chain(retriever)
+
+    # Format previous messages for chain
+    history = chat_history.messages or []
+    formatted_history = []
+    for i in range(0, len(history) - 1, 2):
+        formatted_history.append((history[i].content, history[i+1].content))
+
+    # Invoke chain
+    result = chain.invoke({
+        "question": request.query,
+        "chat_history": formatted_history
+    })
+
+    # Extract answer
+    answer = result.get("answer", "I couldn't find an answer to your question.")
+    # Save new messages
+    chat_history.add_user_message(request.query)
+    chat_history.add_ai_message(answer)
+
+    # Process source docs
+    source_docs = []
+    for doc in result.get("source_documents", []):
+        try:
+            text = getattr(doc, 'page_content', None) or str(doc)
+            snippet = text[:100] + "..." if len(text) > 100 else text
+            source_docs.append(snippet)
+        except Exception:
+            continue
+
+    return QueryResponse(
+        answer=answer,
+        session_id=request.session_id,
+        source_documents=source_docs
+    )
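Two small transformations in this route are easy to check on their own: pairing the stored messages into `(question, answer)` tuples and truncating source text to a snippet. The sketch below restates them as pure functions; `pair_messages` and `snippet` are hypothetical names, and the pairing assumes the history strictly alternates user and AI messages, as the route does.

```python
def pair_messages(contents: list[str]) -> list[tuple[str, str]]:
    """Group an alternating [question, answer, ...] list into pairs.

    A trailing unpaired message is dropped, matching range(0, len - 1, 2).
    """
    return [(contents[i], contents[i + 1]) for i in range(0, len(contents) - 1, 2)]


def snippet(text: str, limit: int = 100) -> str:
    """Return text unchanged if short, else its first `limit` characters plus '...'."""
    return text[:limit] + "..." if len(text) > limit else text
```

If a write ever fails between the user message and the AI message, the alternation breaks and every later pair is shifted by one, so pairing by message type would be more robust than pairing by position.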
app/routes/sessions.py
ADDED
@@ -0,0 +1,90 @@
+# app/routes/sessions.py
+from fastapi import APIRouter, Depends, HTTPException
+from typing import List, Dict, Any
+import os
+
+from ..dependencies import get_current_user
+from ..db.mongodb import mongodb
+from ..db.chat_manager import chat_manager
+from ..config import settings
+
+router = APIRouter()
+
+@router.get("/sessions", response_model=List[Dict[str, Any]])
+async def list_sessions(current_user = Depends(get_current_user)):
+    """
+    List all video sessions for the current user.
+    """
+    videos = list(mongodb.videos.find({"user_id": current_user.username}))
+    sessions_list = []
+    for v in videos:
+        sessions_list.append({
+            "session_id": v["video_id"],
+            "title": v["title"],
+            "source_type": v["source_type"],
+            "created_at": v["created_at"],
+            "transcription_preview": (v["transcription"][:200] + "...") if len(v["transcription"]) > 200 else v["transcription"]
+        })
+    return sessions_list
+
+@router.get("/sessions/{session_id}", response_model=Dict[str, Any])
+async def get_session(session_id: str, current_user = Depends(get_current_user)):
+    """
+    Retrieve details and chat history for a specific session.
+    """
+    video = mongodb.videos.find_one({"video_id": session_id})
+    if not video:
+        raise HTTPException(status_code=404, detail="Session not found")
+    if video.get("user_id") != current_user.username:
+        raise HTTPException(status_code=403, detail="Not authorized to access this session")
+
+    # Fetch chat history
+    history = chat_manager.get_chat_history(session_id)
+    chat_messages = []
+    if history:
+        msgs = history.messages
+        for i in range(0, len(msgs) - 1, 2):
+            chat_messages.append({
+                "question": msgs[i].content,
+                "answer": msgs[i+1].content
+            })
+
+    return {
+        "session_id": session_id,
+        "title": video["title"],
+        "source_type": video["source_type"],
+        "source_url": video.get("source_url"),
+        "created_at": video["created_at"],
+        "transcription_preview": (video["transcription"][:200] + "...") if len(video["transcription"]) > 200 else video["transcription"],
+        "full_transcription": video["transcription"],
+        "chat_history": chat_messages
+    }
+
+@router.delete("/sessions/{session_id}")
+async def delete_session(session_id: str, current_user = Depends(get_current_user)):
+    """
+    Delete a session, its chunks, chat history, and associated video file.
+    """
+    video = mongodb.videos.find_one({"video_id": session_id})
+    if not video:
+        raise HTTPException(status_code=404, detail="Session not found")
+    if video.get("user_id") != current_user.username:
+        raise HTTPException(status_code=403, detail="Not authorized to delete this session")
+
+    # Delete video metadata
+    mongodb.videos.delete_one({"video_id": session_id})
+    # Delete chunks
+    mongodb.db.get_collection("chunks").delete_many({"session_id": session_id})
+    # Delete chat history
+    history = chat_manager.get_chat_history(session_id)
+    if history:
+        history.clear()  # deletes by the history's own session key ("SessionId" by default), not "session_id"
+    # Delete video file(s); the directory may not exist if nothing was uploaded
+    video_files = [f for f in os.listdir(settings.VIDEOS_DIR) if f.startswith(session_id)] if os.path.isdir(settings.VIDEOS_DIR) else []
+    for file in video_files:
+        try:
+            os.remove(os.path.join(settings.VIDEOS_DIR, file))
+        except OSError:
+            pass
+
+    return {"message": f"Session {session_id} deleted successfully"}
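The file clean-up step of `delete_session` can be isolated into a helper that tolerates a missing directory. The sketch below is one possible shape, not the commit's code: `remove_session_files` is a hypothetical name, and it matches on the exact file stem (uploads are saved as `<session_id><ext>`) rather than on a prefix, so an ID that happens to start with another ID is left alone.

```python
import os


def remove_session_files(videos_dir: str, session_id: str) -> list[str]:
    """Delete files named '<session_id><ext>' and return the names removed."""
    if not os.path.isdir(videos_dir):
        return []  # nothing was ever uploaded
    removed = []
    for name in os.listdir(videos_dir):
        if os.path.splitext(name)[0] != session_id:
            continue
        try:
            os.remove(os.path.join(videos_dir, name))
            removed.append(name)
        except OSError:
            # Best effort: a file that vanished or is locked is skipped.
            pass
    return removed
```

Returning the removed names makes the behaviour observable in tests and in logs without changing the route's response.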
app/routes/video.py
ADDED
|
@@ -0,0 +1,131 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from fastapi import APIRouter, Depends, Form, File, UploadFile, BackgroundTasks, HTTPException
|
| 2 |
+
from fastapi.responses import StreamingResponse
|
| 3 |
+
from datetime import datetime
|
| 4 |
+
from typing import Optional, List
|
| 5 |
+
import os
|
| 6 |
+
|
| 7 |
+
from ..models.transcription import TranscriptionRequest
|
| 8 |
+
from ..dependencies import get_current_user
|
| 9 |
+
from ..services.transcription import process_transcription, save_video_file
|
| 10 |
+
from ..services.llm import init_google_client
|
| 11 |
+
from ..config import settings
|
| 12 |
+
from ..db.mongodb import mongodb
|
| 13 |
+
from google.genai import types
|
| 14 |
+
|
| 15 |
+
router = APIRouter()
|
| 16 |
+
|
| 17 |
+
@router.post("/transcribe")
|
| 18 |
+
async def transcribe(
|
| 19 |
+
request: TranscriptionRequest,
|
| 20 |
+
current_user = Depends(get_current_user)
|
| 21 |
+
):
|
| 22 |
+
"""
|
| 23 |
+
Transcribe a YouTube video via Google GenAI and prepare the RAG system
|
| 24 |
+
"""
|
| 25 |
+
try:
|
| 26 |
+
client = init_google_client()
|
| 27 |
+
content = types.Content(
|
| 28 |
+
parts=[
|
| 29 |
+
types.Part(text="Transcribe the Video. Write all the things described in the video"),
|
| 30 |
+
types.Part(file_data=types.FileData(file_uri=request.youtube_url))
|
| 31 |
+
]
|
| 32 |
+
)
|
| 33 |
+
response = client.models.generate_content(
|
| 34 |
+
model='models/gemini-2.0-flash',
|
| 35 |
+
contents=content
|
| 36 |
+
)
|
| 37 |
+
transcription = response.candidates[0].content.parts[0].text
|
| 38 |
+
title = f"YouTube Video - {datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')}"
|
| 39 |
+
session_id = process_transcription(
|
| 40 |
+
transcription,
|
| 41 |
+
current_user.username,
|
| 42 |
+
title,
|
| 43 |
+
source_type="youtube",
|
| 44 |
+
source_url=request.youtube_url
|
| 45 |
+
)
|
| 46 |
+
return {"session_id": session_id, "message": "YouTube video transcribed and RAG system prepared"}
|
| 47 |
+
|
| 48 |
+
except Exception as e:
|
| 49 |
+
raise HTTPException(status_code=500, detail=f"Error transcribing video: {str(e)}")
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
@router.post("/upload")
async def upload_video(
    background_tasks: BackgroundTasks,
    title: str = Form(...),
    file: UploadFile = File(...),
    prompt: str = Form("Transcribe the Video. Write all the things described in the video"),
    current_user = Depends(get_current_user)
):
    """
    Upload a video file (max 20MB), transcribe via GenAI, and prepare the RAG system
    """
    try:
        contents = await file.read()
        file_size = len(contents)
        if file_size > 20 * 1024 * 1024:
            raise HTTPException(status_code=400, detail="File size exceeds 20MB limit")
        # content_type can be None when the client omits it; treat that as "not a video"
        if not (file.content_type or "").startswith('video/'):
            raise HTTPException(status_code=400, detail="File must be a video")

        client = init_google_client()
        content = types.Content(
            parts=[
                types.Part(text=prompt),
                types.Part(inline_data=types.Blob(data=contents, mime_type=file.content_type))
            ]
        )
        response = client.models.generate_content(
            model='models/gemini-2.0-flash',
            contents=content
        )
        transcription = response.candidates[0].content.parts[0].text
        session_id = process_transcription(
            transcription,
            current_user.username,
            title,
            source_type="upload",
            file_size=file_size
        )
        ext = os.path.splitext(file.filename)[1]
        file_path = os.path.join(settings.VIDEOS_DIR, f"{session_id}{ext}")
        background_tasks.add_task(save_video_file, session_id, file_path, contents)
        return {"session_id": session_id, "message": "Uploaded video transcribed and RAG system prepared"}

    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error processing uploaded video: {str(e)}")


@router.get("/download/{video_id}")
async def download_video(
    video_id: str,
    current_user = Depends(get_current_user)
):
    """
    Download a previously uploaded video by streaming the saved file
    """
    video_data = mongodb.videos.find_one({"video_id": video_id})
    if not video_data:
        raise HTTPException(status_code=404, detail="Video not found")
    if video_data["user_id"] != current_user.username:
        raise HTTPException(status_code=403, detail="Not authorized to access this video")

    if video_data["source_type"] == "youtube":
        return {"message": "This is a YouTube video. Access via:", "url": video_data["source_url"]}

    files = [f for f in os.listdir(settings.VIDEOS_DIR) if f.startswith(video_id)]
    if not files:
        raise HTTPException(status_code=404, detail="Video file not found")

    path = os.path.join(settings.VIDEOS_DIR, files[0])
    def iterfile():
        with open(path, 'rb') as f:
            yield from f
    mime_type = f"video/{os.path.splitext(files[0])[1][1:]}"
    return StreamingResponse(
        iterfile(),
        media_type=mime_type,
        headers={"Content-Disposition": f"attachment; filename={video_data['title']}{os.path.splitext(files[0])[1]}"}
    )
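Review note on the download route in app/routes/video.py: it builds the media type from the file extension and places the raw title in `Content-Disposition`. Titles containing spaces, quotes, or non-ASCII characters may not survive that header, and an extension such as `.mov` does not map to a registered `video/mov` type. The sketch below is one possible alternative using only the standard library. `build_download_headers` is a hypothetical helper name, not something this commit defines.

```python
import mimetypes
import os
from urllib.parse import quote


def build_download_headers(title: str, stored_name: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: derive a media type and a safe Content-Disposition header."""
    ext = os.path.splitext(stored_name)[1]
    # Look the type up instead of assuming "video/<extension>"; fall back to a generic type
    media_type = mimetypes.guess_type(stored_name)[0] or "application/octet-stream"
    # Percent-encode the file name so spaces, quotes, and non-ASCII characters are preserved
    encoded = quote(f"{title}{ext}", safe="")
    return media_type, {"Content-Disposition": f"attachment; filename*=UTF-8''{encoded}"}
```

The returned pair could be passed to `StreamingResponse` as `media_type=` and `headers=`. The `filename*=UTF-8''...` form is the extended parameter syntax from RFC 6266; whether every client in use honours it is an assumption worth checking.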
app/services/__init__.py
ADDED
File without changes
app/services/__pycache__/__init__.cpython-312.pyc
ADDED
Binary file (169 Bytes)
app/services/__pycache__/auth.cpython-312.pyc
ADDED
Binary file (2.13 kB)
app/services/__pycache__/llm.cpython-312.pyc
ADDED
Binary file (2.95 kB)
app/services/__pycache__/transcription.cpython-312.pyc
ADDED
Binary file (3.51 kB)
app/services/auth.py
ADDED
@@ -0,0 +1,33 @@
from passlib.context import CryptContext
from datetime import datetime, timedelta
import jwt
from ..config import settings
from ..db.mongodb import mongodb
from ..models.user import UserInDB

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def verify_password(plain, hashed):
    return pwd_context.verify(plain, hashed)

def get_password_hash(password):
    return pwd_context.hash(password)


def create_access_token(data: dict):
    to_encode = data.copy()
    expire = datetime.utcnow() + timedelta(minutes=settings.ACCESS_TOKEN_EXPIRE_MINUTES)
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, settings.SECRET_KEY, algorithm=settings.ALGORITHM)


def get_user(username: str):
    user = mongodb.users.find_one({"username": username})
    return UserInDB(**user) if user else None


def authenticate_user(username: str, password: str):
    user = get_user(username)
    if not user or not verify_password(password, user.hashed_password):
        return None
    return user
app/services/llm.py
ADDED
@@ -0,0 +1,68 @@
import os
from google import genai
from google.genai import types
# settings is defined in app/config.py; import it from there rather than via .auth
from ..config import settings
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain_core.prompts import ChatPromptTemplate


def init_google_client():
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("GOOGLE_API_KEY not set")
    return genai.Client(api_key=api_key)


def get_llm():
    api_key = os.getenv("CHATGROQ_API_KEY")
    if not api_key:
        raise ValueError("CHATGROQ_API_KEY not set")
    return ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct", temperature=0, max_tokens=1024, api_key=api_key)


def get_embeddings():
    return HuggingFaceEmbeddings(model_name="BAAI/bge-small-en", model_kwargs={"device": "cpu"}, encode_kwargs={"normalize_embeddings": True})

# reuse prompt template
prompt_template = """
You are an assistant specialized in solving quizzes. Your goal is to provide accurate, concise, and contextually relevant answers.
Use the following retrieved context to answer the user's question.
If the context lacks sufficient information, respond with "I don't know." Do not make up answers or provide unverified information.

Guidelines:
1. Extract key information from the context to form a coherent response.
2. Maintain a clear and professional tone.
3. If the question requires clarification, specify it politely.

Retrieved context:
{context}

User's question:
{question}

Your response:
"""

# Create a prompt template to pass the context and user input to the chain
user_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", prompt_template),
        ("human", "{question}"),
    ]
)


def create_chain(retriever):
    return ConversationalRetrievalChain.from_llm(
        llm=get_llm(),
        retriever=retriever,
        return_source_documents=True,
        chain_type='stuff',
        combine_docs_chain_kwargs={"prompt": user_prompt},
        verbose=False,
    )
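Review note on `create_chain` in app/services/llm.py: with `chain_type='stuff'`, the retrieved chunks are concatenated into a single string that fills the `{context}` placeholder, and the user's query fills `{question}`. The sketch below imitates that step with plain string formatting so the data flow is visible without LangChain installed. `PROMPT` is an abbreviated copy of the template and `render_prompt` is a hypothetical stand-in, not the library's implementation.

```python
PROMPT = (
    "Use the following retrieved context to answer the user's question.\n\n"
    "Retrieved context:\n{context}\n\n"
    "User's question:\n{question}\n"
)


def render_prompt(chunks: list[str], question: str) -> str:
    """Hypothetical stand-in for the prompt-filling step of a 'stuff' chain."""
    # "Stuffing" means joining every retrieved chunk into one context string
    context = "\n\n".join(chunks)
    return PROMPT.format(context=context, question=question)
```

Because every retrieved chunk ends up in one prompt, the retriever's `k` and the splitter's `chunk_size` together bound how much text the model receives per question.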
app/services/transcription.py
ADDED
@@ -0,0 +1,73 @@
# app/services/transcription.py
import os
import uuid
from datetime import datetime
from fastapi import BackgroundTasks, HTTPException
from langchain.text_splitter import RecursiveCharacterTextSplitter
from ..services.llm import get_embeddings
from ..config import settings
from ..db.mongodb import mongodb
from ..db.chat_manager import chat_manager
from langchain_community.vectorstores import FAISS

# ensure video dir exists
os.makedirs(settings.VIDEOS_DIR, exist_ok=True)

# Store text splits in MongoDB under "chunks" collection
chunks_collection = mongodb.db.get_collection("chunks")


def process_transcription(transcription: str, user_id: str, title: str, source_type: str,
                          source_url: str | None = None, file_size: int | None = None) -> str:
    """
    Split transcription into chunks, store in MongoDB, initialize chat history, and return session ID.
    """
    # Split text
    splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=20)
    splits = splitter.split_text(transcription)

    # Persist session metadata
    session_id = str(uuid.uuid4())
    mongodb.videos.insert_one({
        "video_id": session_id,
        "user_id": user_id,
        "title": title,
        "source_type": source_type,
        "source_url": source_url,
        "created_at": datetime.utcnow(),
        "transcription": transcription,
        "size": file_size
    })

    # Store chunks for retrieval; insert_many rejects an empty list, so skip it
    # when the transcription produced no chunks
    chunk_docs = [{"session_id": session_id, "text": chunk} for chunk in splits]
    if chunk_docs:
        chunks_collection.insert_many(chunk_docs)

    # Initialize chat history in Mongo
    chat_manager.initialize_chat_history(session_id)

    return session_id


def get_retriever(session_id: str):
    """
    Build a Retriever by loading chunks from MongoDB and creating a FAISS vectorstore.
    """
    # Fetch stored text splits
    docs = [doc["text"] for doc in chunks_collection.find({"session_id": session_id})]
    if not docs:
        raise HTTPException(status_code=404, detail="Session data not found. Please transcribe first.")

    # Create embeddings and vectorstore
    embeddings = get_embeddings()
    vectorstore = FAISS.from_texts(docs, embeddings)
    return vectorstore.as_retriever(search_kwargs={"k": 3})


def save_video_file(video_id: str, file_path: str, contents: bytes) -> None:
    """
    Persist the uploaded video file to disk.
    """
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as f:
        f.write(contents)
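Review note on `process_transcription` in app/services/transcription.py: the transcription is split with `chunk_size=1024` and `chunk_overlap=20`, so consecutive chunks share a short tail of text. The sketch below shows the overlap idea with a simplified fixed-width splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraph and line boundaries), and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Simplified fixed-width splitter; not the recursive, separator-aware algorithm."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    # Each chunk starts `step` characters after the previous one
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous chunk's tail
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]
```

For example, `split_with_overlap("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the one before it, which is what keeps a sentence that straddles a boundary retrievable from either side.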
app/utils/__init__.py
ADDED
File without changes
app/utils/helpers.py
ADDED
@@ -0,0 +1,6 @@
# Generic helper functions

def chunk_list(lst, size):
    """Yield successive chunks from list."""
    for i in range(0, len(lst), size):
        yield lst[i:i+size]
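Review note on app/utils/helpers.py: `chunk_list` is a generator, so callers need to iterate it or wrap it in `list(...)`. None of the files shown in this part of the diff call it, so the usage below is only an illustration. The function body is copied from the commit and the batching scenario is an assumption.

```python
def chunk_list(lst, size):
    """Yield successive chunks from list (copied from app/utils/helpers.py)."""
    for i in range(0, len(lst), size):
        yield lst[i:i + size]


# Hypothetical usage: batch seven items into groups of at most three
batches = list(chunk_list(list(range(7)), 3))
```

Here `batches` is `[[0, 1, 2], [3, 4, 5], [6]]`. Note that `size` must be a positive integer: a `size` of 0 makes `range` raise `ValueError`.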
asgi.py
ADDED
@@ -0,0 +1,2 @@
# asgi.py
from app.main import app
requirements.txt
ADDED
@@ -0,0 +1,124 @@
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.9.0
arrow==1.3.0
attrs==25.3.0
bcrypt==4.3.0
cachetools==5.5.2
certifi==2025.6.15
cffi==1.17.1
charset-normalizer==3.4.2
circuitbreaker==2.1.3
click==8.0.4
cryptography==44.0.3
dataclasses-json==0.6.7
distro==1.9.0
dnspython==2.7.0
email_validator==2.2.0
faiss-cpu==1.11.0
fastapi==0.116.0
filelock==3.18.0
frozenlist==1.7.0
fsspec==2025.5.1
google-auth==2.40.3
google-genai==1.24.0
greenlet==3.2.3
groq==0.29.0
h11==0.16.0
hf-xet==1.1.5
httpcore==1.0.9
httpx==0.28.1
httpx-sse==0.4.1
huggingface-hub==0.33.2
idna==3.10
Jinja2==3.1.6
jmespath==0.10.0
joblib==1.5.1
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.3.26
langchain-community==0.3.27
langchain-core==0.3.68
langchain-groq==0.3.5
langchain-huggingface==0.3.0
langchain-mongodb==0.6.2
langchain-text-splitters==0.3.8
langsmith==0.4.4
lark==1.2.2
MarkupSafe==3.0.2
marshmallow==3.26.1
mpmath==1.3.0
multidict==6.6.3
mypy_extensions==1.1.0
networkx==3.5
numpy==2.3.1
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oci==2.155.0
oci-cli==3.62.0
orjson==3.10.18
packaging==24.2
passlib==1.7.4
pillow==11.3.0
prompt-toolkit==3.0.43
propcache==0.3.2
pyasn1==0.6.1
pyasn1_modules==0.4.2
pycparser==2.22
pydantic==2.11.7
pydantic-settings==2.10.1
pydantic_core==2.33.2
PyJWT==2.10.1
pymongo==4.13.2
pyOpenSSL==24.3.0
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-multipart==0.0.20
pytz==2025.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.4
requests-toolbelt==1.0.0
rsa==4.9.1
safetensors==0.5.3
scikit-learn==1.7.0
scipy==1.16.0
sentence-transformers==5.0.0
setuptools==80.9.0
six==1.17.0
sniffio==1.3.1
SQLAlchemy==2.0.41
starlette==0.46.2
sympy==1.14.0
tenacity==8.5.0
terminaltables==3.1.10
threadpoolctl==3.6.0
tokenizers==0.21.2
torch==2.7.1
tqdm==4.67.1
transformers==4.53.1
triton==3.3.1
types-python-dateutil==2.9.0.20250516
typing-inspect==0.9.0
typing-inspection==0.4.1
typing_extensions==4.14.1
urllib3==2.5.0
uvicorn==0.35.0
wcwidth==0.2.13
websockets==15.0.1
yarl==1.20.1
zstandard==0.23.0
vercel.json
ADDED
@@ -0,0 +1,9 @@
{
  "version": 2,
  "builds": [
    { "src": "asgi.py", "use": "@vercel/python" }
  ],
  "routes": [
    { "src": "/(.*)", "dest": "asgi.py" }
  ]
}