---
title: Vision Llm Agent
emoji: πŸŒ–
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: gpl-3.0
---
# Vision LLM Agent - Object Detection with AI Assistant
A multi-model object detection and image classification demo with an LLM-based AI assistant that answers questions about detected objects. The project uses YOLOv8, DETR, and ViT for vision tasks, and TinyLlama for natural language processing. A secure login system protects access to the AI features.
## 🎬 Demo
![Vision LLM Agent Demo](https://github.com/HGboda/vision-web-app/raw/master/demo.gif)
*Live demo showing product comparison analysis with image upload, real-time processing, and detailed results across multiple analysis tabs.*
## Project Architecture
This project follows a phased development approach:
### Phase 0: PoC with Gradio (Original)
- Simple Gradio interface with multiple object detection models
- Uses Hugging Face's free tier for model hosting
- Easy to deploy to Hugging Face Spaces
### Phase 1: Service Separation (Implemented)
- Flask backend exposing REST API endpoints for model inference
- JSON responses with detection results and performance metrics
### Phase 2: UI Upgrade (Implemented)
- Modern React frontend with Material-UI components
- Improved user experience with responsive design
- Separate frontend and backend architecture
### Phase 3: CI/CD & Testing (Planned)
- GitHub Actions for automated testing and deployment
- Comprehensive test suite with pytest and ESLint
- Automatic rebuilds on Hugging Face Spaces
## How to Run
### Option 1: Original Gradio App
1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Run the Gradio app:
```bash
python app.py
```
3. Open your browser and go to the URL shown in the terminal (typically `http://127.0.0.1:7860`)
### Option 2: React Frontend with Flask Backend
1. Install backend dependencies:
```bash
pip install -r requirements.txt
```
2. Start the Flask backend server:
```bash
python api.py
```
3. In a separate terminal, navigate to the frontend directory:
```bash
cd frontend
```
4. Install frontend dependencies:
```bash
npm install
```
5. Start the React development server:
```bash
npm start
```
6. Open your browser and go to `http://localhost:3000`
## Models Used
- **YOLOv8**: Fast and accurate object detection
- **DETR**: DEtection TRansformer for object detection
- **ViT**: Vision Transformer for image classification
- **TinyLlama**: For natural language processing and question answering about detected objects
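For reference, the sketch below shows one common way to load these models with the `ultralytics` and `transformers` libraries. The checkpoint names are illustrative assumptions; the actual checkpoints used by `app.py`/`api.py` may differ.
```python
# Illustrative only: common ways to load the models listed above.
# The exact checkpoints used by this app may differ.
from ultralytics import YOLO
from transformers import pipeline

yolo = YOLO("yolov8n.pt")  # YOLOv8: fast object detection
detr = pipeline("object-detection", model="facebook/detr-resnet-50")  # DETR
vit = pipeline("image-classification", model="google/vit-base-patch16-224")  # ViT
llm = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # TinyLlama

# Example: run detection on a local image
print(yolo("example.jpg")[0].boxes)
print(detr("example.jpg"))
```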
## Authentication
The application includes a secure login system to protect access to all features:
- **Default Credentials**:
- Username: `admin` / Password: `admin123`
- Username: `user` / Password: `user123`
- **Login Process**:
- All routes and API endpoints are protected with Flask-Login
- Users must authenticate before accessing any features
- Session management handles login state persistence
- **Security Features**:
- Password protection for all API endpoints and UI pages
- Session-based authentication with secure cookies
- Configurable secret key via environment variables
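The sketch below illustrates the protection pattern described above; it is not the app's actual code. Flask-Login's `login_required` guards an endpoint, and the secret key is read from an environment variable (the `SECRET_KEY` name is an assumption).
```python
# Minimal sketch of the protection pattern described above (not the app's
# actual code). The SECRET_KEY variable name is an assumption.
import os
from flask import Flask, jsonify
from flask_login import LoginManager, UserMixin, login_required

app = Flask(__name__)
app.secret_key = os.environ.get("SECRET_KEY", "change-me")  # configurable via env

login_manager = LoginManager()
login_manager.init_app(app)

class User(UserMixin):
    def __init__(self, user_id):
        self.id = user_id

@login_manager.user_loader
def load_user(user_id):
    # The real app would validate against its user store (admin / user).
    return User(user_id)

@app.route("/api/status")
@login_required  # unauthenticated requests are rejected
def status():
    return jsonify({"status": "ok"})
```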
## API Endpoints
The Flask backend provides the following API endpoints (all require authentication):
- `GET /api/status` - Check the status of the API and available models
- `POST /api/detect/yolo` - Detect objects using YOLOv8
- `POST /api/detect/detr` - Detect objects using DETR
- `POST /api/classify/vit` - Classify images using ViT
- `POST /api/analyze` - Analyze images with LLM assistant
- `POST /api/similar-images` - Find similar images in the vector database
- `POST /api/add-to-collection` - Add images to the vector database
- `POST /api/add-detected-objects` - Add detected objects to the vector database
- `POST /api/search-similar-objects` - Search for similar objects in the vector database
All POST endpoints accept form data with an `image` field containing the image file.
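As a usage sketch, the following client script logs in and calls two of the endpoints with Python's `requests` library. The `/login` path, its form field names, and the default Flask port 5000 are assumptions; only the `/api/*` paths and the `image` field are documented above.
```python
# Example client (sketch). The /login path and its form field names are
# assumptions; only the /api/* endpoints and the 'image' field are documented.
import requests

BASE = "http://localhost:5000"  # assumed default Flask port

with requests.Session() as s:
    # Authenticate first; the session object keeps the Flask-Login cookie.
    s.post(f"{BASE}/login", data={"username": "admin", "password": "admin123"})

    # Check API status and available models.
    print(s.get(f"{BASE}/api/status").json())

    # Run YOLOv8 detection on a local image.
    with open("example.jpg", "rb") as f:
        result = s.post(f"{BASE}/api/detect/yolo", files={"image": f})
    print(result.json())
```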
## Deployment
### Gradio App
The Gradio app is designed to be easily deployed to Hugging Face Spaces:
1. Create a new Space on Hugging Face
2. Select Gradio as the SDK
3. Push this repository to the Space's git repository
4. The app will automatically deploy
### React + Flask App
For the React + Flask version, you'll need to:
1. Build the React frontend:
```bash
cd frontend
npm run build
```
2. Serve the static files from a web server or cloud hosting service
3. Deploy the Flask backend to a server that supports Python
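One way to combine steps 2 and 3 is to let the Flask backend serve the built frontend itself. The sketch below assumes the Create React App default output directory `frontend/build`; adapt it to how `api.py` is actually structured.
```python
# Sketch: one Flask app serving both the API and the React build.
# Assumes the CRA default output directory frontend/build; adapt to api.py.
import os
from flask import Flask, send_from_directory

app = Flask(__name__, static_folder="frontend/build", static_url_path="")

@app.route("/")
def index():
    return send_from_directory(app.static_folder, "index.html")

@app.errorhandler(404)
def spa_fallback(_error):
    # Let the React router handle unknown, non-API paths client-side.
    return send_from_directory(app.static_folder, "index.html")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
```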