StreamWiseAI / README.md
rajesh1804's picture
updated readme
9b6deb9
---
title: "StreamWiseAI"
emoji: "🎬"
colorFrom: "blue"
colorTo: "red"
sdk: streamlit
sdk_version: "1.33.0"
app_file: app.py
pinned: false
---
# 🎬 StreamWiseAI β€” Personalized Movie Recommender & Retention Coach
[![Built with Streamlit](https://img.shields.io/badge/Built%20with-Streamlit-red?logo=streamlit)](https://streamlit.io)
[![Semantic Search + AI Agent](https://img.shields.io/badge/AI-SentenceTransformers%2C%20OpenRouter-blue?logo=OpenAI)](https://www.sbert.net)
[![Deployment: Hugging Face Spaces](https://img.shields.io/badge/Deployed%20on-HuggingFace-orange?logo=huggingface)](https://huggingface.co/spaces/rajesh1804/StreamWiseAI)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
> 🎯 **StreamWiseAI** is a smart movie recommendation engine that combines semantic search with an AI-powered Retention Coach. Built to mimic the intelligence of Netflix-style recommender pipelines β€” but fully local, transparent, and recruiter-grade.
πŸ”— **Live Demo**: [Try on Hugging Face Spaces](https://huggingface.co/spaces/rajesh1804/streamwiseai)
πŸ“„ **Medium Article**: [Click here](https://rajesh1804.medium.com/streamwiseai-an-ai-powered-movie-recommender-with-a-retention-coach-agent-b9b54319805f)
πŸ“Œ **Built by**: [Rajesh Marudhachalam](https://www.linkedin.com/in/rajesh1804/)
---
## 🧠 What It Solves
_"What should I watch next… and why will I like it?"_
Streaming platforms have endless content, but not enough **contextual guidance**.
StreamWiseAI solves this with:
- 🎯 Semantic search based on movie themes + overviews
- 🧠 Personalized tips from an **AI Retention Coach Agent**
- πŸ•΅οΈ Session-aware user history to make better future suggestions
> βœ… Makes your recommender not just smart β€” but **explainable**.
---
## πŸ”§ Features
βœ… **Semantic Movie Recommender** using Sentence-BERT
🧠 **AI Retention Coach Agent** via OpenRouter LLM API
🎯 **Fuzzy Title Matching** for typo-tolerant search
πŸ‘€ **Session-aware Viewing History** to personalize experience
πŸ–ΌοΈ **Dynamic Poster & Overview UI** with genre highlights
🧰 **Deployable on Hugging Face Spaces** (free-tier compatible)
πŸ’¬ **Natural Language Tips** for continued user engagement
βš™οΈ **Modular Codebase** for ML, UI, and agent separation
---
## πŸ–ΌοΈ Architecture Overview
<p align="center">
<img src="https://github.com/rajesh1804/streamwiseai/raw/main/assets/streamwiseai-architecture.png" alt="Architecture Overview" width="600"/>
</p>
```text
πŸ“‚ Raw Data Sources
β”œβ”€β”€ πŸ“„ MovieLens Ratings + Titles (CSV)
└── πŸ“„ TMDb Metadata (Genres, Posters, Overview)
⬇️ Data Enrichment Pipeline (Fuzzy Matching + Merging)
β”œβ”€β”€ βœ… Title Normalization
β”œβ”€β”€ 🧩 FuzzyWuzzy Matching with Year Filter
β”œβ”€β”€ πŸ”„ Genre Merge (MovieLens + TMDb)
└── πŸ“¦ Output: movies_enriched.csv
⬇️ Embedding Generation
β”œβ”€β”€ πŸ”€ Input Text = "Title + Genres + Overview"
β”œβ”€β”€ 🧠 Model: all-MiniLM-L6-v2 (Sentence-BERT)
└── πŸ’Ύ Output: movie_embeddings.npz (SBERT vectors)
⬇️ Recommender Engine (scripts/recommender.py)
β”œβ”€β”€ πŸ” Fuzzy Match Input Title
β”œβ”€β”€ πŸ“ˆ Cosine Similarity with Embedding Store
└── 🎯 Top K Semantic Neighbors (Vector Search)
⬇️ LLM Agent (agent.py)
β”œβ”€β”€ πŸ€– Prompt Built from Input + Rec Results
β”œβ”€β”€ πŸ“¬ LLM: Mistral-7B via OpenRouter (Free)
└── 🧠 Output: Personalized Retention Tip
⬇️ Streamlit UI (app.py)
β”œβ”€β”€ 🧠 Input Box with Session Watch History
β”œβ”€β”€ 🎬 Recommendations with Posters, Genres, Overview
β”œβ”€β”€ πŸ’‘ LLM Insight Box with Retry Logic
└── πŸš€ Deployed on Hugging Face Spaces
```
---
## πŸ’‘ Retention Coach Agent
> ✨ An AI β€œContent Coach” that explains why you’ll enjoy a movie β€” like Netflix’s internal behavior models.
The **Retention Coach Agent** reads the user’s selected movie and top 5 recommendations, then produces:
- A 1–2 line content insight (e.g., β€œYou enjoy nostalgic animated journeys about friendship.”)
- A contextual tip to keep users engaged
Powered by [OpenRouter](https://openrouter.ai) + [Mistral-7B-Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct).
---
## πŸ§ͺ Example Flow
1. User searches: `"batman"`
2. App fuzzy-matches and embeds input query
3. App recommends:
- πŸŽ₯ Similar animated or nostalgic titles
- 🎭 Semantic match based on overview and genre
4. πŸ’‘ Retention coach suggests:
_"You seem to enjoy dark, vigilante-style thrillers. You may also love intense detective mysteries or neo-noir stories!"_
5. πŸ“– User’s search history is visible under a collapsible list
---
## πŸ“½οΈ Live Demo
πŸš€ **Highlights**
- Vector Search + Fuzzy Matching for smart retrieval
- OpenRouter LLM Agent for content insights
- Streamlit UI with dynamic posters, search memory, retry logic
- Production-ready, deployed on Hugging Face (free-tier)
πŸ‘‰ Try it on [Hugging Face Spaces](https://huggingface.co/spaces/rajesh1804/StreamWiseAI)
<p align="center">
<img src="https://github.com/rajesh1804/streamwiseai/raw/main/assets/streamwiseai-demo.gif" alt="Demo" width="800"/>
</p>
---
## πŸ“Š Evaluation & Observability
- Cosine similarity is printed in the sidebar for each match
- LLM latency and retries are handled gracefully
- Embedding search latency ~150ms locally
> βœ… Future versions can log latency and similarity per session
---
## 🧠 How It Works – Under the Hood of StreamWiseAI
StreamWiseAI blends NLP and AI agents to simulate the intelligence behind modern streaming platforms.
### 🎯 1. Semantic Movie Matching
We use Sentence-BERT embeddings trained on movie overviews + genre metadata to create rich vector representations.
- Title search is fuzzy-matched
- Query is encoded dynamically
- Cosine similarity is used to find nearest movies
> βœ… Why it matters: Simulates how streaming platforms serve similar content even with vague input.
### 🧠 2. AI Agent Retention Coach
Once recommendations are shown, an OpenRouter LLM (e.g. Mistral) analyzes the results and suggests a short retention insight.
> βœ… Why it matters: Simulates Netflix’s behavior analysis and proactive engagement.
### πŸ—ƒοΈ 3. Session-aware Search History
Each user session stores past movie searches, optionally used to inform recommendations and insights.
> βœ… Why it matters: Demonstrates personalization + memory.
---
## πŸš€ Getting Started Locally
### 1. Clone the repo
```bash
git clone https://github.com/rajesh1804/StreamWiseAI.git
cd StreamWiseAI
```
### 2. Setup Python 3.10 (Recommended)
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
You’ll need:
- requests==2.31.0
- sentence-transformers==2.2.2
- streamlit==1.33.0
- tenacity
- python-dotenv
### 4. Add `.env` file
Create a `.env` file with:
```ini
OPENROUTER_API_KEY=your_api_key_here
```
### 5. Run App
```bash
streamlit run app.py
```
---
## 🧠 Why This Project Matters
Modern recommender systems go beyond just content β€” they understand context, preferences, and attention. StreamWiseAI is designed to simulate this *product intelligence* by combining:
- 🧠 **NLP + Semantic Vectors** for real-time similarity search
- πŸ€– **LLM Agents** that summarize user preferences
- πŸ’‘ **Personalized UI experience** powered by session memory
> 🎯 It’s not just about building a recommender β€” it’s about building a **smart product**.
---
## πŸ› οΈ Tech Stack
| Layer | Technology |
|------------------|------------|
| UI | Streamlit |
| Embeddings | sentence-transformers (MiniLM-L6-v2) |
| Vector Search | Cosine Similarity via `util.cos_sim` |
| AI Agent | OpenRouter β†’ Mistral-7B (Free-tier LLM) |
| Data Enrichment | MovieLens + TMDb metadata |
| Fuzzy Matching | `difflib`, `fuzzywuzzy` |
| Deployment | Hugging Face Spaces (Free tier) |
---
## πŸ“ Project Structure
```scss
StreamWiseAI/
β”œβ”€β”€ app.py # Streamlit app entrypoint
β”œβ”€β”€ agent.py # Retention Coach logic
β”œβ”€β”€ scripts/
β”‚ β”œβ”€β”€ enrich_movies_with_metadata.py
β”‚ β”œβ”€β”€ generate_embeddings.py
β”‚ └── recommender.py
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ raw/ # Raw MovieLens + TMDb data
β”‚ └── processed/ # Enriched CSV + Embeddings
β”œβ”€β”€ requirements.txt
└── README.md
```
---
## 🎯 Skills Demonstrated
βœ… Vector-based semantic retrieval using Sentence-BERT
βœ… LLM integration via OpenRouter API (zero-cost agent)
βœ… Prompt engineering for retention coaching
βœ… End-to-end ML product thinking: dataset β†’ model β†’ UI β†’ deploy
βœ… Tenacity-based retry/backoff for production resilience
βœ… Personalized search memory via session history
βœ… Deployment on Hugging Face Spaces (no servers!)
---
## πŸ“Œ About Me
I'm **Rajesh**, an AI/ML Engineer with a passion for building real-world, **product-grade AI systems**.
This project is part of a portfolio that simulates how top tech companies (like Netflix, Uber, Instacart, Reddit) embed AI deeply into their product workflows.
πŸ“§ [rajesh.marudhachalam@gmail.com](mailto:rajesh.marudhachalam@gmail.com)
πŸ”— [LinkedIn](https://www.linkedin.com/in/rajesh1804/)
> πŸ’Ό "**Hire Rajesh** – Build AI like a product, not just a model."
---
## πŸ™Œ Acknowledgments
- [MovieLens Dataset](https://grouplens.org/datasets/movielens/)
- [TMDb Metadata](https://www.themoviedb.org/)
- [OpenRouter](https://openrouter.ai) for LLM APIs
- [Hugging Face Spaces](https://huggingface.co/spaces) for deployment
---
## πŸ“£ Other Projects
| Project | Domain | Highlights |
|--------|--------|------------|
| [πŸ›’ GroceryGPT+](https://huggingface.co/spaces/rajesh1804/grocerygpt) | Grocery | Vector Search + LLM Reranking |
| [πŸš— RideCastAI](https://huggingface.co/spaces/rajesh1804/ridecastai) | Ride-hailing | ETA + Fare Prediction |
| [🎬 StreamWiseAI](https://huggingface.co/spaces/rajesh1804/streamwiseai) | Streaming | Recommendations + Retention Agent |
---
⭐️ *Star this repo if you liked it. Follow me for more AI-native product builds!*