Spaces:
Sleeping
A newer version of the Streamlit SDK is available:
1.51.0
title: StreamWiseAI
emoji: π¬
colorFrom: blue
colorTo: red
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: false
π¬ StreamWiseAI β Personalized Movie Recommender & Retention Coach
π― StreamWiseAI is a smart movie recommendation engine that combines semantic search with an AI-powered Retention Coach. Built to mimic the intelligence of Netflix-style recommender pipelines β but fully local, transparent, and recruiter-grade.
π Live Demo: Try on Hugging Face Spaces
π Medium Article: Click here
π Built by: Rajesh Marudhachalam
π§ What It Solves
"What should I watch next⦠and why will I like it?"
Streaming platforms have endless content, but not enough contextual guidance.
StreamWiseAI solves this with:
- π― Semantic search based on movie themes + overviews
- π§ Personalized tips from an AI Retention Coach Agent
- π΅οΈ Session-aware user history to make better future suggestions
β Makes your recommender not just smart β but explainable.
π§ Features
β
Semantic Movie Recommender using Sentence-BERT
π§ AI Retention Coach Agent via OpenRouter LLM API
π― Fuzzy Title Matching for typo-tolerant search
π Session-aware Viewing History to personalize experience
πΌοΈ Dynamic Poster & Overview UI with genre highlights
π§° Deployable on Hugging Face Spaces (free-tier compatible)
π¬ Natural Language Tips for continued user engagement
βοΈ Modular Codebase for ML, UI, and agent separation
πΌοΈ Architecture Overview
π Raw Data Sources
βββ π MovieLens Ratings + Titles (CSV)
βββ π TMDb Metadata (Genres, Posters, Overview)
β¬οΈ Data Enrichment Pipeline (Fuzzy Matching + Merging)
βββ β
Title Normalization
βββ π§© FuzzyWuzzy Matching with Year Filter
βββ π Genre Merge (MovieLens + TMDb)
βββ π¦ Output: movies_enriched.csv
β¬οΈ Embedding Generation
βββ π€ Input Text = "Title + Genres + Overview"
βββ π§ Model: all-MiniLM-L6-v2 (Sentence-BERT)
βββ πΎ Output: movie_embeddings.npz (SBERT vectors)
β¬οΈ Recommender Engine (scripts/recommender.py)
βββ π Fuzzy Match Input Title
βββ π Cosine Similarity with Embedding Store
βββ π― Top K Semantic Neighbors (Vector Search)
β¬οΈ LLM Agent (agent.py)
βββ π€ Prompt Built from Input + Rec Results
βββ π¬ LLM: Mistral-7B via OpenRouter (Free)
βββ π§ Output: Personalized Retention Tip
β¬οΈ Streamlit UI (app.py)
βββ π§ Input Box with Session Watch History
βββ π¬ Recommendations with Posters, Genres, Overview
βββ π‘ LLM Insight Box with Retry Logic
βββ π Deployed on Hugging Face Spaces
π‘ Retention Coach Agent
β¨ An AI βContent Coachβ that explains why youβll enjoy a movie β like Netflixβs internal behavior models.
The Retention Coach Agent reads the userβs selected movie and top 5 recommendations, then produces:
- A 1β2 line content insight (e.g., βYou enjoy nostalgic animated journeys about friendship.β)
- A contextual tip to keep users engaged
Powered by OpenRouter + Mistral-7B-Instruct.
π§ͺ Example Flow
- User searches:
"batman" - App fuzzy-matches and embeds input query
- App recommends:
- π₯ Similar animated or nostalgic titles
- π Semantic match based on overview and genre
- π‘ Retention coach suggests:
"You seem to enjoy dark, vigilante-style thrillers. You may also love intense detective mysteries or neo-noir stories!" - π Userβs search history is visible under a collapsible list
π½οΈ Live Demo
π Highlights
- Vector Search + Fuzzy Matching for smart retrieval
- OpenRouter LLM Agent for content insights
- Streamlit UI with dynamic posters, search memory, retry logic
- Production-ready, deployed on Hugging Face (free-tier)
π Try it on Hugging Face Spaces
π Evaluation & Observability
- Cosine similarity is printed in the sidebar for each match
- LLM latency and retries are handled gracefully
- Embedding search latency ~150ms locally
β Future versions can log latency and similarity per session
π§ How It Works β Under the Hood of StreamWiseAI
StreamWiseAI blends NLP and AI agents to simulate the intelligence behind modern streaming platforms.
π― 1. Semantic Movie Matching
We use Sentence-BERT embeddings trained on movie overviews + genre metadata to create rich vector representations.
- Title search is fuzzy-matched
- Query is encoded dynamically
- Cosine similarity is used to find nearest movies
β Why it matters: Simulates how streaming platforms serve similar content even with vague input.
π§ 2. AI Agent Retention Coach
Once recommendations are shown, an OpenRouter LLM (e.g. Mistral) analyzes the results and suggests a short retention insight.
β Why it matters: Simulates Netflixβs behavior analysis and proactive engagement.
ποΈ 3. Session-aware Search History
Each user session stores past movie searches, optionally used to inform recommendations and insights.
β Why it matters: Demonstrates personalization + memory.
π Getting Started Locally
1. Clone the repo
git clone https://github.com/rajesh1804/StreamWiseAI.git
cd StreamWiseAI
2. Setup Python 3.10 (Recommended)
3. Install Dependencies
pip install -r requirements.txt
Youβll need:
- requests==2.31.0
- sentence-transformers==2.2.2
- streamlit==1.33.0
- tenacity
- python-dotenv
4. Add .env file
Create a .env file with:
OPENROUTER_API_KEY=your_api_key_here
5. Run App
streamlit run app.py
π§ Why This Project Matters
Modern recommender systems go beyond just content β they understand context, preferences, and attention. StreamWiseAI is designed to simulate this product intelligence by combining:
- π§ NLP + Semantic Vectors for real-time similarity search
- π€ LLM Agents that summarize user preferences
- π‘ Personalized UI experience powered by session memory
π― Itβs not just about building a recommender β itβs about building a smart product.
π οΈ Tech Stack
| Layer | Technology |
|---|---|
| UI | Streamlit |
| Embeddings | sentence-transformers (MiniLM-L6-v2) |
| Vector Search | Cosine Similarity via util.cos_sim |
| AI Agent | OpenRouter β Mistral-7B (Free-tier LLM) |
| Data Enrichment | MovieLens + TMDb metadata |
| Fuzzy Matching | difflib, fuzzywuzzy |
| Deployment | Hugging Face Spaces (Free tier) |
π Project Structure
StreamWiseAI/
βββ app.py # Streamlit app entrypoint
βββ agent.py # Retention Coach logic
βββ scripts/
β βββ enrich_movies_with_metadata.py
β βββ generate_embeddings.py
β βββ recommender.py
βββ data/
β βββ raw/ # Raw MovieLens + TMDb data
β βββ processed/ # Enriched CSV + Embeddings
βββ requirements.txt
βββ README.md
π― Skills Demonstrated
β
Vector-based semantic retrieval using Sentence-BERT
β
LLM integration via OpenRouter API (zero-cost agent)
β
Prompt engineering for retention coaching
β
End-to-end ML product thinking: dataset β model β UI β deploy
β
Tenacity-based retry/backoff for production resilience
β
Personalized search memory via session history
β
Deployment on Hugging Face Spaces (no servers!)
π About Me
I'm Rajesh, an AI/ML Engineer with a passion for building real-world, product-grade AI systems.
This project is part of a portfolio that simulates how top tech companies (like Netflix, Uber, Instacart, Reddit) embed AI deeply into their product workflows.
π§ rajesh.marudhachalam@gmail.com
π LinkedIn
πΌ "Hire Rajesh β Build AI like a product, not just a model."
π Acknowledgments
- MovieLens Dataset
- TMDb Metadata
- OpenRouter for LLM APIs
- Hugging Face Spaces for deployment
π£ Other Projects
| Project | Domain | Highlights |
|---|---|---|
| π GroceryGPT+ | Grocery | Vector Search + LLM Reranking |
| π RideCastAI | Ride-hailing | ETA + Fare Prediction |
| π¬ StreamWiseAI | Streaming | Recommendations + Retention Agent |
βοΈ Star this repo if you liked it. Follow me for more AI-native product builds!