title: IoT Sensor Data RAG for Smart Buildings
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.1
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
IoT Sensor Data RAG for Smart Buildings
Problem Statement
Create a RAG system that processes IoT sensor data, maintenance manuals, and building specifications to provide predictive maintenance insights and operational optimization.
Key Requirements
- ✅ IoT sensor data ingestion and real-time processing
- ✅ Maintenance manual and building specification integration
- ✅ Predictive maintenance algorithm implementation
- ✅ Operational efficiency optimization recommendations
- ✅ Anomaly detection and alert systems
Technical Challenges Solved
- ✅ Real-time sensor data streaming and processing
- ✅ Multi-sensor data fusion and correlation
- ✅ Predictive modeling for equipment failure
- ✅ Building system integration and compatibility
- ✅ Energy efficiency optimization algorithms
System Architecture
Core Components
- RAG Engine: Vector database (ChromaDB) with Sentence-Transformers embeddings
- IoT Data Processor: Real-time sensor data streaming and anomaly detection
- Predictive Analytics: Equipment failure prediction and maintenance recommendations
- Document Intelligence: PDF/TXT processing with smart chunking strategies
- Web Interface: Modern Streamlit dashboard with Material design theme
Technology Stack
- Backend: Python, Streamlit, ChromaDB
- Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
- Vector Database: ChromaDB with cosine similarity
- LLM Integration: local Hugging Face Transformers models, with the OpenAI API as an optional alternative
- Data Processing: Pandas, NumPy, Scikit-learn
- Visualization: Plotly for real-time sensor monitoring
Features
1. Real-Time IoT Monitoring
- Live sensor data streaming simulation
- Multi-sensor data fusion (temperature, humidity, power consumption)
- Real-time anomaly detection using rolling z-score analysis
- Interactive time-series visualizations
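The streaming simulation and multi-sensor fusion described above can be pictured with a short generator. This is a minimal sketch, not the code in app.py: the function name `simulate_sensor_stream`, the field names, the one-minute interval, the value ranges, and the 2% spike rate are all hypothetical choices made for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, Iterator


def simulate_sensor_stream(n: int, seed: int = 0,
                           start: datetime = datetime(2025, 1, 1)) -> Iterator[Dict[str, object]]:
    """Yield n fused readings (one per minute) with occasional power spikes.

    Hypothetical sketch: setpoints, noise levels and spike rate are illustrative only.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    for i in range(n):
        temperature = 22.0 + rng.gauss(0, 0.5)  # degrees C around a hypothetical setpoint
        humidity = 45.0 + rng.gauss(0, 2.0)     # % relative humidity
        power = 12.0 + rng.gauss(0, 0.8)        # kW
        if rng.random() < 0.02:                 # rare fault-like spike in power draw
            power *= 3
        # One record holds all three sensors at the same timestamp (simple fusion)
        yield {
            "timestamp": start + timedelta(minutes=i),
            "temperature_c": round(temperature, 2),
            "humidity_pct": round(humidity, 2),
            "power_kw": round(power, 2),
        }
```

Because each record carries all three metrics under one timestamp, downstream steps such as anomaly detection or correlation can work on aligned rows without a separate join.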
2. Intelligent Document RAG
- PDF and TXT document ingestion
- Smart text chunking (500 tokens with 50-token overlap)
- Context-aware retrieval using vector similarity
- Source attribution and relevance scoring
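The 500/50 chunking rule can be sketched as a sliding window whose stride is `chunk_size - overlap`. This illustration rests on an assumption: it counts whitespace-separated words as tokens, whereas `rag/ingest.py` may count tokens with a tokenizer. The name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` tokens, each sharing `overlap`
    tokens with the previous window. Tokens here are whitespace-separated words
    (an assumption; a real tokenizer would count differently)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - overlap  # stride between window starts
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

With these settings, any passage of up to 50 tokens that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing roughly 11% more text (500 tokens stored per 450-token stride).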
3. Predictive Maintenance
- Equipment failure prediction algorithms
- Maintenance schedule optimization
- Energy efficiency recommendations
- Anomaly-based alert systems
4. Evaluation & Analytics
- Retrieval accuracy metrics
- Response latency measurement
- Document relevance scoring
- System performance monitoring
Quick Start
Prerequisites
- Python 3.8+
- 8 GB+ RAM (for running local LLMs)
- Internet connection (for initial model downloads)
Installation
# Clone the repository
git clone https://github.com/itsnewcoder/iot-smart-building-rag.git
cd iot-smart-building-rag
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
Configuration
Create a .env file in the root directory (optional):
OPENAI_API_KEY=your_openai_api_key_here
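One way the optional key could decide between the two LLM back ends is sketched below. It assumes the `.env` values are already in the process environment (for example via python-dotenv's `load_dotenv()`; whether app.py does this is not stated here), and the helper name `select_llm_backend` is hypothetical. The placeholder value from the template is treated as "not configured".

```python
import os
from typing import Mapping, Optional

PLACEHOLDER_KEY = "your_openai_api_key_here"  # value shown in the .env template


def select_llm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai" if a real OPENAI_API_KEY is set, otherwise "local".

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        return "local"  # fall back to the local Transformers model
    return "openai"
```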
Run Locally
streamlit run app.py
Access your app at: http://localhost:8501
Project Structure
iot-smart-building-rag/
├── app.py                 # Main Streamlit application
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── .streamlit/
│   └── config.toml        # Streamlit theme configuration
├── rag/                   # RAG system core
│   ├── __init__.py
│   ├── ingest.py          # Document ingestion & vector store
│   ├── retrieval.py       # Context retrieval engine
│   ├── generate.py        # LLM response generation
│   └── evaluate.py        # System evaluation metrics
├── models/                # Predictive models
│   ├── __init__.py
│   └── predictive.py      # Anomaly detection & maintenance
├── data/                  # Sample data
│   ├── manuals/           # Maintenance manuals (PDF/TXT)
│   ├── specs/             # Building specifications
│   └── sensors/           # IoT sensor data (CSV)
└── .chroma/               # Vector database storage
Usage Guide
1. Dashboard Tab
- Start Stream: Begin real-time sensor data simulation
- Live Monitoring: View real-time sensor readings and trends
- Anomaly Detection: See detected anomalies with z-score analysis
- Maintenance Tips: Get maintenance recommendations based on current sensor readings and detected anomalies
2. RAG QA Tab
- Ask Questions: Query maintenance procedures and building specs
- Context Retrieval: View relevant document chunks and sources
- AI Responses: Get context-aware answers from local or OpenAI models
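Context-aware answers with source attribution usually come from packing the retrieved chunks, tagged with their sources, into the LLM prompt. The sketch below assumes retrieval returns `(text, source, score)` tuples; `build_prompt` and the prompt wording are hypothetical, not taken from `rag/generate.py`.

```python
from typing import List, Tuple


def build_prompt(question: str, hits: List[Tuple[str, str, float]]) -> str:
    """Assemble an LLM prompt from retrieved chunks.

    hits: (chunk_text, source_name, relevance_score) tuples, best match first.
    Each chunk is numbered so the model can cite it as [n].
    """
    context_lines = [
        f"[{i}] ({source}, score={score:.2f}) {text}"
        for i, (text, source, score) in enumerate(hits, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no relevant context found)"
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the source name and score next to each numbered chunk lets the UI show the same attribution the model was asked to cite.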
3. Evaluation Tab
- Retrieval Testing: Test system with custom queries
- Performance Metrics: View latency and relevance scores
- Quality Assessment: Evaluate RAG system effectiveness
4. Data Manager Tab
- Document Index: View indexed documents and sources
- File Upload: Add new PDFs/TXTs to the knowledge base
- Vector Store: Manage document embeddings and storage
Sample Queries
Try these example questions in the RAG QA tab:
- "How to reset chiller pump?"
- "What are the fault codes for HVAC systems?"
- "How to maintain building temperature sensors?"
- "What are the power consumption optimization tips?"
- "How to troubleshoot humidity sensor issues?"
Evaluation Metrics
Retrieval Quality
- Relevance Scoring: Cosine similarity-based ranking
- Source Attribution: Document source tracking
- Context Retrieval: Top-k document retrieval
Performance Metrics
- Response Latency: End-to-end query processing time
- Throughput: Queries processed per second
- Memory Usage: Vector database storage efficiency
RAG Effectiveness
- Context Relevance: Retrieved document quality
- Answer Accuracy: Response relevance to queries
- Source Diversity: Multiple document source utilization
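The latency, retrieval-quality and source-diversity metrics above can be computed with a small harness. This is a sketch under stated assumptions: "retrieval accuracy" is approximated as hit rate@k (the expected source appears in the top-k results), diversity as the mean number of distinct sources in the top k, and `evaluate_retrieval` with its `retrieve(query, k)` callback is hypothetical, not the interface of `rag/evaluate.py`.

```python
import time
from typing import Callable, Dict, List


def evaluate_retrieval(retrieve: Callable[[str, int], List[str]],
                       queries: Dict[str, str], k: int = 3) -> Dict[str, float]:
    """Score a retriever on labelled queries.

    queries maps each query string to the source expected in the top-k results;
    retrieve(query, k) returns the source names of the retrieved chunks, best first.
    """
    if not queries:
        raise ValueError("need at least one labelled query")
    hits, latencies, distinct = 0, [], []
    for query, expected_source in queries.items():
        start = time.perf_counter()
        sources = retrieve(query, k)[:k]
        latencies.append(time.perf_counter() - start)  # wall-clock retrieval time
        hits += int(expected_source in sources)
        distinct.append(len(set(sources)))
    n = len(queries)
    return {
        "hit_rate_at_k": hits / n,
        "mean_latency_s": sum(latencies) / n,
        "mean_distinct_sources": sum(distinct) / n,
    }
```

For queries run one after another, throughput is roughly `1 / mean_latency_s` queries per second.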
Deployment
Hugging Face Spaces (Recommended)
- Create new Space at huggingface.co/spaces
- Choose Streamlit as SDK
- Upload project files
- Set environment variables such as OPENAI_API_KEY in the Space settings (store API keys as secrets)
Streamlit Cloud
- Push code to GitHub
- Connect repository at share.streamlit.io
- Deploy automatically
Local Deployment
# Serve on all network interfaces (e.g., on a server or in a container)
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
Technical Implementation Details
Embedding Strategy
- Model: sentence-transformers/all-MiniLM-L6-v2
- Dimensions: 384
- Normalization: L2 normalization for cosine similarity
- Chunking: 500 tokens with 50-token overlap
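L2 normalization is what lets a plain dot product stand in for cosine similarity. The pure-Python sketch below shows the idea on small vectors; with Sentence-Transformers, `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)` would return unit-length 384-dimensional vectors directly. The helper names here are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed as the dot product of L2-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Normalizing once at indexing time means every later comparison is a single dot product, which is why embeddings are usually stored already normalized.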
Vector Database
- Database: ChromaDB
- Similarity: Cosine distance
- Persistence: Local file storage (.chroma directory)
- Indexing: HNSW (Hierarchical Navigable Small World) graph for fast approximate nearest-neighbor retrieval
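HNSW is an approximate index; the exact search it approximates is a brute-force ranking by cosine distance, sketched below for a toy in-memory index. ChromaDB's cosine space reports distance as 1 minus cosine similarity, so smaller values mean closer matches. In the real store this would typically be set up with something like `chromadb.PersistentClient(path=".chroma")` and a collection created with `metadata={"hnsw:space": "cosine"}`; that configuration is an assumption about this project and may differ across ChromaDB versions. `top_k` and the dictionary index format are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (0 for identical direction, up to 2 for opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Exact k nearest neighbours by cosine distance, closest first."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])
```

Brute force costs one distance per stored chunk per query; HNSW trades a small chance of missing the true nearest neighbour for search that scales far better as the collection grows.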
Anomaly Detection
- Method: Rolling z-score analysis
- Window Size: 50 data points
- Threshold: Z-score > 3.0
- Metrics: Temperature, humidity, power consumption
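A rolling z-score detector with the parameters listed above (50-point window, threshold 3.0) can be sketched as follows. Each new reading is scored against the mean and standard deviation of the previous 50 readings. Two choices here are assumptions rather than documented behavior: the sketch flags |z| > 3.0 so that sudden drops are caught as well as spikes, and it skips windows with zero variance. `rolling_zscore_anomalies` is a hypothetical name, not necessarily the `detect_anomalies` function in `models/predictive.py`.

```python
import statistics
from collections import deque
from typing import Iterable, List, Tuple


def rolling_zscore_anomalies(values: Iterable[float], window: int = 50,
                             threshold: float = 3.0) -> List[Tuple[int, float, float]]:
    """Return (index, value, z) for readings whose z-score against the previous
    `window` readings exceeds `threshold` in absolute value."""
    history: deque = deque(maxlen=window)  # trailing window, excludes current point
    anomalies = []
    for i, x in enumerate(values):
        if len(history) == window:  # only score once the window is full
            past = list(history)
            mean = statistics.fmean(past)
            std = statistics.pstdev(past)
            if std > 0:  # a flat window has no meaningful z-score
                z = (x - mean) / std
                if abs(z) > threshold:
                    anomalies.append((i, x, z))
        history.append(x)
    return anomalies
```

Running it once per metric (temperature, humidity, power consumption) gives per-sensor anomaly lists that can feed the alert and maintenance logic.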
Predictive Maintenance
- Algorithm: Rule-based heuristics + statistical analysis
- Input: Sensor data + anomaly patterns
- Output: Maintenance recommendations + efficiency tips
- Real-time: Continuous monitoring and updates
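The rule-based side of predictive maintenance can be illustrated with a small rule table that combines the latest readings with recent anomaly counts. The actual rules are not documented here, so every threshold, message, and name below (`RULES`, `maintenance_tips`, `min_anomalies`) is a hypothetical placeholder rather than a value from the project's manuals or specifications.

```python
from typing import Dict, List

# Hypothetical rule table: metric -> (upper limit, recommendation).
# Limits and messages are illustrative placeholders only.
RULES = {
    "temperature_c": (26.0, "Check chiller/AHU operation and filter condition."),
    "humidity_pct": (60.0, "Inspect humidity sensor calibration and dehumidification."),
    "power_kw": (20.0, "Review equipment schedules for off-hours energy waste."),
}


def maintenance_tips(latest: Dict[str, float], anomaly_counts: Dict[str, int],
                     min_anomalies: int = 3) -> List[str]:
    """Emit a tip per metric that is over its limit or has recurring anomalies."""
    tips = []
    for metric, (limit, advice) in RULES.items():
        value = latest.get(metric)
        over_limit = value is not None and value > limit
        recurring = anomaly_counts.get(metric, 0) >= min_anomalies
        if over_limit or recurring:
            reason = "above limit" if over_limit else f"{anomaly_counts[metric]} recent anomalies"
            tips.append(f"{metric} ({reason}): {advice}")
    return tips
```

Requiring several recent anomalies before recommending action is one simple way to avoid alerting on a single noisy reading.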
Testing
Local Testing
# Smoke-test: check that the RAG modules import
python -c "from rag.ingest import ensure_vector_store; print('✅ RAG Ready')"
# Smoke-test: check that the predictive models import
python -c "from models.predictive import detect_anomalies; print('✅ Models Ready')"
# Test full application
streamlit run app.py
Sample Data
The system includes sample data for testing:
- HVAC Sensor Data: Temperature, humidity, power readings
- Chiller Manual: Maintenance procedures and fault codes
- Building Specs: System specifications and requirements
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Academic Use
This project was developed as part of an academic RAG system implementation course. It demonstrates:
- RAG Architecture: Complete retrieval-augmented generation system
- IoT Integration: Real-time sensor data processing
- Predictive Analytics: Statistical anomaly detection and rule-based maintenance heuristics
- Vector Databases: ChromaDB implementation
- Modern Web UI: Streamlit-based dashboard
Support
For questions or issues:
- GitHub Issues: Create an issue
- Documentation: Check this README and code comments
- Community: Streamlit and Hugging Face communities
Future Enhancements
- Real-time IoT device integration
- Advanced ML models for failure prediction
- Multi-modal document support (images, audio)
- API endpoints for external systems
- Mobile-responsive interface
- Advanced analytics dashboard
- Integration with building management systems
Built with ❤️ for Smart Building Intelligence
Last updated: January 2025