---
title: IoT Sensor Data RAG for Smart Buildings
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.42.1"
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# IoT Sensor Data RAG for Smart Buildings
## 🏢 Problem Statement
Create a RAG system that processes IoT sensor data, maintenance manuals, and building specifications to provide predictive maintenance insights and operational optimization.
## 🎯 Key Requirements
- ✅ **IoT sensor data ingestion and real-time processing**
- ✅ **Maintenance manual and building specification integration**
- ✅ **Predictive maintenance algorithm implementation**
- ✅ **Operational efficiency optimization recommendations**
- ✅ **Anomaly detection and alert systems**
## 🚀 Technical Challenges Solved
- ✅ **Real-time sensor data streaming and processing**
- ✅ **Multi-sensor data fusion and correlation**
- ✅ **Predictive modeling for equipment failure**
- ✅ **Building system integration and compatibility**
- ✅ **Energy efficiency optimization algorithms**
## 🏗️ System Architecture
### Core Components
- **RAG Engine**: Vector database (ChromaDB) with Sentence-Transformers embeddings
- **IoT Data Processor**: Real-time sensor data streaming and anomaly detection
- **Predictive Analytics**: Equipment failure prediction and maintenance recommendations
- **Document Intelligence**: PDF/TXT processing with smart chunking strategies
- **Web Interface**: Modern Streamlit dashboard with Material design theme
### Technology Stack
- **Backend**: Python, Streamlit, ChromaDB
- **Embeddings**: Sentence-Transformers (all-MiniLM-L6-v2)
- **Vector Database**: ChromaDB with cosine similarity
- **LLM Integration**: Local Transformers + OpenAI API (optional)
- **Data Processing**: Pandas, NumPy, Scikit-learn
- **Visualization**: Plotly for real-time sensor monitoring
## 📊 Features
### 1. Real-Time IoT Monitoring
- Live sensor data streaming simulation
- Multi-sensor data fusion (temperature, humidity, power consumption)
- Real-time anomaly detection using rolling z-score analysis
- Interactive time-series visualizations
### 2. Intelligent Document RAG
- PDF and TXT document ingestion
- Smart text chunking (500 tokens with 50 token overlap)
- Context-aware retrieval using vector similarity
- Source attribution and relevance scoring
### 3. Predictive Maintenance
- Equipment failure prediction algorithms
- Maintenance schedule optimization
- Energy efficiency recommendations
- Anomaly-based alert systems
### 4. Evaluation & Analytics
- Retrieval accuracy metrics
- Response latency measurement
- Document relevance scoring
- System performance monitoring
## 🚀 Quick Start
### Prerequisites
- Python 3.9+ (Streamlit 1.42, the SDK version pinned in the Space configuration, no longer supports Python 3.8)
- 8GB+ RAM (for local LLM models)
- Internet connection (for initial model downloads)
### Installation
```bash
# Clone the repository
git clone https://github.com/itsnewcoder/iot-smart-building-rag.git
cd iot-smart-building-rag
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
```
### Configuration
The OpenAI integration is optional. To enable it, create a `.env` file in the project root:
```env
OPENAI_API_KEY=your_openai_api_key_here
```
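Because the key is optional, the app has to decide at runtime which model backend to use. The sketch below shows one way that decision could look. `select_llm_backend` is a hypothetical helper, not a function from this repository, and it assumes the variable has already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from typing import Mapping, Optional


def select_llm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai" when a non-empty OPENAI_API_KEY is set, else "local".

    Hypothetical helper for illustration; the project's own logic may differ.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "openai" if key else "local"
```

Passing the environment as a parameter keeps the function easy to test without touching real credentials.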
### Run Locally
```bash
streamlit run app.py
```
**Access your app at:** `http://localhost:8501`
## 📁 Project Structure
```
iot-smart-building-rag/
├── app.py                  # Main Streamlit application
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── .streamlit/
│   └── config.toml         # Streamlit theme configuration
├── rag/                    # RAG system core
│   ├── __init__.py
│   ├── ingest.py           # Document ingestion & vector store
│   ├── retrieval.py        # Context retrieval engine
│   ├── generate.py         # LLM response generation
│   └── evaluate.py         # System evaluation metrics
├── models/                 # Predictive models
│   ├── __init__.py
│   └── predictive.py       # Anomaly detection & maintenance
├── data/                   # Sample data
│   ├── manuals/            # Maintenance manuals (PDF/TXT)
│   ├── specs/              # Building specifications
│   └── sensors/            # IoT sensor data (CSV)
└── .chroma/                # Vector database storage
```
## 🔧 Usage Guide
### 1. Dashboard Tab
- **Start Stream**: Begin real-time sensor data simulation
- **Live Monitoring**: View real-time sensor readings and trends
- **Anomaly Detection**: See detected anomalies with z-score analysis
- **Maintenance Tips**: Get AI-powered maintenance recommendations
### 2. RAG QA Tab
- **Ask Questions**: Query maintenance procedures and building specs
- **Context Retrieval**: View relevant document chunks and sources
- **AI Responses**: Get context-aware answers from local or OpenAI models
### 3. Evaluation Tab
- **Retrieval Testing**: Test system with custom queries
- **Performance Metrics**: View latency and relevance scores
- **Quality Assessment**: Evaluate RAG system effectiveness
### 4. Data Manager Tab
- **Document Index**: View indexed documents and sources
- **File Upload**: Add new PDFs/TXTs to the knowledge base
- **Vector Store**: Manage document embeddings and storage
## 📈 Sample Queries
Try these example questions in the RAG QA tab:
- "How to reset chiller pump?"
- "What are the fault codes for HVAC systems?"
- "How to maintain building temperature sensors?"
- "What are the power consumption optimization tips?"
- "How to troubleshoot humidity sensor issues?"
## 🎯 Evaluation Metrics
### Retrieval Quality
- **Relevance Scoring**: Cosine similarity-based ranking
- **Source Attribution**: Document source tracking
- **Context Retrieval**: Top-k document retrieval
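As an illustration of cosine-based ranking with top-k selection, here is a brute-force sketch in plain Python. It is not the project's retrieval code: ChromaDB performs this search internally with an approximate index, and the names `cosine_similarity` and `top_k` are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (source, vector) pair against the query and keep the k best."""
    scored = [(source, cosine_similarity(query, vec)) for source, vec in chunks]
    # Highest similarity first; the source name travels with the score
    # so each result can be attributed to its document.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Each result carries its source name, which is what makes source attribution possible in the answer.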
### Performance Metrics
- **Response Latency**: End-to-end query processing time
- **Throughput**: Queries processed per second
- **Memory Usage**: Vector database storage efficiency
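Latency and throughput can be measured with a small timing harness like the one below. This is a sketch under assumptions: `measure_queries` is a hypothetical helper, and `answer` stands in for whatever callable runs a query end to end in the app.

```python
import time
from typing import Callable, Dict, Iterable, List


def measure_queries(
    answer: Callable[[str], object],
    queries: Iterable[str],
) -> Dict[str, float]:
    """Run each query once and report latency and throughput statistics."""
    latencies: List[float] = []
    for query in queries:
        start = time.perf_counter()
        answer(query)  # end-to-end call: retrieval plus generation
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    count = len(latencies)
    return {
        "queries": count,
        "mean_latency_s": total / count if count else 0.0,
        "max_latency_s": max(latencies) if latencies else 0.0,
        # Sequential throughput: queries completed per second of processing time.
        "throughput_qps": count / total if total > 0 else 0.0,
    }
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.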
### RAG Effectiveness
- **Context Relevance**: Retrieved document quality
- **Answer Accuracy**: Response relevance to queries
- **Source Diversity**: Multiple document source utilization
## 🌐 Deployment
### HuggingFace Spaces (Recommended)
1. Create new Space at [huggingface.co/spaces](https://huggingface.co/spaces)
2. Choose **Streamlit** as SDK
3. Upload project files
4. Set environment variables in Space settings
### Streamlit Cloud
1. Push code to GitHub
2. Connect repository at [share.streamlit.io](https://share.streamlit.io)
3. Deploy automatically
### Local Deployment
```bash
# Production server
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```
## 🔍 Technical Implementation Details
### Embedding Strategy
- **Model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Dimensions**: 384
- **Normalization**: L2 normalization for cosine similarity
- **Chunking**: 500 tokens with 50 token overlap
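The chunking scheme can be sketched as a sliding window over a token list. This example approximates tokens with whitespace-separated words, which is an assumption: the project may count tokens with the embedding model's tokenizer instead. The function names are hypothetical.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    chunk_size: int = 500,
    overlap: int = 50,
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Whitespace-token approximation of the 500/50 chunking scheme."""
    return [" ".join(c) for c in chunk_tokens(text.split(), chunk_size, overlap)]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.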
### Vector Database
- **Database**: ChromaDB
- **Similarity**: Cosine distance
- **Persistence**: Local file storage (.chroma directory)
- **Indexing**: HNSW algorithm for fast retrieval
### Anomaly Detection
- **Method**: Rolling z-score analysis
- **Window Size**: 50 data points
- **Threshold**: Z-score > 3.0
- **Metrics**: Temperature, humidity, power consumption
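A rolling z-score detector with these parameters might look like the sketch below. It makes two assumptions that the list above does not state: each reading is compared against the 50 readings before it, and the absolute z-score is used so that both spikes and drops are flagged. `rolling_zscore_anomalies` is a hypothetical name, not the project's `detect_anomalies`.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


def rolling_zscore_anomalies(
    values: Iterable[float],
    window: int = 50,
    threshold: float = 3.0,
) -> List[Tuple[int, float, float]]:
    """Return (index, value, z) for readings whose |z| exceeds the threshold."""
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[Tuple[int, float, float]] = []
    for index, value in enumerate(values):
        # Only score a reading once a full window of earlier readings exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:  # a flat window has no spread to measure against
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    anomalies.append((index, value, z))
        history.append(value)
    return anomalies
```

The same function can be applied separately to the temperature, humidity, and power series.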
### Predictive Maintenance
- **Algorithm**: Rule-based heuristics + statistical analysis
- **Input**: Sensor data + anomaly patterns
- **Output**: Maintenance recommendations + efficiency tips
- **Real-time**: Continuous monitoring and updates
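A rule-based layer on top of the anomaly counts could be sketched as follows. The metric names come from this README, but the limits, messages, and the `recommend` function are illustrative assumptions, not values or code from the project.

```python
from typing import Dict, List, Tuple

# Illustrative rules only: (minimum anomaly count, recommendation text).
# The limits and wording are assumptions, not taken from the project.
RULES: Dict[str, Tuple[int, str]] = {
    "temperature": (3, "Inspect HVAC airflow and recalibrate temperature sensors."),
    "humidity": (3, "Check dehumidifier operation and sensor placement."),
    "power": (5, "Review equipment schedules for avoidable power draw."),
}


def recommend(anomaly_counts: Dict[str, int]) -> List[str]:
    """Map per-metric anomaly counts to maintenance recommendations."""
    recommendations: List[str] = []
    for metric, (limit, message) in RULES.items():
        # A metric triggers its rule once it reaches the configured count.
        if anomaly_counts.get(metric, 0) >= limit:
            recommendations.append(f"{metric}: {message}")
    return recommendations
```

Keeping the rules in a plain dictionary makes them easy to review and adjust without changing the evaluation logic.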
## 🧪 Testing
### Local Testing
```bash
# Test RAG modules
python -c "from rag.ingest import ensure_vector_store; print('✅ RAG Ready')"
# Test predictive models
python -c "from models.predictive import detect_anomalies; print('✅ Models Ready')"
# Test full application
streamlit run app.py
```
### Sample Data
The system includes sample data for testing:
- **HVAC Sensor Data**: Temperature, humidity, power readings
- **Chiller Manual**: Maintenance procedures and fault codes
- **Building Specs**: System specifications and requirements
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🎓 Academic Use
This project was developed as part of an academic RAG system implementation course. It demonstrates:
- **RAG Architecture**: Complete retrieval-augmented generation system
- **IoT Integration**: Real-time sensor data processing
- **Predictive Analytics**: Machine learning for maintenance
- **Vector Databases**: ChromaDB implementation
- **Modern Web UI**: Streamlit-based dashboard
## 📞 Support
For questions or issues:
- **GitHub Issues**: [Create an issue](https://github.com/itsnewcoder/iot-smart-building-rag/issues)
- **Documentation**: Check this README and code comments
- **Community**: Streamlit and HuggingFace communities
## 🚀 Future Enhancements
- [ ] Real-time IoT device integration
- [ ] Advanced ML models for failure prediction
- [ ] Multi-modal document support (images, audio)
- [ ] API endpoints for external systems
- [ ] Mobile-responsive interface
- [ ] Advanced analytics dashboard
- [ ] Integration with building management systems
---
**Built with ❤️ for Smart Building Intelligence**
*Last updated: January 2025*