---
title: IoT Sensor Data RAG for Smart Buildings
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.42.1"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# IoT Sensor Data RAG for Smart Buildings

## 🏢 Problem Statement

Create a RAG system that processes IoT sensor data, maintenance manuals, and building specifications to provide predictive maintenance insights and operational optimization.

## 🎯 Key Requirements

- ✅ **IoT sensor data ingestion and real-time processing**
- ✅ **Maintenance manual and building specification integration**
- ✅ **Predictive maintenance algorithm implementation**
- ✅ **Operational efficiency optimization recommendations**
- ✅ **Anomaly detection and alert systems**

## 🚀 Technical Challenges Solved

- ✅ **Real-time sensor data streaming and processing**
- ✅ **Multi-sensor data fusion and correlation**
- ✅ **Predictive modeling for equipment failure**
- ✅ **Building system integration and compatibility**
- ✅ **Energy efficiency optimization algorithms**

## 🏗️ System Architecture

### Core Components
- **RAG Engine**: Vector database (ChromaDB) with Sentence-Transformers embeddings
- **IoT Data Processor**: Real-time sensor data streaming and anomaly detection
- **Predictive Analytics**: Equipment failure prediction and maintenance recommendations
- **Document Intelligence**: PDF/TXT processing with smart chunking strategies
- **Web Interface**: Modern Streamlit dashboard with Material design theme

### Technology Stack
- **Backend**: Python, Streamlit, ChromaDB
- **Embeddings**: Sentence-Transformers (all-MiniLM-L6-v2)
- **Vector Database**: ChromaDB with cosine similarity
- **LLM Integration**: Local Transformers + OpenAI API (optional)
- **Data Processing**: Pandas, NumPy, Scikit-learn
- **Visualization**: Plotly for real-time sensor monitoring

## 📊 Features

### 1. Real-Time IoT Monitoring
- Live sensor data streaming simulation
- Multi-sensor data fusion (temperature, humidity, power consumption)
- Real-time anomaly detection using rolling z-score analysis
- Interactive time-series visualizations

### 2. Intelligent Document RAG
- PDF and TXT document ingestion
- Smart text chunking (500 tokens with a 50-token overlap)
- Context-aware retrieval using vector similarity
- Source attribution and relevance scoring
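
The chunking step above can be pictured as a sliding window over the token sequence. The sketch below is only an illustration, not the code in `rag/ingest.py`: it assumes whitespace-separated words stand in for tokens, and `chunk_text` is a hypothetical helper name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks of at most `chunk_size` tokens.

    Assumption: tokens are approximated by whitespace-separated words;
    the tokenizer actually used by the project may differ.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # each window starts `step` tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1200))
    parts = chunk_text(sample)
    print(len(parts), [len(p.split()) for p in parts])
```

The overlap means the last 50 tokens of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk.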

### 3. Predictive Maintenance
- Equipment failure prediction algorithms
- Maintenance schedule optimization
- Energy efficiency recommendations
- Anomaly-based alert systems

### 4. Evaluation & Analytics
- Retrieval accuracy metrics
- Response latency measurement
- Document relevance scoring
- System performance monitoring

## 🚀 Quick Start

### Prerequisites
- Python 3.8+
- 8GB+ RAM (for local LLM models)
- Internet connection (for initial model downloads)

### Installation

```bash
# Clone the repository
git clone https://github.com/itsnewcoder/iot-smart-building-rag.git
cd iot-smart-building-rag

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -r requirements.txt
```

### Configuration

Create a `.env` file in the root directory (optional):
```env
OPENAI_API_KEY=your_openai_api_key_here
```
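
Because the key is optional, the app needs some way to decide between the local model and OpenAI. One plausible approach is sketched below; it assumes the app falls back to the local model when no key is set, and `choose_backend` is a hypothetical helper, not a function from this repository. The sketch reads the process environment only, so the `.env` file is assumed to be loaded elsewhere (for example with `python-dotenv`).

```python
import os
from typing import Mapping, Optional


def choose_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "openai" when a usable API key is set, otherwise "local"."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the placeholder from the example .env as "not configured".
    if key and key != "your_openai_api_key_here":
        return "openai"
    return "local"


if __name__ == "__main__":
    print(choose_backend())
```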

### Run Locally

```bash
streamlit run app.py
```

**Access your app at:** `http://localhost:8501`

## 📁 Project Structure

```
iot-smart-building-rag/
├── app.py                      # Main Streamlit application
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── .streamlit/
│   └── config.toml            # Streamlit theme configuration
├── rag/                       # RAG system core
│   ├── __init__.py
│   ├── ingest.py              # Document ingestion & vector store
│   ├── retrieval.py           # Context retrieval engine
│   ├── generate.py            # LLM response generation
│   └── evaluate.py            # System evaluation metrics
├── models/                     # Predictive models
│   ├── __init__.py
│   └── predictive.py          # Anomaly detection & maintenance
├── data/                       # Sample data
│   ├── manuals/               # Maintenance manuals (PDF/TXT)
│   ├── specs/                 # Building specifications
│   └── sensors/               # IoT sensor data (CSV)
└── .chroma/                   # Vector database storage
```

## 🔧 Usage Guide

### 1. Dashboard Tab
- **Start Stream**: Begin real-time sensor data simulation
- **Live Monitoring**: View real-time sensor readings and trends
- **Anomaly Detection**: See detected anomalies with z-score analysis
- **Maintenance Tips**: Get AI-powered maintenance recommendations

### 2. RAG QA Tab
- **Ask Questions**: Query maintenance procedures and building specs
- **Context Retrieval**: View relevant document chunks and sources
- **AI Responses**: Get context-aware answers from local or OpenAI models

### 3. Evaluation Tab
- **Retrieval Testing**: Test system with custom queries
- **Performance Metrics**: View latency and relevance scores
- **Quality Assessment**: Evaluate RAG system effectiveness

### 4. Data Manager Tab
- **Document Index**: View indexed documents and sources
- **File Upload**: Add new PDFs/TXTs to the knowledge base
- **Vector Store**: Manage document embeddings and storage

## 📈 Sample Queries

Try these example questions in the RAG QA tab:

- "How to reset chiller pump?"
- "What are the fault codes for HVAC systems?"
- "How to maintain building temperature sensors?"
- "What are the power consumption optimization tips?"
- "How to troubleshoot humidity sensor issues?"

## 🎯 Evaluation Metrics

### Retrieval Quality
- **Relevance Scoring**: Cosine similarity-based ranking
- **Source Attribution**: Document source tracking
- **Context Retrieval**: Top-k document retrieval

### Performance Metrics
- **Response Latency**: End-to-end query processing time
- **Throughput**: Queries processed per second
- **Memory Usage**: Vector database storage efficiency
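
Latency and throughput can be measured by timing each call to the retrieval function. The sketch below assumes queries run sequentially; `measure_latency` and `fake_retrieve` are hypothetical names standing in for whatever `rag/evaluate.py` actually does.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_latency(fn: Callable[[str], object], queries: Iterable[str]) -> Dict[str, float]:
    """Time fn(query) for each query and summarise the results."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        fn(query)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("at least one query is required")
    total = sum(timings)
    return {
        "mean_latency_s": statistics.mean(timings),
        "max_latency_s": max(timings),
        # Sequential throughput: queries completed per second of processing time.
        "throughput_qps": len(timings) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    def fake_retrieve(query: str) -> list:
        time.sleep(0.01)  # stand-in for a real retrieval call
        return [query]

    print(measure_latency(fake_retrieve, ["How to reset chiller pump?"] * 5))
```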

### RAG Effectiveness
- **Context Relevance**: Retrieved document quality
- **Answer Accuracy**: Response relevance to queries
- **Source Diversity**: Multiple document source utilization

## 🌐 Deployment

### HuggingFace Spaces (Recommended)
1. Create new Space at [huggingface.co/spaces](https://huggingface.co/spaces)
2. Choose **Streamlit** as SDK
3. Upload project files
4. Set environment variables in Space settings

### Streamlit Cloud
1. Push code to GitHub
2. Connect repository at [share.streamlit.io](https://share.streamlit.io)
3. Deploy automatically

### Local Deployment
```bash
# Production server
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

## 🔍 Technical Implementation Details

### Embedding Strategy
- **Model**: `sentence-transformers/all-MiniLM-L6-v2`
- **Dimensions**: 384
- **Normalization**: L2 normalization for cosine similarity
- **Chunking**: 500 tokens with a 50-token overlap
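
L2 normalization matters because, once vectors have unit length, their dot product equals their cosine similarity. A minimal check with toy 3-dimensional vectors follows (the real embeddings have 384 dimensions, and the helper names here are hypothetical):

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a, b = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
    # Both values agree up to floating-point rounding.
    print(cosine_similarity(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

Normalizing once at indexing time therefore lets similarity search reduce to dot products.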

### Vector Database
- **Database**: ChromaDB
- **Similarity**: Cosine distance
- **Persistence**: Local file storage (.chroma directory)
- **Indexing**: HNSW algorithm for fast retrieval
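
Conceptually, retrieval ranks stored chunk embeddings by cosine distance to the query embedding. The sketch below does this exhaustively over a toy in-memory index; an HNSW index approximates the same ranking without scanning every vector. The chunk IDs and 2-dimensional vectors are made up for illustration, and none of this is ChromaDB's API.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, distance) pairs closest to the query."""
    scored = ((chunk_id, cosine_distance(query, vec)) for chunk_id, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical chunk IDs with toy embeddings.
    toy_index = {
        "chiller_manual#0": [1.0, 0.0],
        "building_spec#0": [0.0, 1.0],
        "hvac_faults#2": [0.7, 0.7],
    }
    for chunk_id, distance in top_k([0.9, 0.1], toy_index, k=2):
        print(f"{chunk_id}: {distance:.3f}")
```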

### Anomaly Detection
- **Method**: Rolling z-score analysis
- **Window Size**: 50 data points
- **Threshold**: Z-score > 3.0
- **Metrics**: Temperature, humidity, power consumption
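
A rolling z-score compares each new reading with the mean and standard deviation of the readings before it. The sketch below uses the window and threshold listed above but makes two assumptions that may not match `models/predictive.py`: the window excludes the current reading, and a point is flagged when the absolute z-score exceeds the threshold. `rolling_zscore_flags` is a hypothetical name.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, List


def rolling_zscore_flags(values: Iterable[float], window: int = 50,
                         threshold: float = 3.0) -> List[bool]:
    """Flag readings whose |z-score| against the previous `window` readings exceeds `threshold`."""
    history: Deque[float] = deque(maxlen=window)
    flags = []
    for value in values:
        is_anomaly = False
        if len(history) == window:  # wait until a full window is available
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0:  # a flat window gives no scale to compare against
                is_anomaly = abs(value - mean) / std > threshold
        flags.append(is_anomaly)
        history.append(value)
    return flags


if __name__ == "__main__":
    # Synthetic temperature trace with one injected spike.
    readings = [22.0 + 0.1 * (i % 5) for i in range(60)]
    readings[55] = 30.0
    flags = rolling_zscore_flags(readings)
    print([i for i, flagged in enumerate(flags) if flagged])
```

In this sketch a spike stays in the window after it is flagged, which inflates the standard deviation for the next 50 readings and makes nearby anomalies harder to detect; excluding flagged points from the history is one possible refinement.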

### Predictive Maintenance
- **Algorithm**: Rule-based heuristics + statistical analysis
- **Input**: Sensor data + anomaly patterns
- **Output**: Maintenance recommendations + efficiency tips
- **Real-time**: Continuous monitoring and updates
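
A rule-based layer on top of anomaly detection can be as simple as mapping anomaly counts to recommendations. The following is a sketch only: the metric names, the minimum count, and the messages are all hypothetical and do not come from this repository or from any equipment manual.

```python
from typing import Dict, List

# Hypothetical messages for illustration; real guidance belongs in the
# maintenance manuals and building specifications.
MESSAGES = {
    "temperature": "Repeated temperature anomalies: inspect HVAC setpoints and airflow.",
    "humidity": "Repeated humidity anomalies: check humidifier and sensor calibration.",
    "power": "Repeated power anomalies: review equipment load and schedules.",
}


def recommend(anomaly_counts: Dict[str, int], min_count: int = 3) -> List[str]:
    """Return one tip per metric whose recent anomaly count reaches `min_count`."""
    tips = [
        message
        for metric, message in MESSAGES.items()
        if anomaly_counts.get(metric, 0) >= min_count
    ]
    return tips or ["No action needed: anomaly counts are below the alert level."]


if __name__ == "__main__":
    for tip in recommend({"temperature": 4, "power": 1}):
        print(tip)
```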

## 🧪 Testing

### Local Testing
```bash
# Test RAG modules
python -c "from rag.ingest import ensure_vector_store; print('✅ RAG Ready')"

# Test predictive models
python -c "from models.predictive import detect_anomalies; print('✅ Models Ready')"

# Test full application
streamlit run app.py
```

### Sample Data
The system includes sample data for testing:
- **HVAC Sensor Data**: Temperature, humidity, power readings
- **Chiller Manual**: Maintenance procedures and fault codes
- **Building Specs**: System specifications and requirements

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🎓 Academic Use

This project was developed as part of an academic RAG system implementation course. It demonstrates:

- **RAG Architecture**: Complete retrieval-augmented generation system
- **IoT Integration**: Real-time sensor data processing
- **Predictive Analytics**: Machine learning for maintenance
- **Vector Databases**: ChromaDB implementation
- **Modern Web UI**: Streamlit-based dashboard

## 📞 Support

For questions or issues:
- **GitHub Issues**: [Create an issue](https://github.com/itsnewcoder/iot-smart-building-rag/issues)
- **Documentation**: Check this README and code comments
- **Community**: Streamlit and HuggingFace communities

## 🚀 Future Enhancements

- [ ] Real-time IoT device integration
- [ ] Advanced ML models for failure prediction
- [ ] Multi-modal document support (images, audio)
- [ ] API endpoints for external systems
- [ ] Mobile-responsive interface
- [ ] Advanced analytics dashboard
- [ ] Integration with building management systems

---

**Built with ❤️ for Smart Building Intelligence**

*Last updated: January 2025*