@@ -1,35 +1,94 @@
 # ContextFlow RL Doubt Predictor
-## Overview
-This is the trained reinforcement learning model for ContextFlow doubt prediction system.
 ## Model Details
-- **Algorithm**: GRPO (Group Relative Policy Optimization) + Q-Learning
-- **State Dimension**: 64 features
-- **Action Dimension**: 10 doubt prediction actions
-- **Policy Version**: 50
-- **Training Samples**: 200
 ## Usage
 ```python
 import pickle
 from huggingface_hub import hf_hub_download
-# Download checkpoint
 path = hf_hub_download(repo_id='namish10/contextflow-rl', filename='checkpoint.pkl')
-# Load checkpoint
 with open(path, 'rb') as f:
     checkpoint = pickle.load(f)
-print(f"Policy version: {checkpoint.policy_version}")
 ```
 ## Citation
 ```bibtex
 @software{contextflow_rl,
   title={ContextFlow RL Doubt Predictor},
   author={ContextFlow Team},
-  year={2026}
 }
 ```

+---
+license: apache-2.0
+tags:
+  - reinforcement-learning
+  - education
+  - doubt-prediction
+  - q-learning
+---
 # ContextFlow RL Doubt Predictor
+A reinforcement learning model that predicts learner confusion **before** it happens, using hand gesture recognition with privacy-first face blurring.
 ## Model Details
+| Property | Value |
+|----------|-------|
+| **Algorithm** | GRPO (Group Relative Policy Optimization) + Q-Learning |
+| **State Dimension** | 64 features |
+| **Action Dimension** | 10 doubt predictions |
+| **Policy Version** | 50 |
+| **Training Samples** | 200 |
+| **Framework** | PyTorch |
+## Architecture
+```
+Q-Network: 64 → 128 → 128 → 10
+├── State Encoder (64 features)
+├── Hidden Layer 1 (128 units, ReLU)
+├── Hidden Layer 2 (128 units, ReLU)
+└── Output Layer (10 actions)
+```
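The layer listing above can be sketched as a plain forward pass. This is a minimal illustration in NumPy rather than PyTorch, and it is not the `QNetwork` class from `train_rl.py`: the names `init_params` and `q_forward`, the He-style random initialization, and the zero biases are all assumptions, and the weights are placeholders rather than the trained checkpoint.

```python
import numpy as np

# Layer widths taken from the architecture listing: 64 -> 128 -> 128 -> 10.
LAYER_SIZES = [64, 128, 128, 10]


def init_params(seed=0):
    """Create random placeholder weights (NOT the trained checkpoint weights)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def q_forward(state, params):
    """Return one Q-value per action for a single 64-dim state vector."""
    x = np.asarray(state, dtype=np.float64)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:  # ReLU on the two hidden layers only
            x = np.maximum(x, 0.0)
    return x


params = init_params()
q_values = q_forward(np.zeros(64), params)
greedy_action = int(np.argmax(q_values))  # index of the highest Q-value
```

Taking the `argmax` over the 10 outputs is the usual greedy choice for a Q-network; whether the trained policy selects actions this way is not stated in the card.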
+## Features (64-dimensional state vector)
+The state vector encodes:
+1. **Topic Embedding** (32 dims) - TF-IDF representation of learning topic
+2. **Progress** (1 dim) - Session progress percentage
+3. **Confusion Signals** (16 dims) - Behavioral indicators:
+   - Mouse hesitation patterns
+   - Scroll reversals
+   - Time on page
+   - Eye tracking (if available)
+4. **Gesture Signals** (14 dims) - Hand gesture frequencies
+5. **Time Spent** (1 dim) - Total session time
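The five feature groups above add up to 32 + 1 + 16 + 14 + 1 = 64 dimensions. A minimal sketch of assembling them is shown here; the function name `build_state`, the argument names, and the concatenation order (taken to follow the list order) are assumptions, and the card does not specify units or normalization for any group.

```python
import numpy as np

# Segment widths copied from the feature list. The concatenation order and
# the key names are assumptions for illustration.
SEGMENTS = {
    "topic": 32,
    "progress": 1,
    "confusion": 16,
    "gesture": 14,
    "time_spent": 1,
}


def build_state(topic, progress, confusion, gesture, time_spent):
    """Concatenate the five feature groups into one 64-dim float vector."""
    parts = {
        "topic": topic,
        "progress": [progress],
        "confusion": confusion,
        "gesture": gesture,
        "time_spent": [time_spent],
    }
    chunks = []
    for name, width in SEGMENTS.items():
        chunk = np.asarray(parts[name], dtype=np.float64).ravel()
        if chunk.shape != (width,):
            raise ValueError(f"{name}: expected {width} values, got {chunk.size}")
        chunks.append(chunk)
    return np.concatenate(chunks)


# Placeholder inputs; real values would come from the feature extractors.
state = build_state(
    topic=np.zeros(32),
    progress=0.25,
    confusion=np.zeros(16),
    gesture=np.zeros(14),
    time_spent=120.0,
)
```

Validating each group's width before concatenating makes a mis-sized feature group fail loudly instead of silently shifting every later dimension.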
+## Reward Function
+The model optimizes for:
+- **Correct doubt prediction**: +1.0
+- **Helpful explanation provided**: +0.5
+- **User engagement maintained**: +0.3
+- **False positive**: -0.5
+- **Missed confusion**: -1.0
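The reward values above can be written as a small lookup table. This sketch assumes the components are additive within a step, which the card does not state; the event names and the `step_reward` helper are hypothetical.

```python
# Reward values copied from the list above; the event names are hypothetical.
REWARDS = {
    "correct_prediction": 1.0,
    "helpful_explanation": 0.5,
    "engagement_maintained": 0.3,
    "false_positive": -0.5,
    "missed_confusion": -1.0,
}


def step_reward(events):
    """Sum the rewards for the events observed in one step.

    Adding the components together is an assumption; the card only lists
    the individual values.
    """
    unknown = set(events) - REWARDS.keys()
    if unknown:
        raise ValueError(f"unknown reward events: {sorted(unknown)}")
    return sum(REWARDS[event] for event in events)


# A correct prediction followed by a helpful explanation.
example = step_reward(["correct_prediction", "helpful_explanation"])
```

Under this additive reading, a missed confusion costs as much as a correct prediction earns, while a false positive costs half of that.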
 ## Usage
 ```python
 import pickle
+import numpy as np
 from huggingface_hub import hf_hub_download
+# Load model (pickle can execute arbitrary code on load; only open checkpoints you trust)
 path = hf_hub_download(repo_id='namish10/contextflow-rl', filename='checkpoint.pkl')
 with open(path, 'rb') as f:
     checkpoint = pickle.load(f)
+# Extract Q-network
+q_weights = checkpoint.q_network_weights
+# Placeholder state vector (64 random features); replace with real session features
+state = np.random.randn(64)
+# Predict doubt actions
+# (Requires instantiating the QNetwork class from train_rl.py and loading q_weights into it)
 ```
 ## Citation
 ```bibtex
 @software{contextflow_rl,
   title={ContextFlow RL Doubt Predictor},
   author={ContextFlow Team},
+  year={2026},
+  url={https://github.com/contextflow}
 }
 ```
+## Limitations
+- Trained on 200 synthetic samples (limited real-world data)
+- Hand gesture recognition requires MediaPipe
+- Privacy-first: faces are automatically blurred during gesture capture