Praanshull committed
Commit a537907 · verified · 1 Parent(s): b554163

Upload 5 files

Files changed (5):
  1. QUICKSTART.md +279 -0
  2. README.md +354 -11
  3. app.py +92 -0
  4. requirements.txt +25 -0
  5. test_model.py +134 -0
QUICKSTART.md ADDED
@@ -0,0 +1,279 @@
+ # 🚀 Quick Start Guide
+
+ Get the Multilingual QA System up and running in **5 minutes**!
+
+ ---
+
+ ## ⚡ Fast Track
+
+ ```bash
+ # 1. Clone and enter directory
+ git clone https://github.com/Praanshull/multilingual-qa-system.git
+ cd multilingual-qa-system
+
+ # 2. Install dependencies
+ pip install -r requirements.txt
+
+ # 3. Run setup script (first time only)
+ python setup_project.py
+
+ # 4. Launch application
+ python app.py
+ ```
+
+ Then open **http://localhost:7860** in your browser!
+
+ ---
+
+ ## 📋 Detailed Steps
+
+ ### Step 1: Prerequisites
+
+ Make sure you have:
+ - ✅ Python 3.8 or higher
+ - ✅ pip (Python package manager)
+ - ✅ Git
+ - ✅ (Optional) CUDA-capable GPU
+
+ Check your Python version:
+ ```bash
+ python --version
+ ```
+
+ ### Step 2: Clone Repository
+
+ ```bash
+ git clone https://github.com/Praanshull/multilingual-qa-system.git
+ cd multilingual-qa-system
+ ```
+
+ ### Step 3: Create Virtual Environment (Recommended)
+
+ **Windows:**
+ ```bash
+ python -m venv venv
+ venv\Scripts\activate
+ ```
+
+ **Mac/Linux:**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate
+ ```
+
+ ### Step 4: Install Dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ This will install:
+ - PyTorch
+ - Transformers
+ - Gradio
+ - PEFT
+ - And other required packages
+
+ **Estimated time:** 2-5 minutes
+
+ ### Step 5: Setup Project Structure
+
+ ```bash
+ python setup_project.py
+ ```
+
+ This script will:
+ 1. Create necessary directories
+ 2. Move model files to correct locations
+ 3. Create configuration files
+ 4. Verify everything is set up correctly
+
+ **Note:** If you haven't downloaded the model yet, you'll need to:
92
+ - Download from Google Drive (if shared)
93
+ - Or the model will be downloaded automatically on first run
94
+
95
+ ### Step 6: Test the Model (Optional)
96
+
97
+ ```bash
98
+ python test_model.py
99
+ ```
100
+
101
+ This runs quick tests to verify everything works.
102
+
103
+ ### Step 7: Launch the Application
104
+
105
+ ```bash
106
+ python app.py
107
+ ```
108
+
109
+ You should see:
110
+ ```
111
+ ================================================================================
112
+ 🚀 LAUNCHING APPLICATION
113
+ ================================================================================
114
+ ✅ Application launched successfully!
115
+ 📱 Access the interface at: http://localhost:7860
116
+ ```
117
+
118
+ ### Step 8: Open in Browser
119
+
120
+ Open your web browser and go to:
121
+ ```
122
+ http://localhost:7860
123
+ ```
124
+
125
+ ---
126
+
127
+ ## 🎯 Using the Interface
128
+
129
+ ### Ask Questions Tab
130
+
131
+ 1. **Select Language:** Choose English 🇬🇧 or German 🇩🇪
132
+ 2. **Enter Question:** Type your question
133
+ 3. **Provide Context:** Paste the passage containing the answer
134
+ 4. **Click "Get Answer":** The model will extract the answer
135
+
136
+ **Tips:**
137
+ - Keep context under 300 words for best results
138
+ - Make sure the answer is explicitly stated in the context
139
+ - Use clear, direct questions
140
+
141
+ ### Try Examples
142
+
143
+ 1. Click on "Try Examples" section
144
+ 2. Select example type (General Knowledge, Historical, Scientific)
145
+ 3. Click "Load Example"
146
+ 4. The question and context will be filled automatically
147
+ 5. Click "Get Answer"
148
+
149
+ ---
150
+
151
+ ## 🔧 Troubleshooting
152
+
153
+ ### Model Not Found Error
154
+
155
+ **Problem:** `❌ Failed to load model: Model not found`
156
+
157
+ **Solution:**
158
+ ```bash
159
+ # Update the model path in app.py
160
+ MODEL_PATH = "models/multilingual_model"
161
+
162
+ # Or download the model:
163
+ python download_model.py
164
+ ```
165
+
166
+ ### CUDA Out of Memory
167
+
168
+ **Problem:** `RuntimeError: CUDA out of memory`
169
+
170
+ **Solution:**
171
+ ```python
172
+ # The model will automatically fall back to CPU
173
+ # Or reduce batch size in config if running inference in batches
174
+ ```
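The automatic CPU fallback mentioned above usually comes down to a device check at load time. A minimal sketch, assuming PyTorch (already in `requirements.txt`) — the project's actual logic lives in `app/model_loader.py` and may differ:

```python
import torch


def pick_device():
    # Prefer the GPU when one is available; otherwise fall back to CPU.
    # This is the standard PyTorch pattern behind "automatic fallback".
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
# model.to(device) would then place the model on whichever device was chosen
```

On a machine without CUDA this simply returns `cpu`, so the same code path works everywhere.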
+
+ ### Port Already in Use
+
+ **Problem:** `OSError: [Errno 48] Address already in use`
+
+ **Solution:**
+ ```bash
+ # Use a different port
+ python app.py --port 7861
+ ```
+
+ Or kill the process using port 7860:
+ ```bash
+ # Mac/Linux
+ lsof -ti:7860 | xargs kill -9
+
+ # Windows
+ netstat -ano | findstr :7860
+ taskkill /PID <PID> /F
+ ```
+
+ ### Import Errors
+
+ **Problem:** `ModuleNotFoundError: No module named 'xxx'`
+
+ **Solution:**
+ ```bash
+ # Reinstall dependencies
+ pip install -r requirements.txt --force-reinstall
+ ```
+
+ ---
+
+ ## 🌐 Deploy to Cloud
+
+ ### Deploy to Hugging Face Spaces (Free)
+
+ ```bash
+ # Install Gradio
+ pip install gradio
+
+ # Deploy (from project directory)
+ gradio deploy
+ ```
+
+ ### Deploy to Railway/Render
+
+ 1. Create an account on Railway/Render
+ 2. Connect your GitHub repository
+ 3. Set start command: `python app.py`
+ 4. Deploy!
+
+ ---
+
+ ## 📚 Next Steps
+
+ Now that you have the app running:
+
+ 1. ✅ Read the full [README.md](README.md) for detailed documentation
+ 2. ✅ Check out [notebook/main.ipynb](notebook/main.ipynb) to see the training process
+ 3. ✅ Explore the code in the `app/` directory
+ 4. ✅ Try modifying the examples in `app/utils.py`
+ 5. ✅ Add your own test cases in `test_model.py`
+
+ ---
+
+ ## 💡 Pro Tips
+
+ ### For Development
+
+ ```bash
+ # Enable debug mode
+ python app.py --debug
+
+ # Share publicly (generates a public URL)
+ python app.py --share
+
+ # Run on a specific port
+ python app.py --port 8080
+ ```
+
+ (Note: these flags assume argument parsing is wired up in `app.py`; the committed entry point hardcodes port 7860.)
+
+ ### For Production
+
+ ```bash
+ # Use gunicorn for better performance (requires exposing a WSGI/ASGI
+ # `app` object in app.py; the committed entry point only defines main())
+ gunicorn app:app --workers 4 --bind 0.0.0.0:7860
+ ```
+
+ ---
+
+ ## ❓ Need Help?
+
+ - 📖 Check [README.md](README.md) for detailed docs
+ - 🐛 Report issues on [GitHub Issues](https://github.com/Praanshull/multilingual-qa-system/issues)
+ - 💬 Ask questions in Discussions
+
+ ---
+
+ <div align="center">
+
+ **Happy Question Answering! 🎉**
+
+ [⬆️ Back to Top](#-quick-start-guide)
+
+ </div>
README.md CHANGED
@@ -1,14 +1,357 @@
  ---
- title: Multilingual Qa System
- emoji: 🏢
- colorFrom: purple
- colorTo: red
- sdk: gradio
- sdk_version: 6.0.2
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: 'A state-of-the-art multilingual question answering system '
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🌍 Multilingual Question Answering System
+
+ A state-of-the-art multilingual question answering system supporting **English 🇬🇧** and **German 🇩🇪**, built with **mBART-large-50** fine-tuned using **LoRA** (Low-Rank Adaptation).
+
+ ![Model](https://img.shields.io/badge/Model-mBART--large--50-blue)
+ ![Framework](https://img.shields.io/badge/Framework-PyTorch-orange)
+ ![License](https://img.shields.io/badge/License-MIT-green)
+
+ ---
+
+ ## 📋 Table of Contents
+
+ - [Overview](#overview)
+ - [Key Features](#key-features)
+ - [Performance](#performance)
+ - [Installation](#installation)
+ - [Project Structure](#project-structure)
+ - [Usage](#usage)
+ - [Model Details](#model-details)
+ - [Training](#training)
+ - [Limitations](#limitations)
+ - [Future Improvements](#future-improvements)
+ - [Citation](#citation)
+ - [License](#license)
+
+ ---
+
+ ## 🎯 Overview
+
+ This project implements a **bilingual extractive question answering system** that can:
+ - Extract answers from English contexts
+ - Extract answers from German contexts
+ - Achieve **high accuracy** with minimal training data through transfer learning
+ - Run efficiently using **Parameter-Efficient Fine-Tuning (LoRA)**
+
+ ### What is Extractive QA?
+ The model reads a passage (context) and a question, then extracts the exact answer span from the context.
+
+ **Example:**
+ - **Question:** "What is the capital of France?"
+ - **Context:** "Paris is the capital and most populous city of France."
+ - **Answer:** "Paris"
+
+ ---
+
+ ## ✨ Key Features
+
+ ✅ **Bilingual Support** - English and German
+ ✅ **Fast Inference** - <1 second per query on GPU
+ ✅ **Memory Efficient** - Uses LoRA (only 0.29% trainable parameters)
+ ✅ **High Accuracy** - F1 of 0.63 (English) and 0.66 (German)
+ ✅ **Easy Deployment** - Gradio web interface included
+ ✅ **Well Documented** - Comprehensive code comments and README
+
+ ---
+
+ ## 📊 Performance
+
+ ### Model Metrics
+
+ | Metric | English (SQuAD) | German (XQuAD) | Improvement |
+ |--------|----------------|----------------|-------------|
+ | **BLEU** | 37.79 | **43.12** | +5.33 |
+ | **ROUGE-L** | 0.6272 | **0.6622** | +0.035 |
+ | **Exact Match** | 43.60% | **48.74%** | +5.14% |
+ | **F1 Score** | 0.6329 | **0.6580** | +0.025 |
+ | **Avg (EM+F1)** | 0.5344 | **0.5727** | +0.038 |
+
+ ### Key Insights
+ - 🎉 **German achieves 107.2% of English performance** despite having only ~5% of the training data
+ - 🚀 Strong **transfer learning** from English to German
+ - 💪 Better German scores demonstrate effective **cross-lingual adaptation**
+
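The 107.2% figure follows directly from the Avg (EM+F1) row of the metrics table; a quick arithmetic check:

```python
# Ratio of German to English average (EM+F1) scores, from the metrics table
english_avg = 0.5344
german_avg = 0.5727

ratio = german_avg / english_avg
summary = f"{ratio:.1%}"  # -> "107.2%", i.e. German slightly exceeds English
```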
+ ---
+
+ ## 🚀 Installation
+
+ ### Prerequisites
+ - Python 3.8+
+ - CUDA-capable GPU (recommended, 8GB+ VRAM)
+ - 16GB+ RAM
+
+ ### Setup
+
+ 1. **Clone the repository**
+ ```bash
+ git clone https://github.com/Praanshull/multilingual-qa-system.git
+ cd multilingual-qa-system
+ ```
+
+ 2. **Create virtual environment**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+ ```
+
+ 3. **Install dependencies**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 4. **Download the model**
+ ```bash
+ # Option 1: Download from your Google Drive
+ # (Replace with your actual model path)
+
+ # Option 2: Use Hugging Face (if uploaded)
+ # Will be automatically downloaded on first run
+ ```
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ Multilingual-QA-System/
+ ├── app/
+ │   ├── __init__.py              # Package initialization
+ │   ├── model_loader.py          # Model loading logic
+ │   ├── inference.py             # Inference/prediction engine
+ │   ├── interface.py             # Gradio UI components
+ │   └── utils.py                 # Utility functions
+ ├── models/
+ │   └── multilingual_model/      # Saved model files
+ │       ├── adapter_config.json
+ │       ├── adapter_model.bin
+ │       ├── tokenizer_config.json
+ │       └── ...
+ ├── checkpoints/                 # Training checkpoints
+ │   ├── checkpoint-500/
+ │   ├── checkpoint-1000/
+ │   └── ...
+ ├── logs/                        # Training logs
+ │   └── training.log
+ ├── notebook/                    # Original Jupyter notebook
+ │   └── main.ipynb
+ ├── app.py                       # Main application entry point
+ ├── requirements.txt             # Python dependencies
+ ├── README.md                    # This file
+ ├── .gitignore                   # Git ignore rules
+ └── LICENSE                      # MIT License
+ ```
+
+ ---
+
+ ## 💻 Usage
+
+ ### 1. Launch the Web Interface
+
+ ```bash
+ python app.py
+ ```
+
+ Then open your browser to **http://localhost:7860**
+
+ ### 2. Programmatic Usage
+
+ ```python
+ from app.model_loader import ModelLoader
+ from app.inference import QAInference
+
+ # Load model
+ loader = ModelLoader(model_path="models/multilingual_model")
+ model, tokenizer = loader.load()
+
+ # Create inference engine
+ qa = QAInference(model, tokenizer, loader.device)
+
+ # English example
+ answer, info = qa.answer_question(
+     question="What is the capital of France?",
+     context="Paris is the capital and most populous city of France.",
+     language="English"
+ )
+ print(f"Answer: {answer}")
+
+ # German example
+ answer_de, info_de = qa.answer_question(
+     question="Was ist die Hauptstadt von Deutschland?",
+     context="Berlin ist die Hauptstadt von Deutschland.",
+     language="German"
+ )
+ print(f"Antwort: {answer_de}")
+ ```
+
+ ### 3. API Server (Coming Soon)
+
+ ```bash
+ # Launch FastAPI server
+ python -m app.api --host 0.0.0.0 --port 8000
+ ```
+
+ ---
+
+ ## 🧠 Model Details
+
+ ### Architecture
+ - **Base Model:** `facebook/mbart-large-50-many-to-many-mmt`
+   - 610M total parameters
+   - Pre-trained on 50 languages
+   - Sequence-to-sequence architecture
+
+ - **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
+   - Rank (r): 8
+   - Alpha: 32
+   - Target modules: `q_proj`, `k_proj`, `v_proj`
+   - Only **1.77M trainable parameters** (0.29% of total)
+
+ ### Training Data
216
+ - **English:** SQuAD v1.1
217
+ - 20,000 samples (from 87,599 available)
218
+ - Balanced sampling across topics
219
+
220
+ - **German:** XQuAD (German)
221
+ - ~950 samples (80% of 1,190 available)
222
+ - Cross-lingual evaluation dataset
223
+
224
+ ### Hyperparameters
225
+ ```python
226
+ {
227
+ "learning_rate": 3e-4,
228
+ "batch_size": 16 (2 * 8 gradient accumulation),
229
+ "epochs": 3,
230
+ "max_source_length": 256,
231
+ "max_target_length": 64,
232
+ "fp16": True,
233
+ "optimizer": "AdamW",
234
+ "weight_decay": 0.01
235
+ }
236
+ ```
237
+
+ ---
+
+ ## 🔧 Training
+
+ ### Train from Scratch
+
+ ```bash
+ # See notebook/main.ipynb for the full training pipeline
+ jupyter notebook notebook/main.ipynb
+ ```
+
+ ### Key Training Steps
+
+ 1. **Data Preparation**
+    - Load SQuAD and XQuAD datasets
+    - Convert to text-to-text format
+    - Tokenize with mBART tokenizer
+
+ 2. **Model Setup**
+    - Load base mBART model
+    - Apply LoRA configuration
+    - Configure language tokens
+
+ 3. **Training**
+    - English: 3 epochs (~2 hours on T4 GPU)
+    - German: 3 epochs (~30 minutes on T4 GPU)
+    - Total: ~2.5 hours
+
+ 4. **Evaluation**
+    - BLEU, ROUGE, Exact Match, F1
+    - Cross-lingual performance analysis
+
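The exact prompt template lives in the notebook, but the "convert to text-to-text format" step typically looks something like this — a sketch with a hypothetical template and SQuAD-style field names; the notebook's actual format may differ:

```python
def to_text_to_text(example):
    # Hypothetical sketch: flatten a SQuAD-style record into a
    # (source, target) string pair for a seq2seq model like mBART.
    source = f"question: {example['question']} context: {example['context']}"
    target = example["answers"]["text"][0]  # first gold answer span
    return {"source": source, "target": target}


sample = {
    "question": "What is the capital of France?",
    "context": "Paris is the capital and most populous city of France.",
    "answers": {"text": ["Paris"], "answer_start": [0]},
}
pair = to_text_to_text(sample)
```

After this step, the tokenizer is applied to `source` (truncated to `max_source_length`) and `target` (truncated to `max_target_length`), matching the hyperparameters above.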
  ---
+
+ ## ⚠️ Limitations
+
+ ### Current Constraints
+ 1. **Long Context** - Performance degrades with passages >500 words
+ 2. **Complex Questions** - Multi-hop reasoning not supported
+ 3. **Answer Presence** - Answer must be explicitly stated in context
+ 4. **Languages** - Only English and German supported
+ 5. **Training Data** - Limited to 20K English + 1K German samples
+
+ ### Why These Exist
+ - ✂️ **Context truncation** due to GPU memory constraints
+ - 🧮 **Simple architecture** optimized for extractive QA only
+ - ⚡ **Fast training** prioritized over maximum performance
+
+ ---
+
+ ## 🎯 Future Improvements
+
+ - [ ] Increase context window to 512 tokens
+ - [ ] Add more languages (French, Spanish, Chinese)
+ - [ ] Implement answer confidence scoring
+ - [ ] Add data augmentation techniques
+ - [ ] Deploy as REST API with FastAPI
+ - [ ] Create Docker container for easy deployment
+ - [ ] Add answer verification layer
+ - [ ] Support generative (non-extractive) answers
+
  ---
 
+ ## 📖 Citation
+
+ If you use this project in your research or work, please cite:
+
+ ```bibtex
+ @software{verma2025multilingual_qa,
+   author    = {Verma, Praanshull},
+   title     = {Multilingual Question Answering System with mBART and LoRA},
+   year      = {2025},
+   publisher = {GitHub},
+   url       = {https://github.com/Praanshull/multilingual-qa-system}
+ }
+ ```
+
+ ---
+
+ ## 📄 License
+
+ This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
+
+ ---
+
+ ## 👨‍💻 Author
+
+ **Praanshull Verma**
+ - GitHub: [@Praanshull](https://github.com/Praanshull)
+ - LinkedIn: [Your LinkedIn]
+
+ ---
+
+ ## 🙏 Acknowledgments
+
+ - **Hugging Face** - For the Transformers library and model hosting
+ - **Facebook AI** - For the mBART pre-trained model
+ - **Stanford NLP** - For the SQuAD dataset
+ - **Google Research** - For the XQuAD dataset
+ - **PEFT Team** - For the LoRA implementation
+
+ ---
+
+ ## 📞 Support
+
+ If you encounter any issues or have questions:
+
+ 1. Check [Issues](https://github.com/Praanshull/multilingual-qa-system/issues)
+ 2. Create a new issue with a detailed description
+ 3. Reach out on LinkedIn
+
+ ---
+
+ <div align="center">
+
+ **Built with ❤️ using PyTorch, Transformers, and Gradio**
+
+ ⭐ Star this repo if you find it helpful!
+
+ </div>
app.py ADDED
@@ -0,0 +1,92 @@
+ """
+ Main Application Entry Point
+ Multilingual Question Answering System with Gradio Interface
+ """
+
+ import sys
+ from pathlib import Path
+
+ # Add project root to path
+ project_root = Path(__file__).parent
+ sys.path.insert(0, str(project_root))
+
+ from app.model_loader import ModelLoader
+ from app.inference import QAInference
+ from app.interface import create_interface
+
+
+ def main():
+     """Main application entry point"""
+
+     print("=" * 80)
+     print("🚀 INITIALIZING MULTILINGUAL QA SYSTEM")
+     print("=" * 80)
+
+     # Configuration
+     MODEL_PATH = "models/multilingual_model"  # Change this to your model path
+
+     # Load model
+     print(f"\n📂 Model path: {MODEL_PATH}")
+     loader = ModelLoader(model_path=MODEL_PATH)
+
+     try:
+         model, tokenizer = loader.load()
+     except Exception as e:
+         print(f"\n❌ Failed to load model: {e}")
+         print("\n💡 Please ensure:")
+         print(f"   1. Model exists at: {MODEL_PATH}")
+         print("   2. All required files are present")
+         print("   3. You have sufficient memory")
+         return
+
+     # Create inference engine
+     print("\n🔧 Initializing inference engine...")
+     inference_engine = QAInference(
+         model=model,
+         tokenizer=tokenizer,
+         device=loader.device
+     )
+     print("✅ Inference engine ready")
+
+     # Create interface
+     print("\n🎨 Building Gradio interface...")
+     demo = create_interface(inference_engine)
+     print("✅ Interface created")
+
+     # Launch
+     print("\n" + "=" * 80)
+     print("🚀 LAUNCHING APPLICATION")
+     print("=" * 80)
+
+     # Custom CSS (note: recent Gradio versions expect css= on gr.Blocks(...)
+     # rather than launch(); move it into create_interface() if launch() rejects it)
+     custom_css = """
+     .gradio-container {
+         font-family: 'Arial', sans-serif;
+     }
+     .header {
+         text-align: center;
+         padding: 20px;
+         background: linear-gradient(90deg, #3498db, #e74c3c);
+         color: white;
+         border-radius: 10px;
+         margin-bottom: 20px;
+     }
+     """
+
+     demo.launch(
+         server_name="0.0.0.0",   # Allow external access
+         server_port=7860,        # Default Gradio port
+         share=False,             # Set to True for a public URL
+         show_error=True,
+         quiet=False,
+         css=custom_css
+     )
+
+     # Note: launch() blocks, so these lines only print after the server stops
+     print("\n✅ Application launched successfully!")
+     print("📱 Access the interface at: http://localhost:7860")
+     print("\n💡 TIP: Set share=True in demo.launch() to get a public URL")
+     print("=" * 80)
+
+
+ if __name__ == "__main__":
+     main()
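QUICKSTART.md refers to `--port`, `--share`, and `--debug` flags, but the entry point above hardcodes its launch options. A hedged sketch of how those flags could be wired in with `argparse` (hypothetical addition; not part of the committed app.py):

```python
import argparse


def parse_launch_args(argv=None):
    # Hypothetical flag handling matching the QUICKSTART examples,
    # e.g. `python app.py --port 7861 --share`.
    parser = argparse.ArgumentParser(description="Multilingual QA System")
    parser.add_argument("--port", type=int, default=7860, help="server port")
    parser.add_argument("--share", action="store_true", help="create a public URL")
    parser.add_argument("--debug", action="store_true", help="show verbose errors")
    return parser.parse_args(argv)
```

`main()` could then forward these into the launch call, e.g. `demo.launch(server_port=args.port, share=args.share, show_error=args.debug)`.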
requirements.txt ADDED
@@ -0,0 +1,25 @@
+ # Core ML libraries
+ torch>=2.0.0
+ transformers>=4.35.0
+ datasets>=2.14.0
+ accelerate>=0.24.0
+ peft>=0.7.0
+
+ # Evaluation metrics
+ evaluate>=0.4.0
+ sacrebleu>=2.3.1
+ rouge-score>=0.1.2
+
+ # Web interface
+ gradio>=4.0.0
+
+ # Visualization
+ plotly>=5.17.0
+ pandas>=2.0.0
+
+ # Utilities
+ numpy>=1.24.0
+ tqdm>=4.66.0
+
+ # Tokenization (required by the mBART tokenizer)
+ sentencepiece>=0.1.99
test_model.py ADDED
@@ -0,0 +1,134 @@
+ """
+ Model Testing Script
+ Quick tests to verify the model is working correctly
+ """
+
+ from app.model_loader import ModelLoader
+ from app.inference import QAInference
+
+
+ def test_english():
+     """English question answering test cases"""
+     print("\n" + "=" * 80)
+     print("🇬🇧 TESTING ENGLISH")
+     print("=" * 80)
+
+     test_cases = [
+         {
+             "question": "What is the capital of France?",
+             "context": "Paris is the capital and most populous city of France.",
+             "expected": "Paris"
+         },
+         {
+             "question": "When was the Eiffel Tower built?",
+             "context": "The Eiffel Tower was constructed from 1887 to 1889.",
+             "expected": "1887 to 1889"
+         }
+     ]
+
+     return test_cases
+
+
+ def test_german():
+     """German question answering test cases"""
+     print("\n" + "=" * 80)
+     print("🇩🇪 TESTING GERMAN")
+     print("=" * 80)
+
+     test_cases = [
+         {
+             "question": "Was ist die Hauptstadt von Deutschland?",
+             "context": "Berlin ist die Hauptstadt von Deutschland.",
+             "expected": "Berlin"
+         },
+         {
+             "question": "Wann wurde der Berliner Fernsehturm gebaut?",
+             "context": "Der Berliner Fernsehturm wurde zwischen 1965 und 1969 erbaut.",
+             # Must be a span that actually occurs in the context
+             # (the substring check below would never match "1965 bis 1969")
+             "expected": "1965 und 1969"
+         }
+     ]
+
+     return test_cases
+
+
+ def run_tests():
+     """Run all tests"""
+
+     print("""
+     ╔══════════════════════════════════════════════════════════════╗
+     ║                                                              ║
+     ║                  🧪  MODEL TESTING SUITE  🧪                  ║
+     ║                                                              ║
+     ╚══════════════════════════════════════════════════════════════╝
+     """)
+
+     # Load model
+     print("\n📂 Loading model...")
+     try:
+         loader = ModelLoader(model_path="models/multilingual_model")
+         model, tokenizer = loader.load()
+
+         inference = QAInference(model, tokenizer, loader.device)
+         print("✅ Model loaded successfully!\n")
+     except Exception as e:
+         print(f"❌ Failed to load model: {e}")
+         print("\n💡 Make sure model files exist in models/multilingual_model/")
+         return
+
+     # Test English
+     english_tests = test_english()
+     passed = 0
+     total = len(english_tests)
+
+     for i, test in enumerate(english_tests, 1):
+         answer, _ = inference.answer_question(
+             test["question"],
+             test["context"],
+             "English"
+         )
+
+         print(f"\nTest {i}/{total}")
+         print(f"Q: {test['question']}")
+         print(f"Expected: {test['expected']}")
+         print(f"Got: {answer}")
+
+         if test["expected"].lower() in answer.lower():
+             print("✅ PASSED")
+             passed += 1
+         else:
+             print("❌ FAILED")
+
+     print(f"\n📊 English Results: {passed}/{total} passed ({passed/total*100:.1f}%)")
+
+     # Test German
+     german_tests = test_german()
+     passed = 0
+     total = len(german_tests)
+
+     for i, test in enumerate(german_tests, 1):
+         answer, _ = inference.answer_question(
+             test["question"],
+             test["context"],
+             "German"
+         )
+
+         print(f"\nTest {i}/{total}")
+         print(f"Q: {test['question']}")
+         print(f"Expected: {test['expected']}")
+         print(f"Got: {answer}")
+
+         if test["expected"].lower() in answer.lower():
+             print("✅ PASSED")
+             passed += 1
+         else:
+             print("❌ FAILED")
+
+     print(f"\n📊 German Results: {passed}/{total} passed ({passed/total*100:.1f}%)")
+
+     print("\n" + "=" * 80)
+     print("✅ TESTING COMPLETE!")
+     print("=" * 80)
+
+
+ if __name__ == "__main__":
+     run_tests()
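The pass check in the script above is a plain case-insensitive substring match, which is brittle against punctuation and articles in generated answers. SQuAD-style evaluation normalizes both strings first; a small helper that could tighten the comparison (a sketch in the spirit of the official SQuAD normalizer, not part of the committed script):

```python
import re
import string


def normalize_answer(s):
    # SQuAD-style normalization: lowercase, drop punctuation,
    # drop English articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def answers_match(expected, answer):
    # Substring containment after normalization, mirroring the
    # test script's lenient matching but tolerant of punctuation.
    return normalize_answer(expected) in normalize_answer(answer)
```

Swapping `test["expected"].lower() in answer.lower()` for `answers_match(test["expected"], answer)` would keep the lenient semantics while ignoring cosmetic differences.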