🧪 Testing Guide for Multilingual Emotion Classifier

This guide explains how to test the rmtariq/multilingual-emotion-classifier model with the `test_model.py` script.

🚀 Quick Start

Installation

# Install requirements
pip install -r requirements_testing.txt

# Or install manually
pip install torch transformers numpy pandas scikit-learn

Basic Usage

# Quick test (recommended for first-time users)
python test_model.py --test-type quick

# Comprehensive test
python test_model.py --test-type comprehensive

# Interactive testing
python test_model.py --test-type interactive

# Performance benchmark
python test_model.py --test-type benchmark

# Run all tests
python test_model.py --test-type all

📋 Test Types

1. 🚀 Quick Test

Purpose: Fast validation of core functionality
Duration: ~30 seconds
Coverage: 13 essential test cases (English + Malay)

python test_model.py --test-type quick

What it tests:

✅ Basic English emotions (6 cases)
✅ Basic Malay emotions (4 cases)
✅ Previously problematic cases (3 cases)

Expected Results: >90% accuracy
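At its core, a quick test is a loop that compares predicted labels with expected ones. The sketch below is not the code in `test_model.py`. It assumes a hypothetical `predict(text)` callable that returns a `(label, confidence)` pair, and it uses three of the English examples from this guide as sample cases in place of the real 13-case set.

```python
from typing import Callable, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])

# Sample cases from the English examples in this guide (NOT the full quick-test set).
SAMPLE_CASES = [
    ("I am so happy today!", "happy"),
    ("This makes me really angry!", "anger"),
    ("I love you so much!", "love"),
]

def run_quick_test(predict: Callable[[str], Prediction],
                   cases: Sequence[Tuple[str, str]],
                   threshold: float = 0.90) -> Tuple[float, bool]:
    """Return (accuracy, passed), where passed means accuracy > threshold."""
    if not cases:
        raise ValueError("no test cases given")
    correct = 0
    for text, expected in cases:
        label, _confidence = predict(text)  # hypothetical predict() interface
        if label == expected:
            correct += 1
        else:
            print(f"❌ {text!r}: expected {expected}, got {label}")
    accuracy = correct / len(cases)
    return accuracy, accuracy > threshold
```

To run it against the real model, `predict` can wrap a Hugging Face `pipeline` and return the `label` and `score` of the first result.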

2. 🔬 Comprehensive Test

Purpose: Thorough validation across all categories
Duration: ~2 minutes
Coverage: 24 test cases across multiple categories

python test_model.py --test-type comprehensive

Test Categories:

English Basic: Core English emotion expressions
Malay Basic: Core Malay emotion expressions
Malay Fixed Issues: Previously problematic cases (now fixed)
Edge Cases: Boundary and special cases

Expected Results: >85% overall accuracy
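Per-category results show whether misses come from, say, Edge Cases rather than the basic sets. This is a minimal sketch with a hypothetical `accuracy_by_category` helper. It assumes a hypothetical `predict(text)` callable that returns `(label, confidence)`; the category names match the list above, but the cases in any real run would come from `test_model.py`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Each case: (category, text, expected_label). Categories mirror the list above.
Case = Tuple[str, str, str]

def accuracy_by_category(predict: Callable[[str], Tuple[str, float]],
                         cases: Iterable[Case]) -> Tuple[float, Dict[str, float]]:
    """Return pooled overall accuracy and a per-category accuracy breakdown."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category, text, expected in cases:
        label, _ = predict(text)  # hypothetical predict() interface
        totals[category] += 1
        hits[category] += int(label == expected)
    if not totals:
        raise ValueError("no test cases given")
    per_category = {c: hits[c] / totals[c] for c in totals}
    # Pooled (micro) average: every case counts once, so bigger categories weigh more.
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_category
```

Overall accuracy here is pooled over all cases; this guide does not say whether `test_model.py` pools cases or averages the per-category scores.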

3. 🎮 Interactive Test

Purpose: Manual testing with custom inputs
Duration: User-controlled
Coverage: Unlimited custom test cases

python test_model.py --test-type interactive

Features:

Real-time emotion classification
Confidence scoring
Emoji visualization
Easy exit (type 'quit')

Example Session:

💬 Your text: I am so excited!
🎭 Result: 😊 happy
📊 Confidence: 99.8%
💪 High confidence!

💬 Your text: Saya gembira!
🎭 Result: 😊 happy
📊 Confidence: 99.9%
💪 High confidence!
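A session like the one above needs little more than a read-classify-print loop. This sketch assumes a hypothetical `predict(text)` callable returning `(label, confidence)`. `read` and `write` default to `input` and `print` but can be replaced for testing. The emoji come from the Supported Emotions table below, and the prompt text mirrors the example session.

```python
from typing import Callable, Tuple

# Emoji per label, taken from the Supported Emotions table in this guide.
EMOJI = {"anger": "😠", "fear": "😨", "happy": "😊",
         "love": "❤️", "sadness": "😢", "surprise": "😲"}

def interactive_session(predict: Callable[[str], Tuple[str, float]],
                        read: Callable[[str], str] = input,
                        write: Callable[[str], None] = print) -> int:
    """Classify lines until the user types 'quit'; return how many were classified."""
    count = 0
    while True:
        try:
            text = read("💬 Your text: ").strip()
        except EOFError:  # Ctrl-D or end of piped input also ends the session
            break
        if text.lower() == "quit":
            break
        if not text:  # ignore empty lines
            continue
        label, confidence = predict(text)  # hypothetical predict() interface
        write(f"🎭 Result: {EMOJI.get(label, '❓')} {label}")
        write(f"📊 Confidence: {confidence:.1%}")
        count += 1
    return count
```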

4. ⚡ Benchmark Test

Purpose: Performance and speed evaluation
Duration: ~1 minute
Coverage: 100 predictions for timing analysis

python test_model.py --test-type benchmark

Metrics Measured:

Total processing time
Average time per prediction
Predictions per second
Performance classification

Expected Results: >5 predictions/second
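All of these metrics can be derived from one timed loop. This is a sketch under assumptions: the hypothetical `benchmark` helper cycles through a list of texts, and the untimed warm-up calls are a design choice added here so that one-off initialization does not skew the numbers; `test_model.py` may not do this.

```python
import time
from typing import Callable, Dict, Sequence

def benchmark(predict: Callable[[str], object],
              texts: Sequence[str],
              n_predictions: int = 100,
              warmup: int = 3) -> Dict[str, float]:
    """Time n_predictions calls to predict, cycling through texts."""
    if not texts or n_predictions <= 0:
        raise ValueError("need at least one text and a positive prediction count")
    for i in range(warmup):  # untimed warm-up calls (assumption, see lead-in)
        predict(texts[i % len(texts)])
    start = time.perf_counter()
    for i in range(n_predictions):
        predict(texts[i % len(texts)])
    total = time.perf_counter() - start
    return {
        "total_seconds": total,
        "seconds_per_prediction": total / n_predictions,
        "predictions_per_second": n_predictions / total if total > 0 else float("inf"),
    }
```

Comparing `predictions_per_second` against 5 gives the pass/fail check for the expected result above.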

🎯 Supported Emotions

The model classifies text into 6 emotion categories:

| Emotion | Emoji | Description | Example (English) | Example (Malay) |
|---|---|---|---|---|
| anger | 😠 | Frustration, rage | "I'm so angry!" | "Marah betul!" |
| fear | 😨 | Anxiety, worry | "I'm scared!" | "Takut sangat!" |
| happy | 😊 | Joy, excitement | "I'm so happy!" | "Gembira sangat!" |
| love | ❤️ | Affection, care | "I love you!" | "Sayang kamu!" |
| sadness | 😢 | Sorrow, grief | "I'm so sad" | "Sedih betul" |
| surprise | 😲 | Amazement, shock | "What a surprise!" | "Terkejut betul!" |

🔧 Advanced Usage

Custom Model Testing

# Test a different model
python test_model.py --model "your-model-name" --test-type quick

# Test local model
python test_model.py --model "./path/to/local/model" --test-type comprehensive
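The `model` argument of a `transformers` `pipeline` accepts either a Hub model id or a local directory. The sketch below shows one way to load either outside the test script. `load_classifier`, `pick_device` and `make_predict` are hypothetical helpers, not part of `test_model.py`, and the sketch assumes the model's config maps class ids to names such as `happy`. It uses the first GPU when CUDA is available and the CPU otherwise.

```python
def pick_device(cuda_available: bool) -> int:
    """transformers pipelines take device=-1 for CPU and 0 for the first GPU."""
    return 0 if cuda_available else -1

def load_classifier(model: str = "rmtariq/multilingual-emotion-classifier"):
    """Build a text-classification pipeline from a Hub id or a local directory."""
    # Imported lazily so this module can be imported without torch/transformers.
    import torch
    from transformers import pipeline
    device = pick_device(torch.cuda.is_available())
    return pipeline("text-classification", model=model, device=device)

def make_predict(classifier):
    """Adapt a pipeline to a hypothetical predict(text) -> (label, confidence) callable."""
    def predict(text: str):
        top = classifier(text)[0]  # a single string yields a list of {"label", "score"} dicts
        return top["label"], float(top["score"])
    return predict
```

For example, `make_predict(load_classifier("./path/to/local/model"))("Saya gembira!")` would return the top label and its score.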

Programmatic Usage

from test_model import EmotionModelTester

# Initialize tester
tester = EmotionModelTester("rmtariq/multilingual-emotion-classifier")

# Run specific tests
quick_accuracy = tester.quick_test()
comprehensive_accuracy = tester.comprehensive_test()
speed = tester.benchmark_test()

print(f"Quick test accuracy: {quick_accuracy:.1%}")
print(f"Comprehensive accuracy: {comprehensive_accuracy:.1%}")
print(f"Speed: {speed:.1f} predictions/second")

📊 Expected Performance

Accuracy Targets

Quick Test: >90% accuracy
Comprehensive Test: >85% accuracy
English Performance: >95% accuracy
Malay Performance: >85% accuracy

Speed Targets

CPU Performance: >5 predictions/second
GPU Performance: >20 predictions/second

Confidence Levels

High Confidence: >90% (💪)
Good Confidence: 70-90% (👍)
Low Confidence: <70% (⚠️)
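The confidence bands can be expressed as a small lookup. This sketch assumes confidence is a probability in [0, 1], and it treats exactly 90% and exactly 70% as Good because the guide does not say which band the boundaries belong to. `confidence_tier` is a hypothetical helper.

```python
from typing import Tuple

def confidence_tier(confidence: float) -> Tuple[str, str]:
    """Map a confidence in [0, 1] to the guide's (band name, emoji)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > 0.90:
        return "High", "💪"
    if confidence >= 0.70:  # boundary handling is an assumption, see lead-in
        return "Good", "👍"
    return "Low", "⚠️"
```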

🐛 Troubleshooting

Common Issues

1. Model Loading Errors

❌ Error loading model: ...

Solutions:

Check internet connection
Verify model name spelling
Try: pip install --upgrade transformers

2. CUDA/GPU Issues

CUDA out of memory

Solutions:

The test script automatically falls back to CPU
Reduce batch size if using custom code
Use --device cpu flag if available

3. Slow Performance

⚠️ SLOW. Consider optimization.

Solutions:

Use GPU if available
Close other applications
Consider model quantization for production

Getting Help

If you encounter issues:

Check Requirements: Ensure all dependencies are installed
Update Libraries: pip install --upgrade transformers torch
Check Model Status: Visit the model page on Hugging Face (linked at the end of this guide)
Report Issues: Create an issue on the repository

🎯 Test Case Examples

English Test Cases

# Basic emotions
"I am so happy today!"          # → happy
"This makes me really angry!"   # → anger
"I love you so much!"           # → love
"I'm scared of spiders"         # → fear
"This news makes me sad"        # → sadness
"What a surprise!"              # → surprise

Malay Test Cases

# Basic emotions
"Saya sangat gembira!"          # → happy
"Aku marah dengan keadaan ini"  # → anger
"Aku sayang kamu"               # → love
"Saya takut dengan ini"         # → fear
"Sedih betul dengan berita"     # → sadness
"Terkejut dengan kejadian"      # → surprise

# Fixed issues (previously problematic)
"Ini adalah hari jadi terbaik"  # → happy (was: anger)
"Terbaik!"                      # → happy (was: surprise)
"Ini adalah hari yang baik"     # → happy (was: anger)

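For the previously problematic Malay cases, it helps to know whether a failure is the old bug coming back or a new mistake. The hypothetical `check_fixed_cases` helper below encodes the three cases above with their old labels. It assumes a hypothetical `predict(text)` callable that returns `(label, confidence)`.

```python
from typing import Callable, Dict, List, Tuple

# Previously problematic Malay inputs from this guide: text -> (expected, old wrong label).
FIXED_CASES: Dict[str, Tuple[str, str]] = {
    "Ini adalah hari jadi terbaik": ("happy", "anger"),
    "Terbaik!": ("happy", "surprise"),
    "Ini adalah hari yang baik": ("happy", "anger"),
}

def check_fixed_cases(predict: Callable[[str], Tuple[str, float]]) -> Dict[str, List[str]]:
    """Sort cases into passed, regressed (old wrong label is back) and other failures."""
    report: Dict[str, List[str]] = {"passed": [], "regressed": [], "other": []}
    for text, (expected, old_label) in FIXED_CASES.items():
        label, _ = predict(text)  # hypothetical predict() interface
        if label == expected:
            report["passed"].append(text)
        elif label == old_label:
            report["regressed"].append(text)
        else:
            report["other"].append(text)
    return report
```
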
📈 Performance History

Version 2.1 (Current)

✅ Overall Accuracy: 85.0%
✅ English Performance: 100%
✅ Malay Performance: 100% (fixed issues)
✅ Speed: 5-20 predictions/second

Key Improvements

🔧 Fixed Malay birthday context classification
🔧 Fixed "baik/terbaik" positive expression recognition
🔧 Improved confidence scores
🔧 Enhanced robustness

🏆 Success Criteria

A successful test run should show:

✅ Quick Test: >90% accuracy
✅ No Critical Failures: All basic emotions working
✅ Malay Fixes Verified: Birthday/positive contexts → happy
✅ Reasonable Speed: >5 predictions/second
✅ High Confidence: Most predictions >90%
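These criteria can be checked mechanically once the individual results are in. The sketch below is one reading of them, with its assumptions labeled: "most predictions" is taken to mean more than half, and the two pass/fail criteria (basic emotions and Malay fixes) are passed in as booleans computed elsewhere. `check_success` is a hypothetical helper.

```python
from typing import Dict, Sequence

def check_success(quick_accuracy: float,
                  predictions_per_second: float,
                  confidences: Sequence[float],
                  basic_emotions_ok: bool,
                  fixed_cases_ok: bool) -> Dict[str, bool]:
    """Evaluate the success criteria; a run passes only if every value is True."""
    high = sum(1 for c in confidences if c > 0.90)
    return {
        "quick_test_over_90pct": quick_accuracy > 0.90,
        "no_critical_failures": basic_emotions_ok,
        "malay_fixes_verified": fixed_cases_ok,
        "speed_over_5_per_sec": predictions_per_second > 5,
        # Assumption: "most" means more than half; an empty list fails.
        "most_predictions_high_confidence": bool(confidences) and high / len(confidences) > 0.5,
    }
```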

Model Repository: https://huggingface.co/rmtariq/multilingual-emotion-classifier
Author: rmtariq
Last Updated: June 2024