🧪 Testing Guide for Multilingual Emotion Classifier

This guide describes how to test the rmtariq/multilingual-emotion-classifier model with the test_model.py script: quick and comprehensive accuracy checks, interactive testing, and a speed benchmark.

🚀 Quick Start

Installation

# Install requirements
pip install -r requirements_testing.txt

# Or install manually
pip install torch transformers numpy pandas scikit-learn
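
To confirm that the packages from the manual install command are importable before running the tests, a small check along these lines may help. This is a minimal sketch and not part of test_model.py; the function name `missing_packages` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages listed in the manual install command.
# scikit-learn installs under the import name "sklearn".
REQUIRED = ["torch", "transformers", "numpy", "pandas", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```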

Basic Usage

# Quick test (recommended for first-time users)
python test_model.py --test-type quick

# Comprehensive test
python test_model.py --test-type comprehensive

# Interactive testing
python test_model.py --test-type interactive

# Performance benchmark
python test_model.py --test-type benchmark

# Run all tests
python test_model.py --test-type all

📋 Test Types

1. 🚀 Quick Test

Purpose: Fast validation of core functionality
Duration: ~30 seconds
Coverage: 13 essential test cases (English + Malay)

python test_model.py --test-type quick

What it tests:

  • ✅ Basic English emotions (6 cases)
  • ✅ Basic Malay emotions (4 cases)
  • ✅ Previously problematic cases (3 cases)

Expected Results: >90% accuracy
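
The accuracy figure can be understood as the fraction of test cases whose predicted label matches the expected one. The sketch below illustrates that calculation under stated assumptions: `accuracy` and `keyword_stub` are hypothetical names, and the stub predictor stands in for the real model so the example runs without downloading anything. It is not the code used by test_model.py.

```python
def accuracy(predict, cases):
    """Fraction of (text, expected_label) pairs that `predict` labels correctly."""
    if not cases:
        raise ValueError("cases must not be empty")
    correct = sum(1 for text, expected in cases if predict(text) == expected)
    return correct / len(cases)


# Three of the English cases listed in this guide.
CASES = [
    ("I am so happy today!", "happy"),
    ("This makes me really angry!", "anger"),
    ("I love you so much!", "love"),
]


def keyword_stub(text):
    """Stand-in predictor for illustration only; NOT the real model."""
    lowered = text.lower()
    if "angry" in lowered:
        return "anger"
    if "love" in lowered:
        return "love"
    return "happy"


score = accuracy(keyword_stub, CASES)
passed = score > 0.90  # threshold from "Expected Results"
print(f"Accuracy: {score:.1%} (passed: {passed})")
```

To score the real model, `predict` could wrap a Hugging Face `pipeline("text-classification", model=...)` call and return the top result's `"label"` field, assuming the model's label strings match the emotion names used in this guide.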

2. 🔬 Comprehensive Test

Purpose: Thorough validation across all categories
Duration: ~2 minutes
Coverage: 24 test cases across multiple categories

python test_model.py --test-type comprehensive

Test Categories:

  • English Basic: Core English emotion expressions
  • Malay Basic: Core Malay emotion expressions
  • Malay Fixed Issues: Previously problematic cases (now fixed)
  • Edge Cases: Boundary and special cases

Expected Results: >85% overall accuracy
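
Because the comprehensive test groups its cases into categories, a per-category breakdown shows where errors concentrate. The following is a rough sketch of such a breakdown; `accuracy_by_category` is a hypothetical helper, the four sample cases are taken from this guide, and the always-"fear" predictor is a deliberately poor stand-in for the model.

```python
from collections import defaultdict


def accuracy_by_category(predict, cases):
    """Map each category to its accuracy; `cases` holds (category, text, expected)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, text, expected in cases:
        totals[category] += 1
        if predict(text) == expected:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


SAMPLE_CASES = [
    ("English Basic", "I'm scared of spiders", "fear"),
    ("English Basic", "What a surprise!", "surprise"),
    ("Malay Basic", "Saya takut dengan ini", "fear"),
    ("Malay Fixed Issues", "Terbaik!", "happy"),
]

# Stand-in predictor that always answers "fear"; NOT the real model.
report = accuracy_by_category(lambda text: "fear", SAMPLE_CASES)
for category, value in report.items():
    print(f"{category}: {value:.0%}")
```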

3. 🎮 Interactive Test

Purpose: Manual testing with custom inputs
Duration: User-controlled
Coverage: Unlimited custom test cases

python test_model.py --test-type interactive

Features:

  • Real-time emotion classification
  • Confidence scoring
  • Emoji visualization
  • Easy exit (type 'quit')

Example Session:

💬 Your text: I am so excited!
🎭 Result: 😊 happy
📊 Confidence: 99.8%
💪 High confidence!

💬 Your text: Saya gembira!
🎭 Result: 😊 happy
📊 Confidence: 99.9%
💪 High confidence!
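
A session like the one above amounts to a read-classify-print loop that stops when the user types 'quit'. The sketch below shows one way such a loop might be structured; `run_session` and `fixed_predict` are hypothetical names, the predictor is a stand-in returning a fixed answer, and the emoji decoration is omitted for brevity. The actual loop in test_model.py may differ.

```python
def run_session(lines, predict):
    """Classify each input line until 'quit' is entered; return the output lines.

    `lines` is any iterable of strings (for example, sys.stdin), and `predict`
    returns a (label, confidence) pair with confidence in [0, 1].
    """
    output = []
    for raw in lines:
        text = raw.strip()
        if text.lower() == "quit":
            break
        if not text:
            continue  # ignore blank input
        label, confidence = predict(text)
        output.append(f"Result: {label}")
        output.append(f"Confidence: {confidence:.1%}")
    return output


def fixed_predict(text):
    """Stand-in predictor for illustration only; NOT the real model."""
    return ("happy", 0.998)


transcript = run_session(["I am so excited!", "quit", "never reached"], fixed_predict)
print("\n".join(transcript))
```

Taking the input as an iterable rather than calling `input()` directly keeps the loop easy to test with a scripted list of lines.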

4. ⚡ Benchmark Test

Purpose: Performance and speed evaluation
Duration: ~1 minute
Coverage: 100 predictions for timing analysis

python test_model.py --test-type benchmark

Metrics Measured:

  • Total processing time
  • Average time per prediction
  • Predictions per second
  • Performance classification

Expected Results: >5 predictions/second
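
The first three metrics follow directly from one wall-clock measurement over a fixed number of predictions. Here is a minimal sketch of that arithmetic, assuming a hypothetical `benchmark` helper and a trivial stand-in predictor; it is meant to illustrate the calculation, not to reproduce the script's benchmark.

```python
import time


def benchmark(predict, texts, repeats=100):
    """Time `repeats` predictions, cycling through `texts`."""
    if not texts or repeats <= 0:
        raise ValueError("need at least one text and one repeat")
    start = time.perf_counter()
    for i in range(repeats):
        predict(texts[i % len(texts)])
    total = time.perf_counter() - start
    return {
        "total_seconds": total,
        "avg_seconds": total / repeats,
        "predictions_per_second": repeats / total if total > 0 else float("inf"),
    }


# Stand-in predictor; a real run would call the model here.
stats = benchmark(lambda text: "happy", ["I'm so happy!", "Gembira sangat!"])
meets_target = stats["predictions_per_second"] > 5  # target from "Expected Results"
print(f"{stats['predictions_per_second']:.1f} predictions/second")
```

When timing a real model, it is usually worth running a few untimed warm-up predictions first so that one-off loading costs do not distort the average.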

🎯 Supported Emotions

The model classifies text into 6 emotion categories:

| Emotion  | Emoji | Description       | Example (English)   | Example (Malay)    |
|----------|-------|-------------------|---------------------|--------------------|
| anger    | 😠    | Frustration, rage | "I'm so angry!"     | "Marah betul!"     |
| fear     | 😨    | Anxiety, worry    | "I'm scared!"       | "Takut sangat!"    |
| happy    | 😊    | Joy, excitement   | "I'm so happy!"     | "Gembira sangat!"  |
| love     | ❤️    | Affection, care   | "I love you!"       | "Sayang kamu!"     |
| sadness  | 😢    | Sorrow, grief     | "I'm so sad"        | "Sedih betul"      |
| surprise | 😲    | Amazement, shock  | "What a surprise!"  | "Terkejut betul!"  |

🔧 Advanced Usage

Custom Model Testing

# Test a different model
python test_model.py --model "your-model-name" --test-type quick

# Test local model
python test_model.py --model "./path/to/local/model" --test-type comprehensive
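
For readers adapting the script, the two flags used in this guide could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the flag names and test types come from the commands above, but the default values and help strings are guesses, and the real test_model.py may define its arguments differently.

```python
import argparse

# Test types taken from the commands shown in this guide.
TEST_TYPES = ["quick", "comprehensive", "interactive", "benchmark", "all"]


def build_parser():
    """Build a parser for the --model and --test-type flags."""
    parser = argparse.ArgumentParser(description="Test an emotion classifier")
    parser.add_argument(
        "--model",
        default="rmtariq/multilingual-emotion-classifier",  # assumed default
        help="Hub model name or path to a local model directory",
    )
    parser.add_argument(
        "--test-type",
        choices=TEST_TYPES,
        default="quick",  # assumed default
        help="Which test suite to run",
    )
    return parser


args = build_parser().parse_args(
    ["--model", "./path/to/local/model", "--test-type", "comprehensive"]
)
print(args.model, args.test_type)
```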

Programmatic Usage

from test_model import EmotionModelTester

# Initialize tester
tester = EmotionModelTester("rmtariq/multilingual-emotion-classifier")

# Run specific tests
quick_accuracy = tester.quick_test()
comprehensive_accuracy = tester.comprehensive_test()
speed = tester.benchmark_test()

print(f"Quick test accuracy: {quick_accuracy:.1%}")
print(f"Comprehensive accuracy: {comprehensive_accuracy:.1%}")
print(f"Speed: {speed:.1f} predictions/second")

📊 Expected Performance

Accuracy Targets

  • Quick Test: >90% accuracy
  • Comprehensive Test: >85% accuracy
  • English Performance: >95% accuracy
  • Malay Performance: >85% accuracy

Speed Targets

  • CPU Performance: >5 predictions/second
  • GPU Performance: >20 predictions/second

Confidence Levels

  • High Confidence: >90% (💪)
  • Good Confidence: 70-90% (👍)
  • Low Confidence: <70% (⚠️)
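
The confidence bands above can be expressed as a small lookup function. The sketch below is illustrative only; `confidence_level` is a hypothetical name, and treating exactly 90% and exactly 70% as "Good" is one reading of the stated ranges.

```python
def confidence_level(confidence):
    """Map a confidence in [0, 1] to the High / Good / Low bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.90:
        return "High"
    if confidence >= 0.70:
        return "Good"
    return "Low"


for value in (0.998, 0.85, 0.42):
    print(f"{value:.1%}: {confidence_level(value)}")
```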

🐛 Troubleshooting

Common Issues

1. Model Loading Errors

❌ Error loading model: ...

Solutions:

  • Check internet connection
  • Verify model name spelling
  • Try: pip install --upgrade transformers

2. CUDA/GPU Issues

CUDA out of memory

Solutions:

  • The model automatically falls back to CPU
  • Reduce batch size if using custom code
  • Use --device cpu flag if available
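
In custom code, the CPU fallback can be made explicit by choosing the device before loading the model. The following is a minimal sketch, assuming a hypothetical `pick_device` helper; it relies only on `torch.cuda.is_available()` and returns "cpu" if PyTorch is not installed at all.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is no GPU path
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to a PyTorch model's `.to(device)` method. Note that this only covers the case where no GPU is visible; an out-of-memory error on an available GPU still needs a smaller batch size or an explicit switch to CPU.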

3. Slow Performance

⚠️ SLOW. Consider optimization.

Solutions:

  • Use GPU if available
  • Close other applications
  • Consider model quantization for production

Getting Help

If you encounter issues:

  1. Check Requirements: Ensure all dependencies are installed
  2. Update Libraries: pip install --upgrade transformers torch
  3. Check Model Status: Visit the model page at https://huggingface.co/rmtariq/multilingual-emotion-classifier
  4. Report Issues: Create an issue on the repository

🎯 Test Case Examples

English Test Cases

# Basic emotions
"I am so happy today!"          # → happy
"This makes me really angry!"   # → anger
"I love you so much!"           # → love
"I'm scared of spiders"         # → fear
"This news makes me sad"        # → sadness
"What a surprise!"              # → surprise

Malay Test Cases

# Basic emotions
"Saya sangat gembira!"          # → happy
"Aku marah dengan keadaan ini"  # → anger
"Aku sayang kamu"               # → love
"Saya takut dengan ini"         # → fear
"Sedih betul dengan berita"     # → sadness
"Terkejut dengan kejadian"      # → surprise

# Fixed issues (previously problematic)
"Ini adalah hari jadi terbaik"  # → happy (was: anger)
"Terbaik!"                      # → happy (was: surprise)
"Ini adalah hari yang baik"     # → happy (was: anger)

📈 Performance History

Version 2.1 (Current)

  • ✅ Overall Accuracy: 85.0%
  • ✅ English Performance: 100%
  • ✅ Malay Performance: 100% (fixed issues)
  • ✅ Speed: 5-20 predictions/second

Key Improvements

  • 🔧 Fixed Malay birthday context classification
  • 🔧 Fixed "baik/terbaik" positive expression recognition
  • 🔧 Improved confidence scores
  • 🔧 Enhanced robustness

🏆 Success Criteria

A successful test run should show:

  • ✅ Quick Test: >90% accuracy
  • ✅ No Critical Failures: All basic emotions working
  • ✅ Malay Fixes Verified: Birthday/positive contexts → happy
  • ✅ Reasonable Speed: >5 predictions/second
  • ✅ High Confidence: Most predictions >90%
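
The numeric criteria in this list lend themselves to an automated check, for example in a CI job. The sketch below covers three of them; `check_success` is a hypothetical helper, and reading "most predictions" as "more than half" is an assumption rather than something this guide defines.

```python
def check_success(quick_accuracy, predictions_per_second, confidences):
    """Evaluate a run against three of the success criteria.

    `confidences` is a list of per-prediction confidences in [0, 1].
    """
    high = sum(1 for c in confidences if c > 0.90)
    return {
        "quick_test": quick_accuracy > 0.90,
        "speed": predictions_per_second > 5,
        # Assumption: "most" means strictly more than half of the predictions.
        "high_confidence": bool(confidences) and high > len(confidences) / 2,
    }


result = check_success(0.92, 12.0, [0.99, 0.95, 0.60])
for name, ok in result.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```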

Model Repository: https://huggingface.co/rmtariq/multilingual-emotion-classifier
Author: rmtariq
Last Updated: June 2024