Revix AI engine knock detection model
Model Description
This model is a specialized engine knock detection system based on the Audio Spectrogram Transformer (AST) architecture. It is fine-tuned from MIT's pre-trained AST checkpoint (MIT/ast-finetuned-audioset-10-10-0.4593) to classify audio spectrograms as knock or no-knock, reaching 90.83% accuracy on the validation set.
Engine knock (also known as detonation) is a harmful combustion phenomenon in internal combustion engines that can cause severe engine damage if not detected and addressed promptly. This model provides automated, real-time detection capabilities for automotive diagnostic and monitoring systems.
Architecture
- Base Model: Audio Spectrogram Transformer (AST), a Vision Transformer (ViT) style model adapted for audio spectrograms
- Input: Audio spectrograms converted to visual representations
- Output: Binary classification (Knock/No-Knock)
- Approach: Treats audio spectrograms as images, leveraging ViT's powerful pattern recognition
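To make the patch-based approach concrete, the sketch below counts how many patches a ViT-style model sees for one spectrogram. The defaults used here (128 mel bins, 1024 time frames, 16x16 patches, stride 10 in both directions) are the standard AST configuration behind the MIT/ast-finetuned-audioset-10-10-0.4593 base checkpoint. They are assumed to carry over to this fine-tuned model, which the card does not confirm.

```python
def patch_grid(num_mel_bins=128, num_frames=1024, patch_size=16,
               freq_stride=10, time_stride=10):
    """Return (freq_patches, time_patches) for overlapping square patches."""
    freq_patches = (num_mel_bins - patch_size) // freq_stride + 1
    time_patches = (num_frames - patch_size) // time_stride + 1
    return freq_patches, time_patches


freq_patches, time_patches = patch_grid()
num_patches = freq_patches * time_patches
print(freq_patches, time_patches, num_patches)  # 12 101 1212
```

Each patch becomes one token in the transformer's input sequence, which is why the spectrogram can be handled like an image.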
Performance
Results at the final training epoch (epoch 3) on the validation set:
Metric | Value | Interpretation |
---|---|---|
Accuracy | 90.83% | Correctly identifies 9 out of 10 cases |
Precision | 92.44% | When model predicts knock, it's right 92.4% of the time |
Recall | 89.43% | Catches 89.4% of actual knock events |
F1 Score | 90.91% | Excellent balance between precision and recall |
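As a worked check, the four metrics can be recomputed from confusion-matrix counts. The counts below are not given in the source. They are hypothetical integers chosen because they reproduce the reported figures to four decimal places, so treat the implied evaluation set size of 240 as an inference only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts (inferred, not from the source)
metrics = classification_metrics(tp=110, fp=9, fn=13, tn=108)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, 13 of 123 knock clips would be missed and 9 of 117 non-knock clips would raise a false alarm.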
Production Readiness
- High Accuracy: 90.83% accuracy on the validation set
- Balanced Performance: Precision (92.44%) and recall (89.43%) are within about three points of each other, which limits both false alarms and missed knocks
- Training Stability: Validation loss reached its lowest value (0.3794) at the final epoch. It is about 3.4x the training loss (0.1121), so some overfitting cannot be ruled out
- Real-world Ready: Trained with early stopping and weight-decay regularization
Intended Uses
Primary Applications
- Automotive Diagnostics: Real-time engine knock detection in vehicles
- Engine Testing: Quality control during engine development and testing
- Predictive Maintenance: Early warning system for engine health monitoring
Limitations
Technical Limitations
- Audio Quality Dependency: Performance may degrade with poor quality recordings
- Engine Type Specificity: Trained on specific engine types; may need retraining for different engines
- Environmental Noise: Background noise may affect detection accuracy
- Sampling Rate: The usage example loads audio at 16 kHz; other sampling rates or spectrogram parameters may reduce accuracy
Operational Constraints
- Requires conversion of audio to spectrograms for processing
- Real-time performance depends on hardware capabilities
- May need recalibration for different vehicle models or engine configurations
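For continuous monitoring, a recording is typically cut into short overlapping windows that are classified one at a time. The sketch below computes the window boundaries only. The 16 kHz rate matches the usage example, while the 1-second window and 0.5-second hop are illustrative assumptions, not values from the source.

```python
def window_bounds(num_samples, sample_rate=16000, window_s=1.0, hop_s=0.5):
    """Return (start, end) sample indices of fixed-length analysis windows."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    bounds = []
    start = 0
    # Keep only windows that fit entirely inside the recording
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds


# A 3-second clip at 16 kHz
bounds = window_bounds(num_samples=3 * 16000)
print(len(bounds), bounds[0], bounds[-1])  # 5 (0, 16000) (32000, 48000)
```

A shorter hop gives faster detection at the cost of more model calls per second, which is where the hardware dependency noted above comes in.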
Training Data
The model was fine-tuned on audio recordings specifically collected for engine knock detection, converted to spectrogram format for visual processing by the transformer architecture.
Data Preprocessing
- Audio signals converted to mel-spectrograms
- Spectrograms normalized and resized for ViT input requirements
- Data augmentation applied to improve robustness
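The normalization and augmentation steps might look roughly like the sketch below. The source does not say which normalization or augmentation was used, so per-spectrogram standardization and masking of one frequency band and one time band are assumptions chosen for illustration. The random array stands in for a real mel-spectrogram.

```python
import numpy as np


def normalize(spec_db):
    """Scale a dB spectrogram to zero mean and unit variance."""
    return (spec_db - spec_db.mean()) / (spec_db.std() + 1e-8)


def mask_augment(spec, rng, max_freq_width=8, max_time_width=16):
    """Zero out one random band of mel bins and one random band of frames."""
    out = spec.copy()
    n_mels, n_frames = out.shape
    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, n_mels - f_width + 1))
    out[f_start:f_start + f_width, :] = 0.0  # 0 is the mean after normalization
    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, n_frames - t_width + 1))
    out[:, t_start:t_start + t_width] = 0.0
    return out


rng = np.random.default_rng(0)
# Stand-in for a (mel bins, frames) spectrogram in dB
spec_db = rng.normal(loc=-40.0, scale=10.0, size=(128, 256))
normalized = normalize(spec_db)
augmented = mask_augment(normalized, rng)
print(augmented.shape)  # (128, 256)
```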
Training Procedure
Optimization Strategy
The model was trained with the following measures to limit overfitting:
- Early Stopping: Training stopped at epoch 3, where validation loss was lowest (0.3794)
- Learning Rate: Conservative rate (2e-05) for stable convergence
- Mixed Precision: FP16 training for efficient computation on T4 GPU
- Regularization: Weight decay of 0.01 for better generalization
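Early stopping is usually implemented as a patience rule: training halts once validation loss has failed to improve for a set number of epochs. The sketch below applies such a rule to the validation losses reported in the Training Results table. The patience values are hypothetical, since the source does not state the stopping criterion.

```python
def best_epoch_with_patience(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch), both 1-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


val_losses = [0.4224, 0.4320, 0.3794]
print(best_epoch_with_patience(val_losses, patience=2))  # (3, 3)
print(best_epoch_with_patience(val_losses, patience=1))  # (1, 2)
```

With a patience of 1 the run would have stopped after epoch 2 and missed the improvement at epoch 3, so the patience must have been at least 2 if a rule of this kind was used.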
Training Hyperparameters
- Learning Rate: 2e-05
- Batch Size: 8 (train/eval)
- Epochs: 3 (early stopped)
- Optimizer: AdamW with fused implementation
- Mixed Precision: Native AMP (FP16)
- Scheduler: Linear learning rate decay
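The hyperparameters above map onto `transformers.TrainingArguments` roughly as sketched below. The evaluation, saving, and best-model settings, along with the output directory name, are assumptions added to make early stopping possible; they are not listed in the source. `linear_lr` illustrates linear decay under the further assumption of no warmup.

```python
# Keyword arguments mirroring the listed hyperparameters
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 3,  # epochs actually run; the configured maximum is not stated
    "weight_decay": 0.01,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "linear",
    "fp16": True,
    # Assumed settings (not in the source), typical when using early stopping
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,
}


def build_training_arguments(output_dir="revix-knock-ast"):
    """Build TrainingArguments; needs transformers and a CUDA GPU for fp16."""
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **training_kwargs)


def linear_lr(step, total_steps, base_lr=2e-5):
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


print(linear_lr(0, 300), linear_lr(150, 300), linear_lr(300, 300))
```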
Training Results
Training Loss | Epoch | Validation Loss | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|
0.3156 | 1.0 | 0.4224 | 0.8625 | 0.8261 | 0.9268 | 0.8736 |
0.21 | 2.0 | 0.4320 | 0.8667 | 0.8421 | 0.9106 | 0.875 |
0.1121 | 3.0 | 0.3794 | 0.9083 | 0.9244 | 0.8943 | 0.9091 |
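The gap between training and validation loss can be read directly off the table. The short sketch below computes the validation-to-training loss ratio per epoch and picks the epoch with the lowest validation loss. A ratio that grows over time is commonly read as a sign of increasing overfitting, even when validation loss itself is still improving.

```python
epochs = [
    {"epoch": 1, "train_loss": 0.3156, "val_loss": 0.4224},
    {"epoch": 2, "train_loss": 0.21, "val_loss": 0.4320},
    {"epoch": 3, "train_loss": 0.1121, "val_loss": 0.3794},
]

for row in epochs:
    row["ratio"] = row["val_loss"] / row["train_loss"]
    print(f"epoch {row['epoch']}: val/train loss ratio = {row['ratio']:.2f}")

best = min(epochs, key=lambda row: row["val_loss"])
print(f"lowest validation loss at epoch {best['epoch']}")
```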
Usage Example
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
    import torch
    import librosa

    # Load model and feature extractor.
    # AST checkpoints load through the audio-classification auto class.
    MODEL_ID = "cxlrd/revix-classifier_8.0"
    model = AutoModelForAudioClassification.from_pretrained(MODEL_ID)
    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
    model.eval()

    def detect_engine_knock(audio_file_path):
        # Load audio as mono and resample to 16 kHz
        audio, sr = librosa.load(audio_file_path, sr=16000)

        # The AST feature extractor computes the log-mel spectrogram itself,
        # so it is given the raw waveform rather than a precomputed spectrogram
        inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")

        # Make prediction
        with torch.no_grad():
            outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        prediction = torch.argmax(probabilities, dim=-1)

        # Assumes class index 1 means "knock"; check model.config.id2label
        return {
            "knock_detected": bool(prediction.item()),
            "confidence": float(probabilities.max().item()),
        }

    # Example usage
    result = detect_engine_knock("engine_audio.wav")
    print(f"Knock detected: {result['knock_detected']}")
    print(f"Confidence: {result['confidence']:.3f}")
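Taking the argmax amounts to a 0.5 threshold on the knock probability. Where false alarms are costly, a stricter threshold can be applied to the same logits, as in the sketch below. The logit values, the 0.8 threshold, and the assumption that class index 1 means knock are all illustrative, not taken from the source.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def knock_decision(logits, knock_index=1, threshold=0.5):
    """Flag a knock only when its probability reaches the threshold."""
    p_knock = softmax(logits)[knock_index]
    return {"knock_detected": p_knock >= threshold, "knock_probability": p_knock}


example_logits = [0.2, 1.1]  # hypothetical model output for one clip
print(knock_decision(example_logits))                 # flagged at the default 0.5
print(knock_decision(example_logits, threshold=0.8))  # not flagged at 0.8
```

Raising the threshold trades recall for precision, so any chosen value would need to be validated on held-out recordings.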
This model was developed by:
1. Lwanga Caleb
2. Arinda Emmanuel
3. Ssempija Gideon Ethan
Framework Versions
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu126
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
If you use this model in your research or applications, please cite:
@misc{revix-classifier-8.0,
  title={Knowledge-Grounded Acoustic Diagnostics on Smartphones for Early Engine Fault Detection},
  author={Lwanga Caleb and Arinda Emmanuel and Ssempija Gideon Ethan},
  year={2025},
  url={https://huggingface.co/cxlrd/revix-engineknock_classifier}
}
Model tree for cxlrd/revix-classifier_8.0
- Base model: MIT/ast-finetuned-audioset-10-10-0.4593
Evaluation results (self-reported)
- Accuracy: 0.908
- Precision: 0.924
- Recall: 0.894
- F1 Score: 0.909