---
language: en
license: mit
tags:
- image-classification
- efficientnet
- vm-ai
- activity-recognition
datasets:
- maxf-coder/task_image_classifier
metrics:
- accuracy
- f1
---

# VM.AI — Image Classifier
|
|
An EfficientNet-B4 model trained on the `maxf-coder/task_image_classifier` dataset to classify images into 14 activity categories. It is part of the VM.AI image-to-prompt pipeline.
|
|
## Performance

| Metric | Value |
|--------|-------|
| Test samples | {test_samples} |
| Top-1 accuracy | {top1} |
| Top-3 accuracy | {top3} |
| Macro F1 | {macro_f1} |
| Weighted F1 | {weighted_f1} |

## Per-Class Metrics

| Class | Precision | Recall | F1 | Support |
|-------|-----------|--------|------|---------|
{class_rows}

## Usage
|
|
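```python
import torch
import timm
from PIL import Image
from torchvision import transforms

# Build the EfficientNet-B4 architecture with a 14-class head, then load the trained weights.
model = timm.create_model("efficientnet_b4", pretrained=False, num_classes=14)
# weights_only=True restricts unpickling to tensors (PyTorch >= 1.13).
state_dict = torch.load("efficientnet_b4_classifier.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()

# Resize to 380x380 and normalize with the standard ImageNet mean/std.
transform = transforms.Compose([
    transforms.Resize((380, 380)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
tensor = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 380, 380)

with torch.no_grad():
    logits = model(tensor)

# Class index in 0-13; the index-to-name mapping follows the class order used in training.
pred = logits.argmax(1).item()

# Top-3 classes with softmax probabilities (the ranking behind the top-3 accuracy metric).
top3_probs, top3_idx = logits.softmax(dim=1).topk(3, dim=1)
```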
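To check the Performance and Per-Class numbers on your own labeled images, you could collect logits as in the snippet above and apply the standard metric definitions. The pure-Python sketch below is illustrative only: the helper names `topk_accuracy`, `per_class_metrics`, and `macro_and_weighted_f1` are hypothetical (not taken from `train_classifier.py`), the toy data uses 3 made-up classes instead of 14, and the card does not say how its own figures were computed. Undefined ratios (a class that is never predicted, or never present) are scored as 0, which should agree with scikit-learn's default `zero_division` handling in `precision_recall_fscore_support` and `f1_score(average="macro")` / `f1_score(average="weighted")`.

```python
from collections import Counter
from collections.abc import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ranked by score, highest first; the sort is stable,
        # so tied scores keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> dict[int, tuple[float, float, float, int]]:
    """Map each class to (precision, recall, F1, support); undefined ratios become 0."""
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    support = Counter(y_true)
    rows = dict()
    for c in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[c] / predicted[c] if predicted[c] else 0.0
        recall = true_pos[c] / support[c] if support[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[c] = (precision, recall, f1, support[c])
    return rows


def macro_and_weighted_f1(rows: dict[int, tuple[float, float, float, int]]) -> tuple[float, float]:
    """Macro F1 = plain mean of per-class F1; weighted F1 = mean weighted by support."""
    f1s = [f1 for _, _, f1, _ in rows.values()]
    supports = [s for _, _, _, s in rows.values()]
    macro = sum(f1s) / len(f1s)
    weighted = sum(f * s for f, s in zip(f1s, supports)) / sum(supports)
    return macro, weighted


# Toy example with 3 hypothetical classes (the real model has 14).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 1, 0, 0]
preds = [max(range(len(row)), key=row.__getitem__) for row in scores]  # argmax per row

print(topk_accuracy(scores, labels, k=1))  # 0.5
print(topk_accuracy(scores, labels, k=2))  # 0.75
rows = per_class_metrics(labels, preds)
print(macro_and_weighted_f1(rows))
```

Macro F1 counts every class equally, so rare classes weigh as much as common ones; weighted F1 weights each class by its support, so it is dominated by the most frequent classes.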
|
|
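## Training

Training runs in two phases: 5 epochs with the backbone frozen, updating only the classification head, followed by 20 epochs with the last 2 backbone blocks unfrozen.
The optimizer is AdamW with a cosine-annealing learning-rate schedule, and training uses automatic mixed precision (AMP).
See `train_classifier.py` in the [VM.AI repository](https://github.com/Infiteri/VM.AI) for details.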
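The two-phase schedule can be laid out epoch by epoch. This is a sketch, not the logic of `train_classifier.py`: it covers only the freezing plan and the learning rate (not AdamW or AMP), and the base learning rates `head_lr`/`finetune_lr`, a minimum learning rate of 0, a cosine schedule that restarts at the start of phase 2, and a head that stays trainable in phase 2 are all assumptions. `cosine_lr` is the closed-form cosine-annealing formula documented for PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`, stepped once per epoch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochPlan:
    epoch: int       # 1-based epoch number across both phases
    phase: str       # "frozen" or "fine-tune"
    trainable: str   # which parameters receive gradients this epoch
    lr: float        # learning rate for this epoch


def cosine_lr(base_lr: float, step: int, total_steps: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing (the formula documented for CosineAnnealingLR)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / total_steps)) / 2


def two_phase_plan(
    frozen_epochs: int = 5,
    unfrozen_epochs: int = 20,
    head_lr: float = 1e-3,      # hypothetical value; the card does not state learning rates
    finetune_lr: float = 1e-4,  # hypothetical value
) -> list[EpochPlan]:
    plan = []
    # Phase 1: backbone frozen, only the classification head is trained.
    for e in range(frozen_epochs):
        plan.append(EpochPlan(e + 1, "frozen", "head only",
                              cosine_lr(head_lr, e, frozen_epochs)))
    # Phase 2: last 2 backbone blocks unfrozen. Assumptions: the head stays
    # trainable and the cosine schedule restarts from finetune_lr.
    for e in range(unfrozen_epochs):
        plan.append(EpochPlan(frozen_epochs + e + 1, "fine-tune", "head + last 2 blocks",
                              cosine_lr(finetune_lr, e, unfrozen_epochs)))
    return plan


plan = two_phase_plan()
for p in (plan[0], plan[4], plan[5], plan[-1]):
    print(p.epoch, p.phase, p.trainable, format(p.lr, ".2e"))
```

In an actual PyTorch loop, the frozen phase would typically set `requires_grad = False` on the backbone parameters and pass only the trainable parameters to AdamW; the card does not say how the script implements this.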
|
|
## Charts
|
|
|
|