---
language: en
license: mit
tags:
- image-classification
- efficientnet
- vm-ai
- activity-recognition
datasets:
- maxf-coder/task_image_classifier
metrics:
- accuracy
- f1
---
# VM.AI — Image Classifier
EfficientNet-B4 trained on 14 activity categories for the image-to-prompt pipeline.
## Performance
| Metric | Value |
|--------|-------|
| Test samples | {test_samples} |
| Top-1 accuracy | {top1} |
| Top-3 accuracy | {top3} |
| Macro F1 | {macro_f1} |
| Weighted F1 | {weighted_f1} |
## Per-Class Metrics
| Class | Precision | Recall | F1 | Support |
|-------|-----------|--------|------|---------|
{class_rows}
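
The two F1 figures in the Performance table summarize the per-class rows in different ways. The following is a minimal sketch of the standard definitions, using made-up F1 and support values (not results from this model) and hypothetical class names:

```python
# Illustrative per-class rows: (class name, F1, support).
# These numbers are invented for the example, not taken from this model.
rows = [
    ("class_a", 0.90, 100),
    ("class_b", 0.60, 20),
    ("class_c", 0.75, 80),
]


def macro_f1(rows):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of per-class F1 weighted by support (test samples per class)."""
    total = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total


print(f"macro F1:    {macro_f1(rows):.4f}")
print(f"weighted F1: {weighted_f1(rows):.4f}")
```

A macro F1 well below the weighted F1 usually points to weaker performance on classes with few test samples.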
## Usage
```python
import torch
import timm
from PIL import Image
from torchvision import transforms

# Build the architecture without pretrained weights, then load the fine-tuned checkpoint.
model = timm.create_model("efficientnet_b4", pretrained=False, num_classes=14)
model.load_state_dict(torch.load("efficientnet_b4_classifier.pth", map_location="cpu"))
model.eval()

# Resize to 380x380 and normalize with the ImageNet mean and std.
transform = transforms.Compose([
    transforms.Resize((380, 380)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
tensor = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 380, 380)

with torch.no_grad():
    logits = model(tensor)
pred = logits.argmax(1).item()  # index of the predicted class (0-13)
```
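
Since the card reports top-3 accuracy, it can be useful to read the three most likely classes from the logits rather than only the argmax. A minimal sketch follows; the helper name `top_k_predictions` is hypothetical, and the logits tensor is a hand-built stand-in for `model(tensor)` so the snippet runs without the checkpoint:

```python
import torch


def top_k_predictions(logits, k=3):
    """Return [(class_index, probability), ...] for the k most likely classes."""
    probs = torch.softmax(logits, dim=1)       # logits -> per-class probabilities
    top_probs, top_idx = probs.topk(k, dim=1)  # k largest, sorted descending
    return list(zip(top_idx[0].tolist(), top_probs[0].tolist()))


# Stand-in for `logits = model(tensor)`: one image, 14 classes.
logits = torch.zeros(1, 14)
logits[0, 5] = 3.0
logits[0, 2] = 2.0
logits[0, 9] = 1.0

for idx, prob in top_k_predictions(logits):
    print(f"class {idx}: {prob:.3f}")
```

The indices refer to the order of classes used at training time; this card does not list the class names, so mapping indices to labels requires the training label list.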
## Training
Training runs in two phases: 5 epochs with the backbone frozen (head only), then 20 epochs with the last 2 blocks unfrozen.
The optimizer is AdamW with cosine annealing of the learning rate, and training uses automatic mixed precision (AMP).
See [train_classifier.py](https://github.com/Infiteri/VM.AI) for details.
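
The two-phase schedule could be set up roughly as below. This is a sketch under assumptions, not the code from `train_classifier.py`: the helper names, the learning rates, and the reading of "last 2 blocks" as the last two entries of `model.blocks` are all guesses, and the AMP training loop is omitted. `TinyNet` is a hypothetical stand-in that mirrors the `blocks` and `classifier` attribute names of timm's EfficientNet so the snippet runs quickly on CPU.

```python
import torch
from torch import nn


def freeze_backbone(model):
    """Phase 1: freeze everything, then re-enable only the classifier head."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True


def unfreeze_last_blocks(model, n=2):
    """Phase 2: additionally train the last n entries of model.blocks."""
    for block in list(model.blocks)[-n:]:
        for p in block.parameters():
            p.requires_grad = True


def make_optimizer(model, lr, epochs):
    """AdamW over the trainable parameters with a cosine-annealed learning rate."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(params, lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler


class TinyNet(nn.Module):
    """Hypothetical stand-in exposing `blocks` and `classifier` like the timm model."""

    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
        self.classifier = nn.Linear(8, 14)

    def forward(self, x):
        return self.classifier(self.blocks(x))


model = TinyNet()

# Phase 1: 5 epochs, head only (learning rate is an assumed value).
freeze_backbone(model)
opt1, sched1 = make_optimizer(model, lr=1e-3, epochs=5)

# Phase 2: 20 epochs, head plus the last 2 blocks (learning rate is an assumed value).
unfreeze_last_blocks(model, n=2)
opt2, sched2 = make_optimizer(model, lr=1e-4, epochs=20)
```

Building a fresh optimizer for the second phase keeps the newly unfrozen parameters in its parameter groups; with the real model, `TinyNet()` would be replaced by the `timm.create_model(...)` call from the Usage section.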
## Charts


