---
language: en
license: mit
tags:
  - image-classification
  - efficientnet
  - vm-ai
  - activity-recognition
datasets:
  - maxf-coder/task_image_classifier
metrics:
  - accuracy
  - f1
---

# VM.AI — Image Classifier

EfficientNet-B4 trained on 14 activity categories for the image-to-prompt pipeline.

## Performance

| Metric | Value |
|--------|-------|
| Test samples | {test_samples} |
| Top-1 accuracy | {top1} |
| Top-3 accuracy | {top3} |
| Macro F1 | {macro_f1} |
| Weighted F1 | {weighted_f1} |
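
Top-1 and Top-3 accuracy count a prediction as correct when the true class is the highest-scoring class, or is among the three highest-scoring classes, respectively. The following is a minimal pure-Python sketch of that definition. The function name `top_k_accuracy` and the toy scores are illustrative and are not taken from the training code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample; sorted() is stable,
        # so ties keep ascending index order.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example with 3 samples and 4 classes (illustrative, not real model outputs).
scores = [
    [0.10, 0.70, 0.15, 0.05],  # true class 1 is ranked 1st
    [0.40, 0.30, 0.20, 0.10],  # true class 2 is ranked 3rd
    [0.05, 0.15, 0.30, 0.50],  # true class 0 is ranked 4th
]
labels = [1, 2, 0]
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
print(top_k_accuracy(scores, labels, k=3))  # 2 of 3 samples correct
```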

## Per-Class Metrics

| Class | Precision | Recall | F1 | Support |
|-------|-----------|--------|------|---------|
{class_rows}

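
Each per-class row follows from the test-set confusion matrix. Precision is TP / (TP + FP), recall is TP / (TP + FN), F1 is their harmonic mean, and support is the number of true test samples in that class. Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its support. The sketch below illustrates these definitions. It assumes the matrix is laid out with rows as true classes and columns as predicted classes, which is the orientation scikit-learn's `confusion_matrix` uses, and it reports undefined ratios as 0.0. The helper names and numbers are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[float, float, float, int]  # (precision, recall, f1, support)


def per_class_metrics(cm: List[List[int]]) -> List[Row]:
    """Per-class metrics, assuming cm[i][j] counts true class i predicted as class j."""
    n = len(cm)
    rows = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum: everything predicted as c
        support = sum(cm[c])                         # row sum: true samples of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((precision, recall, f1, support))
    return rows


def macro_and_weighted_f1(rows: List[Row]) -> Tuple[float, float]:
    f1s = [r[2] for r in rows]
    supports = [r[3] for r in rows]
    macro = sum(f1s) / len(f1s)  # every class counts equally
    weighted = sum(f * s for f, s in zip(f1s, supports)) / sum(supports)  # weighted by support
    return macro, weighted


# Toy 3-class confusion matrix (illustrative numbers, not this model's results).
cm = [
    [8, 2, 0],
    [1, 4, 0],
    [0, 0, 5],
]
rows = per_class_metrics(cm)
for c, (p, r, f, s) in enumerate(rows):
    print(f"class {c}: P={p:.3f} R={r:.3f} F1={f:.3f} support={s}")
print(macro_and_weighted_f1(rows))
```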
## Usage

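```python
import torch
import timm
from PIL import Image
from torchvision import transforms

# Build the EfficientNet-B4 architecture with a 14-class head; the weights come from the checkpoint.
model = timm.create_model("efficientnet_b4", pretrained=False, num_classes=14)
model.load_state_dict(torch.load("efficientnet_b4_classifier.pth", map_location="cpu"))
model.eval()  # disable dropout and use BatchNorm running statistics

# Resize to 380x380 and normalize with the ImageNet mean/std.
transform = transforms.Compose([
    transforms.Resize((380, 380)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
tensor = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 380, 380)
with torch.no_grad():
    logits = model(tensor)  # shape (1, 14): raw, unnormalized class scores
pred = logits.argmax(1).item()  # index of the predicted class (0-13)
```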
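
The snippet above returns only the top-1 class index. Because the card also reports Top-3 accuracy, a ranked list of probabilities can be useful. The sketch below is a minimal version that assumes the model outputs raw logits, as timm classifiers do without a final softmax. `CLASS_NAMES` is a hypothetical placeholder: the real class names and their index order come from the training data and are not listed in this card.

```python
import math
from typing import List, Sequence, Tuple


def top_k_predictions(logits: Sequence[float], class_names: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Convert the raw logits for one image into the k most likely (class, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(class_names[i], probs[i]) for i in ranked]


# HYPOTHETICAL labels: the real names and order of the 14 classes are not given here.
CLASS_NAMES = [f"class_{i}" for i in range(14)]

# With the usage snippet, pass logits[0].tolist(); a dummy logit vector is used here.
example_logits = [0.0] * 14
example_logits[3] = 2.0
example_logits[7] = 1.0
for name, p in top_k_predictions(example_logits, CLASS_NAMES, k=3):
    print(f"{name}: {p:.3f}")
```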

## Training

Two-phase training: 5 epochs with the backbone frozen (classifier head only), followed by 20 epochs with the last 2 blocks unfrozen.
Optimizer: AdamW with a cosine-annealing learning-rate schedule; mixed-precision training (AMP).
See `train_classifier.py` in the [VM.AI repository](https://github.com/Infiteri/VM.AI) for details.
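
The two-phase schedule can be laid out epoch by epoch. This pure-Python sketch makes several assumptions. First, it assumes the cosine schedule restarts at the beginning of each phase, which corresponds to the closed form of PyTorch's `CosineAnnealingLR` with `T_max` set to the phase length. Second, it assumes the classifier head stays trainable in phase 2. Third, `BASE_LR` and `MIN_LR` are hypothetical placeholders, because the card does not state the learning rates.

```python
import math
from typing import List, Tuple

FROZEN_EPOCHS = 5     # from the card: head only
UNFROZEN_EPOCHS = 20  # from the card: last 2 blocks unfrozen
BASE_LR = 1e-3        # HYPOTHETICAL: not stated in the card
MIN_LR = 1e-6         # HYPOTHETICAL: not stated in the card


def cosine_lr(epoch: int, total: int, base_lr: float, min_lr: float) -> float:
    """Cosine annealing from base_lr at epoch 0 toward min_lr at epoch == total."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total))


def training_plan() -> List[Tuple[int, str, float]]:
    """(global_epoch, trainable_part, lr) for every epoch of both phases.

    Assumes the cosine schedule restarts at the start of each phase and that the
    head remains trainable in phase 2.
    """
    plan = []
    for e in range(FROZEN_EPOCHS):
        plan.append((e, "head only", cosine_lr(e, FROZEN_EPOCHS, BASE_LR, MIN_LR)))
    for e in range(UNFROZEN_EPOCHS):
        plan.append((FROZEN_EPOCHS + e, "head + last 2 blocks",
                     cosine_lr(e, UNFROZEN_EPOCHS, BASE_LR, MIN_LR)))
    return plan


for epoch, part, lr in training_plan():
    print(f"epoch {epoch:2d}  {part:<22}  lr={lr:.2e}")
```

Restarting the schedule gives the newly unfrozen blocks a fresh warm learning rate. A single schedule spanning all 25 epochs would be an equally plausible alternative; the training script is the authority on which one was used.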

## Charts

![Confusion matrix](confusion_matrix.png)
![Per-class metrics](per_class_metrics.png)
![Top-K accuracy](topk_accuracy.png)