---
language: en
license: mit
tags:
- image-classification
- efficientnet
- vm-ai
- activity-recognition
datasets:
- maxf-coder/task_image_classifier
metrics:
- accuracy
- f1
---

# VM.AI: Image Classifier

An EfficientNet-B4 classifier trained on 14 activity categories for the image-to-prompt pipeline.

## Performance

| Metric | Value |
|--------|-------|
| Test samples | {test_samples} |
| Top-1 accuracy | {top1} |
| Top-3 accuracy | {top3} |
| Macro F1 | {macro_f1} |
| Weighted F1 | {weighted_f1} |

## Per-Class Metrics

| Class | Precision | Recall | F1 | Support |
|-------|-----------|--------|------|---------|
{class_rows}

## Usage

```python
import torch
import timm
from PIL import Image
from torchvision import transforms

# Build the architecture without pretrained weights, then load the fine-tuned checkpoint.
model = timm.create_model("efficientnet_b4", pretrained=False, num_classes=14)
model.load_state_dict(torch.load("efficientnet_b4_classifier.pth", map_location="cpu"))
model.eval()

# Resize to 380x380 and normalize with the ImageNet mean and standard deviation.
transform = transforms.Compose([
    transforms.Resize((380, 380)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("photo.jpg").convert("RGB")
tensor = transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, 380, 380)

with torch.no_grad():
    logits = model(tensor)

pred = logits.argmax(1).item()  # index of the predicted class (0 to 13)
```

## Training

Training ran in two phases: 5 epochs with the backbone frozen (classification head only), followed by 20 epochs with the last 2 blocks unfrozen. The optimizer was AdamW with a cosine-annealing learning-rate schedule, and training used mixed precision (AMP). See [train_classifier.py](https://github.com/Infiteri/VM.AI) for details.

## Charts

![Confusion matrix](confusion_matrix.png)
![Per-class metrics](per_class_metrics.png)
![Top-K accuracy](topk_accuracy.png)
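## Reproducing the Metrics

The aggregate scores in the Performance table can be recomputed from raw predictions. The following is a minimal sketch in plain Python, not the project's evaluation script: the helper names (`top_k_accuracy`, `per_class_f1`, `macro_and_weighted_f1`) and the toy scores are illustrative assumptions. Macro F1 averages the per-class F1 scores with equal weight, while weighted F1 weights each class by its support.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def per_class_f1(preds, labels, num_classes):
    """Return a list of (f1, support) tuples, one per class index."""
    results = []
    for c in range(num_classes):
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        results.append((f1, tp + fn))  # support = number of true samples of class c
    return results


def macro_and_weighted_f1(preds, labels, num_classes):
    """Macro F1 is the plain mean; weighted F1 is the support-weighted mean."""
    stats = per_class_f1(preds, labels, num_classes)
    macro = sum(f1 for f1, _ in stats) / num_classes
    total = sum(support for _, support in stats)
    weighted = sum(f1 * support for f1, support in stats) / total
    return macro, weighted


# Toy example with 3 classes (made-up scores, not results from this model).
scores = [
    [2.0, 1.0, 0.1],
    [0.2, 0.3, 3.0],
    [1.5, 1.4, 0.2],
    [0.1, 2.2, 0.3],
]
labels = [0, 2, 1, 1]
preds = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
macro_f1, weighted_f1 = macro_and_weighted_f1(preds, labels, 3)
print("top-1:", top1, "top-2:", top2)
print("macro F1:", round(macro_f1, 3), "weighted F1:", round(weighted_f1, 3))
```

For the real model, `scores` would be the logits collected over the test set and `k` would be 3 for the top-3 figure. When class supports are unbalanced, macro and weighted F1 can diverge noticeably, which is why the card reports both.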
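## Training Schedule Sketch

The two-phase schedule described under Training can be outlined as follows. This is an illustrative sketch, not the contents of `train_classifier.py`: the helper names, the learning rates, the per-epoch scheduler step, the tiny stand-in model, and the random batch are all assumptions chosen so the snippet runs without the dataset or checkpoint. It assumes the model exposes `blocks` and `classifier` attributes, as timm's EfficientNet implementation does, and that "last 2 blocks" refers to the final two entries of `blocks`.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Stand-in that mirrors the attribute names used here (blocks, classifier)."""

    def __init__(self, num_classes=14):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(4, 4, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU()),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.pool(self.blocks(x)).flatten(1))


def freeze_all_but_head(model):
    """Phase 1: only the classification head receives gradients."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True


def unfreeze_last_blocks(model, n=2):
    """Phase 2: additionally train the last n feature blocks."""
    for block in list(model.blocks)[-n:]:
        for p in block.parameters():
            p.requires_grad = True


def run_phase(model, batches, epochs, lr, device):
    """Train the currently trainable parameters with AdamW, cosine annealing, and AMP."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(params, lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    use_amp = device == "cuda"  # mixed precision is only enabled on GPU in this sketch
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, targets in batches:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            with torch.autocast(device_type=device, enabled=use_amp):
                loss = criterion(model(images), targets)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        scheduler.step()  # assumed: one scheduler step per epoch


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyNet().to(device)
batches = [(torch.randn(4, 3, 32, 32), torch.randint(0, 14, (4,)))]  # random placeholder data

freeze_all_but_head(model)
run_phase(model, batches, epochs=5, lr=1e-3, device=device)   # 5 frozen epochs
unfreeze_last_blocks(model, n=2)
run_phase(model, batches, epochs=20, lr=1e-4, device=device)  # 20 unfrozen epochs
```

To apply the same schedule to the actual classifier, `TinyNet()` would be replaced by the `timm.create_model("efficientnet_b4", ...)` call from the Usage section and `batches` by a real `DataLoader`. A fresh optimizer and scheduler are created per phase so that the newly unfrozen parameters are included in the second phase.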