Model Card for distill-n4_00-01_combined_cls_v1b2-siglip2-large-patch16-512

current batches:

nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)

Try using google/siglip2-large-patch16-512 instead of DINOv2 as the backbone for a model comparison (it turns out to be about 1% better than google/siglip2-base-patch16-512).
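
For reference, a minimal sketch of what the backbone swap looks like when loading the checkpoint for classification fine-tuning (assumptions: a transformers version recent enough to resolve the SigLIP2 config through the Auto classes; NUM_LABELS is a placeholder, not the real label set of the dataset):

# Hypothetical sketch, not taken from trainlib: load the SigLIP2 backbone with a
# fresh classification head, mirroring --ignore_mismatched_sizes in the script below.
from transformers import AutoImageProcessor, AutoModelForImageClassification

BASE_MODEL = "google/siglip2-large-patch16-512"  # swapped in for the DINOv2 backbone
NUM_LABELS = 5  # placeholder; use the actual number of "star" classes

processor = AutoImageProcessor.from_pretrained(BASE_MODEL)
model = AutoModelForImageClassification.from_pretrained(
    BASE_MODEL,
    num_labels=NUM_LABELS,
    ignore_mismatched_sizes=True,
)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M params")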

eval metrics:

wandb: Run summary:
wandb:            eval/accuracy 0.77533
wandb:                eval/loss 0.4809
wandb:             eval/runtime 15.9025
wandb:  eval/samples_per_second 111.114
wandb:    eval/steps_per_second 0.692
wandb:               total_flos 1.4915777670524436e+20
wandb:              train/epoch 10.0
wandb:        train/global_step 570
wandb:          train/grad_norm 375217.9375
wandb:      train/learning_rate 0.0
wandb:               train/loss 0.286
wandb:               train_loss 0.40591
wandb:            train_runtime 1032.5423
wandb: train_samples_per_second 96.974
wandb:   train_steps_per_second 0.552
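
For context on how eval/accuracy is produced, here is a sketch of the standard compute_metrics from the stock Hugging Face image-classification recipe; it is an assumption that trainlib's hf_trainer computes accuracy the same way (argmax over logits against the star labels):

# Assumed metric definition (standard HF recipe), not copied from trainlib.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)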

Model Details

trainlib commit: 1b17bfef5ccbb5a22157e56ab8da71ba7c8c0ed6

  • (it was committed right after the augmentation was changed for a later task)

training script:

#!/bin/bash

# =================== BEGIN NOTES =======================

# bs24 OOMs; bs18 uses 66943MiB / 81559MiB; try bs22
# bs22 (matching the siglip2-base run as closely as possible for the large model): 77679MiB / 81559MiB

# ORIGINAL AUGMENTATION:
# - the model trained with this exact config reached eval/accuracy 0.77533

# train_transforms = Compose([
#     RandomResizedCrop(size),
#     RandomHorizontalFlip(),
#     ToTensor(),
#     normalize,
# ])

# MODIFIED AUGMENTATION:

# from torchvision.transforms import Compose, RandomResizedCrop, RandomRotation, RandomHorizontalFlip, ColorJitter, RandomApply, GaussianBlur, ToTensor

# train_transforms = Compose([
#     RandomResizedCrop(size=224, scale=(0.8, 1.0), ratio=(0.9, 1.1)),
#     RandomRotation(5),
#     RandomHorizontalFlip(p=0.2),
#     ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
#     RandomApply([GaussianBlur(kernel_size=3, sigma=(0.5, 1.5))], p=0.1),
#     ToTensor(),
#     normalize,
# ])


# =================== END NOTES ==========================

# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10


# Run training command
python -m trainlib.hf_trainer.cli \
  --model_name_or_path $BASE_MODEL \
  --dataset_name $DATASET \
  --output_dir distill-n4_00-01_combined_cls_v1b2_classification_$BASE_MODEL \
  --remove_unused_columns False \
  --label_column_name star \
  --task $TASK \
  --do_train \
  --do_eval \
  --eval_strategy steps \
  --eval_steps 100 \
  --learning_rate 5e-6 \
  --num_train_epochs $NUM_EPOCHS \
  --per_device_train_batch_size 22 \
  --per_device_eval_batch_size 22 \
  --logging_strategy steps \
  --logging_steps 2 \
  --save_total_limit 1 \
  --seed 1337 \
  --lr_scheduler_type cosine \
  --dataloader_num_workers 16 \
  --ignore_mismatched_sizes True \
  --fp16 True  # EXTRA ARGUMENT
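
inference example:

A minimal usage sketch (assumptions: the fine-tuned weights are published as distill-lab/distill-n4_00-01_combined_cls_v1b2-siglip2-large-patch16-512, the star labels are stored in config.id2label, and the image path is a placeholder):

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id of the fine-tuned checkpoint.
CKPT = "distill-lab/distill-n4_00-01_combined_cls_v1b2-siglip2-large-patch16-512"

processor = AutoImageProcessor.from_pretrained(CKPT)
model = AutoModelForImageClassification.from_pretrained(CKPT)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(pred, model.config.id2label[pred])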