training config differences (same dataset)
current batches:
nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)
Try using google/siglip2-large-patch16-512 instead of DINOv2 for a model difference (turns out ~1% better than google/siglip2-base-patch16-512).
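For reference, the backbone swap is just a checkpoint-name change; a minimal sketch of the equivalent model load outside the trainer CLI (num_labels=5 is an assumption for the star label column; ignore_mismatched_sizes mirrors the flag in the training script below so a fresh classification head can replace the pretrained one):

from transformers import AutoImageProcessor, AutoModelForImageClassification

BASE_MODEL = "google/siglip2-large-patch16-512"  # vs. google/siglip2-base-patch16-512

processor = AutoImageProcessor.from_pretrained(BASE_MODEL)
model = AutoModelForImageClassification.from_pretrained(
    BASE_MODEL,
    num_labels=5,                  # ASSUMPTION: 5 star classes; adjust to the dataset
    ignore_mismatched_sizes=True,  # pretrained head shape != new head, as in the script
)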
eval metrics:
wandb: Run summary:
wandb: eval/accuracy 0.77533
wandb: eval/loss 0.4809
wandb: eval/runtime 15.9025
wandb: eval/samples_per_second 111.114
wandb: eval/steps_per_second 0.692
wandb: total_flos 1.4915777670524436e+20
wandb: train/epoch 10.0
wandb: train/global_step 570
wandb: train/grad_norm 375217.9375
wandb: train/learning_rate 0.0
wandb: train/loss 0.286
wandb: train_loss 0.40591
wandb: train_runtime 1032.5423
wandb: train_samples_per_second 96.974
wandb: train_steps_per_second 0.552
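A quick back-of-envelope check that the summary numbers are self-consistent (assuming no gradient accumulation):

# 570 global steps over 10 epochs -> 57 optimizer steps per epoch
steps_per_epoch = 570 / 10                       # 57.0
# total samples seen ~= throughput * runtime
total_samples = 96.974 * 1032.5423               # ~100,130 over 10 epochs
samples_per_epoch = total_samples / 10           # ~10,013
# implied global batch ~10,013 / 57 ~= 176 = 22 per device * 8 devices
implied_world_size = samples_per_epoch / steps_per_epoch / 22  # ~8 GPUs
print(steps_per_epoch, round(samples_per_epoch), round(implied_world_size))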
trainlib commit: 1b17bfef5ccbb5a22157e56ab8da71ba7c8c0ed6
training script:
#!/bin/bash
# =================== BEGIN NOTES =======================
# bs24 OOMs; bs18: 66943MiB / 81559MiB; try bs22
# bs22 (matching the siglip2-base config for large as closely as possible): 77679MiB / 81559MiB
# ORIGINAL AUGMENTATION:
# - model trained on this with exact config had eval/accuracy 0.77533
# train_transforms = Compose([
# RandomResizedCrop(size),
# RandomHorizontalFlip(),
# ToTensor(),
# normalize,
# ])
# MODIFIED AUGMENTATION:
# from torchvision.transforms import Compose, RandomResizedCrop, RandomRotation, RandomHorizontalFlip, ColorJitter, RandomApply, GaussianBlur, ToTensor
# train_transforms = Compose([
# RandomResizedCrop(size=224, scale=(0.8, 1.0), ratio=(0.9, 1.1)),
# RandomRotation(5),
# RandomHorizontalFlip(p=0.2),
# ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
# RandomApply([GaussianBlur(kernel_size=3, sigma=(0.5, 1.5))], p=0.1),
# ToTensor(),
# normalize,
# ])
# =================== END NOTES ==========================
# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10
# Run training command (slashes in $BASE_MODEL are replaced so the output dir doesn't nest)
python -m trainlib.hf_trainer.cli \
    --model_name_or_path "$BASE_MODEL" \
    --dataset_name "$DATASET" \
    --output_dir "distill-n4_00-01_combined_cls_v1b2_classification_${BASE_MODEL//\//_}" \
    --remove_unused_columns False \
    --label_column_name star \
    --task "$TASK" \
    --do_train \
    --do_eval \
    --eval_strategy steps \
    --eval_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs $NUM_EPOCHS \
    --per_device_train_batch_size 22 \
    --per_device_eval_batch_size 22 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_total_limit 1 \
    --seed 1337 \
    --lr_scheduler_type cosine \
    --dataloader_num_workers 16 \
    --ignore_mismatched_sizes True \
    --fp16 True # EXTRA ARGUMENT
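For convenience, a standalone, runnable version of the modified augmentation from the notes above; the normalize stats are an assumption rebuilt from the checkpoint's image processor (the notes don't show how normalize was defined), and size=224 is kept verbatim from the note even though the -512 checkpoint expects 512-px inputs:

from torchvision.transforms import (
    ColorJitter, Compose, GaussianBlur, Normalize, RandomApply,
    RandomHorizontalFlip, RandomResizedCrop, RandomRotation, ToTensor,
)
from transformers import AutoImageProcessor

# ASSUMPTION: rebuild normalize from the checkpoint's processor stats.
processor = AutoImageProcessor.from_pretrained("google/siglip2-large-patch16-512")
normalize = Normalize(mean=processor.image_mean, std=processor.image_std)

# Modified augmentation, verbatim from the notes. NB: size=224 here while the
# -512 checkpoint expects 512x512 inputs; keep or change deliberately.
train_transforms = Compose([
    RandomResizedCrop(size=224, scale=(0.8, 1.0), ratio=(0.9, 1.1)),
    RandomRotation(5),
    RandomHorizontalFlip(p=0.2),
    ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
    RandomApply([GaussianBlur(kernel_size=3, sigma=(0.5, 1.5))], p=0.1),
    ToTensor(),
    normalize,
])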