Model Card for distill-n4_00-01_combined_cls_v1b2-100e

current batches:

nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)

Same as https://huggingface.co/distill-lab/distill-n4_00-01_combined_cls_v1b2, but trained for 100 epochs instead of 20.

metrics:

***** train metrics *****
  epoch                    =          100.0
  total_flos               = 334833095087GF
  train_loss               =         0.0776
  train_runtime            =     4:53:00.40
  train_samples_per_second =         56.955
  train_steps_per_second   =          0.893

***** eval metrics *****
  epoch                   =           100.0
  eval_accuracy           =          0.7487
  eval_loss               =          1.9947
  eval_runtime            =      0:00:12.56
  eval_samples_per_second =         140.622
  eval_steps_per_second   =           2.945

Model details:

(No significant accuracy jump over the 20-epoch run; this was mainly to see what happens with longer training.)

BASE_MODEL = "facebook/dinov2-with-registers-large"
DATASET = "distill-lab/COMBINE_nai-distill_00-01_eagle.library" 
TASK = "classification"
# trained on a single GPU, so a larger per-device batch size was used
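
Before launching, the dataset can be sanity-checked directly. A minimal sketch using the datasets library (an assumption, not part of the original run; split names are not documented here, and the card only implies an image column plus the star label column passed to the trainer below):

from datasets import load_dataset

# Load the Hub dataset referenced above (assumes public or authenticated access).
ds = load_dataset(DATASET)
print(ds)                        # available splits and row counts
first_split = next(iter(ds.values()))
print(first_split.features)      # expect an image column plus the `star` label column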

cmd = f"""python -m trainlib.hf_trainer.cli \
  --model_name_or_path {BASE_MODEL} \
  --dataset_name {DATASET} \
  --output_dir distill-n4_00-01_combined_cls_v1b2-100e \
  --remove_unused_columns False \
  --label_column_name star \
  --task {TASK} \
  --do_train \
  --do_eval \
  --eval_strategy steps \
  --eval_steps 100 \
  --learning_rate 1e-5 \
  --num_train_epochs 100 \
  --per_device_train_batch_size 64 \
  --per_device_eval_batch_size 48 \
  --logging_strategy steps \
  --logging_steps 2 \
  --save_total_limit 1 \
  --seed 1337 \
  --lr_scheduler_type cosine \
  --dataloader_num_workers 16 \
  --ignore_mismatched_sizes True

"""

# push-to-hub flags, kept separate and not appended to cmd in this run
rest = f"""  --push_to_hub True \
  --push_to_hub_organization distill-lab \
  --hub_model_id nai-distill_00-01_combined_eagle_{TASK} \
  --hub_strategy end"""

print(cmd)
!{cmd}
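
A minimal inference sketch for the resulting checkpoint (an assumption, not from the original card: it uses the standard transformers image-classification classes and the Hub repo id of this model; "example.jpg" is a hypothetical input):

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

CKPT = "distill-lab/distill-n4_00-01_combined_cls_v1b2-100e"  # or the local output_dir

processor = AutoImageProcessor.from_pretrained(CKPT)
model = AutoModelForImageClassification.from_pretrained(CKPT)
model.eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])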
Model size: 304M params (F32, safetensors)