2024-02-13,00:02:16 | INFO | Running with a single process. Device cuda:0.
2024-02-13,00:02:16 | INFO | Loaded ViT-B-32 model config.
2024-02-13,00:02:18 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
2024-02-13,00:02:18 | INFO | Model:
2024-02-13,00:02:18 | INFO | CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
2024-02-13,00:02:18 | INFO | Params:
2024-02-13,00:02:18 | INFO | accum_freq: 1
2024-02-13,00:02:18 | INFO | aug_cfg: {}
2024-02-13,00:02:18 | INFO | batch_size: 512
2024-02-13,00:02:18 | INFO | beta1: 0.9
2024-02-13,00:02:18 | INFO | beta2: 0.98
2024-02-13,00:02:18 | INFO | checkpoint_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/checkpoints
2024-02-13,00:02:18 | INFO | coca_caption_loss_weight: 2.0
2024-02-13,00:02:18 | INFO | coca_contrastive_loss_weight: 1.0
2024-02-13,00:02:18 | INFO | copy_codebase: False
2024-02-13,00:02:18 | INFO | csv_caption_key: captions
2024-02-13,00:02:18 | INFO | csv_img_key: images
2024-02-13,00:02:18 | INFO | csv_separator:
2024-02-13,00:02:18 | INFO | dataset_resampled: False
2024-02-13,00:02:18 | INFO | dataset_type: auto
2024-02-13,00:02:18 | INFO | ddp_static_graph: True
2024-02-13,00:02:18 | INFO | debug: False
2024-02-13,00:02:18 | INFO | delete_previous_checkpoint: False
2024-02-13,00:02:18 | INFO | device: cuda:0
2024-02-13,00:02:18 | INFO | dist_backend: nccl
2024-02-13,00:02:18 | INFO | dist_url: env://
2024-02-13,00:02:18 | INFO | distill: False
2024-02-13,00:02:18 | INFO | distill_model: None
2024-02-13,00:02:18 | INFO | distill_pretrained: None
2024-02-13,00:02:18 | INFO | distributed: False
2024-02-13,00:02:18 | INFO | epochs: 15
2024-02-13,00:02:18 | INFO | epochs_cooldown: None
2024-02-13,00:02:18 | INFO | eps: 1e-06
2024-02-13,00:02:18 | INFO | force_custom_text: False
2024-02-13,00:02:18 | INFO | force_image_size: None
2024-02-13,00:02:18 | INFO | force_patch_dropout: None
2024-02-13,00:02:18 | INFO | force_quick_gelu: False
2024-02-13,00:02:18 | INFO | gather_with_grad: True
2024-02-13,00:02:18 | INFO | grad_checkpointing: False
2024-02-13,00:02:18 | INFO | grad_clip_norm: None
2024-02-13,00:02:18 | INFO | horovod: False
2024-02-13,00:02:18 | INFO | image_interpolation: None
2024-02-13,00:02:18 | INFO | image_mean: None
2024-02-13,00:02:18 | INFO | image_resize_mode: None
2024-02-13,00:02:18 | INFO | image_std: None
2024-02-13,00:02:18 | INFO | imagenet_v2: None
2024-02-13,00:02:18 | INFO | imagenet_val: None
2024-02-13,00:02:18 | INFO | local_loss: True
2024-02-13,00:02:18 | INFO | local_rank: 0
2024-02-13,00:02:18 | INFO | lock_image: False
2024-02-13,00:02:18 | INFO | lock_image_freeze_bn_stats: False
2024-02-13,00:02:18 | INFO | lock_image_unlocked_groups: 0
2024-02-13,00:02:18 | INFO | lock_text: False
2024-02-13,00:02:18 | INFO | lock_text_freeze_layer_norm: False
2024-02-13,00:02:18 | INFO | lock_text_unlocked_layers: 0
2024-02-13,00:02:18 | INFO | log_every_n_steps: 100
2024-02-13,00:02:18 | INFO | log_level: 20
2024-02-13,00:02:18 | INFO | log_local: False
2024-02-13,00:02:18 | INFO | log_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/out.log
2024-02-13,00:02:18 | INFO | logs: ./logs/
2024-02-13,00:02:18 | INFO | lr: 5e-05
2024-02-13,00:02:18 | INFO | lr_cooldown_end: 0.0
2024-02-13,00:02:18 | INFO | lr_cooldown_power: 1.0
2024-02-13,00:02:18 | INFO | lr_scheduler: cosine
2024-02-13,00:02:18 | INFO | model: ViT-B-32
2024-02-13,00:02:18 | INFO | name: 2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16
2024-02-13,00:02:18 | INFO | no_set_device_rank: False
2024-02-13,00:02:18 | INFO | precision: amp_bf16
2024-02-13,00:02:18 | INFO | pretrained: laion2b_s34b_b79k
2024-02-13,00:02:18 | INFO | pretrained_image: False
2024-02-13,00:02:18 | INFO | rank: 0
2024-02-13,00:02:18 | INFO | remote_sync: None
2024-02-13,00:02:18 | INFO | remote_sync_frequency: 300
2024-02-13,00:02:18 | INFO | remote_sync_protocol: s3
2024-02-13,00:02:18 | INFO | report_to:
2024-02-13,00:02:18 | INFO | resume: None
2024-02-13,00:02:18 | INFO | save_frequency: 5
2024-02-13,00:02:18 | INFO | save_most_recent: False
2024-02-13,00:02:18 | INFO | seed: 0
2024-02-13,00:02:18 | INFO | siglip: False
2024-02-13,00:02:18 | INFO | skip_scheduler: False
2024-02-13,00:02:18 | INFO | tensorboard: False
2024-02-13,00:02:18 | INFO | tensorboard_path:
2024-02-13,00:02:18 | INFO | torchcompile: False
2024-02-13,00:02:18 | INFO | torchscript: False
2024-02-13,00:02:18 | INFO | trace: False
2024-02-13,00:02:18 | INFO | train_data: ../../train_data_counting_neg_clip.csv
2024-02-13,00:02:18 | INFO | train_data_upsampling_factors: None
2024-02-13,00:02:18 | INFO | train_num_samples: None
2024-02-13,00:02:18 | INFO | use_bn_sync: False
2024-02-13,00:02:18 | INFO | use_bnb_linear: None
2024-02-13,00:02:18 | INFO | val_data: None
2024-02-13,00:02:18 | INFO | val_frequency: 5
2024-02-13,00:02:18 | INFO | val_num_samples: None
2024-02-13,00:02:18 | INFO | wandb: False
2024-02-13,00:02:18 | INFO | wandb_notes:
2024-02-13,00:02:18 | INFO | wandb_project_name: open-clip
2024-02-13,00:02:18 | INFO | warmup: 1024
2024-02-13,00:02:18 | INFO | wd: 0.2
2024-02-13,00:02:18 | INFO | workers: 8
2024-02-13,00:02:18 | INFO | world_size: 1
2024-02-13,00:02:18 | INFO | zeroshot_frequency: 5
2024-02-13,00:02:18 | INFO | Start epoch 0
2024-02-13,00:02:32 | INFO | Train Epoch: 0 [ 1024/10010 (5%)] Data (t): 7.751 Batch (t): 13.065, 39.1899/s, 39.1899/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 6.4139 (6.4139) Loss: 6.4139 (6.4139)
2024-02-13,00:02:42 | INFO | Train Epoch: 0 [19456/10010 (100%)] Data (t): 0.001 Batch (t): 0.583, 874.223/s, 874.223/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.2933 (5.8536) Loss: 5.2933 (5.8536)
2024-02-13,00:02:42 | INFO | Start epoch 1
2024-02-13,00:02:49 | INFO | Train Epoch: 1 [ 1024/10010 (5%)] Data (t): 6.269 Batch (t): 6.642, 77.0819/s, 77.0819/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.0906 (5.0906) Loss: 5.0906 (5.0906)
2024-02-13,00:03:00 | INFO | Train Epoch: 1 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.588, 864.032/s, 864.032/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.3162 (4.7034) Loss: 4.3162 (4.7034)
2024-02-13,00:03:00 | INFO | Start epoch 2
2024-02-13,00:03:08 | INFO | Train Epoch: 2 [ 1024/10010 (5%)] Data (t): 7.151 Batch (t): 7.527, 68.0199/s, 68.0199/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.1381 (4.1381) Loss: 4.1381 (4.1381)
2024-02-13,00:03:19 | INFO | Train Epoch: 2 [19456/10010 (100%)] Data (t): 0.033 Batch (t): 0.595, 866.703/s, 866.703/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7141 (3.9261) Loss: 3.7141 (3.9261)
2024-02-13,00:03:19 | INFO | Start epoch 3
2024-02-13,00:03:26 | INFO | Train Epoch: 3 [ 1024/10010 (5%)] Data (t): 6.530 Batch (t): 6.891, 74.2989/s, 74.2989/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7603 (3.7603) Loss: 3.7603 (3.7603)
2024-02-13,00:03:37 | INFO | Train Epoch: 3 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.608, 865.795/s, 865.795/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.2845 (3.5224) Loss: 3.2845 (3.5224)
2024-02-13,00:03:37 | INFO | Start epoch 4
2024-02-13,00:03:44 | INFO | Train Epoch: 4 [ 1024/10010 (5%)] Data (t): 5.669 Batch (t): 6.038, 84.7949/s, 84.7949/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.1494 (3.1494) Loss: 3.1494 (3.1494)
2024-02-13,00:03:55 | INFO | Train Epoch: 4 [19456/10010 (100%)] Data (t): 0.026 Batch (t): 0.605, 864.414/s, 864.414/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7506 (2.9500) Loss: 2.7506 (2.9500)
2024-02-13,00:03:57 | INFO | Start epoch 5
2024-02-13,00:04:04 | INFO | Train Epoch: 5 [ 1024/10010 (5%)] Data (t): 6.324 Batch (t): 6.699, 76.4267/s, 76.4267/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7057 (2.7057) Loss: 2.7057 (2.7057)
2024-02-13,00:04:15 | INFO | Train Epoch: 5 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.639, 867.588/s, 867.588/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.4321 (2.5689) Loss: 2.4321 (2.5689)
2024-02-13,00:04:16 | INFO | Start epoch 6
2024-02-13,00:04:22 | INFO | Train Epoch: 6 [ 1024/10010 (5%)] Data (t): 6.129 Batch (t): 6.509, 78.6643/s, 78.6643/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.3407 (2.3407) Loss: 2.3407 (2.3407)
2024-02-13,00:04:33 | INFO | Train Epoch: 6 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.585, 866.873/s, 866.873/s/gpu LR: 0.000006 Logit Scale: 99.997 Contrastive_loss: 2.2057 (2.2732) Loss: 2.2057 (2.2732)
2024-02-13,00:04:33 | INFO | Start epoch 7
2024-02-13,00:04:40 | INFO | Train Epoch: 7 [ 1024/10010 (5%)] Data (t): 6.374 Batch (t): 6.750, 75.8483/s, 75.8483/s/gpu LR: 0.000007 Logit Scale: 99.997 Contrastive_loss: 1.9728 (1.9728) Loss: 1.9728 (1.9728)
2024-02-13,00:04:51 | INFO | Train Epoch: 7 [19456/10010 (100%)] Data (t): 0.070 Batch (t): 0.629, 865.081/s, 865.081/s/gpu LR: 0.000007 Logit Scale: 99.998 Contrastive_loss: 1.8460 (1.9094) Loss: 1.8460 (1.9094)
2024-02-13,00:04:52 | INFO | Start epoch 8
2024-02-13,00:04:59 | INFO | Train Epoch: 8 [ 1024/10010 (5%)] Data (t): 6.348 Batch (t): 6.724, 76.1504/s, 76.1504/s/gpu LR: 0.000007 Logit Scale: 99.999 Contrastive_loss: 1.6491 (1.6491) Loss: 1.6491 (1.6491)
2024-02-13,00:05:09 | INFO | Train Epoch: 8 [19456/10010 (100%)] Data (t): 0.039 Batch (t): 0.591, 864.632/s, 864.632/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.5005 (1.5748) Loss: 1.5005 (1.5748)
2024-02-13,00:05:10 | INFO | Start epoch 9
2024-02-13,00:05:16 | INFO | Train Epoch: 9 [ 1024/10010 (5%)] Data (t): 5.740 Batch (t): 6.101, 83.9270/s, 83.9270/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.2527 (1.2527) Loss: 1.2527 (1.2527)
2024-02-13,00:05:27 | INFO | Train Epoch: 9 [19456/10010 (100%)] Data (t): 0.029 Batch (t): 0.592, 866.672/s, 866.672/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 1.1425 (1.1976) Loss: 1.1425 (1.1976)
2024-02-13,00:05:29 | INFO | Start epoch 10
2024-02-13,00:05:35 | INFO | Train Epoch: 10 [ 1024/10010 (5%)] Data (t): 5.614 Batch (t): 5.988, 85.5099/s, 85.5099/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 0.92603 (0.92603) Loss: 0.92603 (0.92603)
2024-02-13,00:05:46 | INFO | Train Epoch: 10 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.636, 866.623/s, 866.623/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 1.0557 (0.99089) Loss: 1.0557 (0.99089)
2024-02-13,00:05:47 | INFO | Start epoch 11
2024-02-13,00:05:54 | INFO | Train Epoch: 11 [ 1024/10010 (5%)] Data (t): 6.593 Batch (t): 6.953, 73.6390/s, 73.6390/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 0.75542 (0.75542) Loss: 0.75542 (0.75542)
2024-02-13,00:06:05 | INFO | Train Epoch: 11 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.595, 865.436/s, 865.436/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.74945 (0.75243) Loss: 0.74945 (0.75243)
2024-02-13,00:06:05 | INFO | Start epoch 12
2024-02-13,00:06:11 | INFO | Train Epoch: 12 [ 1024/10010 (5%)] Data (t): 5.893 Batch (t): 6.266, 81.7047/s, 81.7047/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.60686 (0.60686) Loss: 0.60686 (0.60686)
2024-02-13,00:06:22 | INFO | Train Epoch: 12 [19456/10010 (100%)] Data (t): 0.015 Batch (t): 0.587, 865.351/s, 865.351/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.62050 (0.61368) Loss: 0.62050 (0.61368)
2024-02-13,00:06:22 | INFO | Start epoch 13
2024-02-13,00:06:30 | INFO | Train Epoch: 13 [ 1024/10010 (5%)] Data (t): 6.973 Batch (t): 7.323, 69.9169/s, 69.9169/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.49629 (0.49629) Loss: 0.49629 (0.49629)
2024-02-13,00:06:41 | INFO | Train Epoch: 13 [19456/10010 (100%)] Data (t): 0.044 Batch (t): 0.595, 874.122/s, 874.122/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.53294 (0.51462) Loss: 0.53294 (0.51462)
2024-02-13,00:06:41 | INFO | Start epoch 14
2024-02-13,00:06:48 | INFO | Train Epoch: 14 [ 1024/10010 (5%)] Data (t): 6.511 Batch (t): 6.872, 74.5086/s, 74.5086/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.45596 (0.45596) Loss: 0.45596 (0.45596)
2024-02-13,00:06:59 | INFO | Train Epoch: 14 [19456/10010 (100%)] Data (t): 0.014 Batch (t): 0.587, 865.782/s, 865.782/s/gpu LR: 0.000014 Logit Scale: 100.000 Contrastive_loss: 0.41602 (0.43599) Loss: 0.41602 (0.43599)
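
Note on reading the log: open_clip reports each metric as `current (running average)`, and with `warmup: 1024` greater than the total number of optimizer steps, the learning rate stays in linear warmup for the entire run. A minimal sketch checking two values against the log, assuming open_clip's linear warmup formula `lr = base_lr * (step + 1) / warmup` and roughly 20 optimizer steps per epoch (10010 samples / batch_size 512); variable names here are illustrative, not open_clip internals.

```python
# Losses are logged as "current (running average)". Epoch 0 logged step
# losses 6.4139 and 5.2933; the parenthesised value on the second entry
# is their running mean.
step_losses = [6.4139, 5.2933]
running_avg = sum(step_losses) / len(step_losses)
print(f"{running_avg:.4f}")  # 5.8536, matching the logged (5.8536)

# Linear warmup (assumed formula, as in open_clip's scheduler): after
# ~20 steps/epoch * 15 epochs = 300 steps of a 1024-step warmup:
base_lr, warmup = 5e-05, 1024
final_lr = base_lr * 300 / warmup
print(f"{final_lr:.2e}")  # 1.46e-05, consistent with the final logged LR 0.000014
```

The run therefore never reaches the cosine decay phase of the configured `lr_scheduler: cosine`, which explains why the logged LR increases monotonically from 0.000000 to 0.000014.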