mnoukhov committed on
Commit 00f4e8e · verified · 1 Parent(s): 1904ee8

Model save
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/run-a805570edd5eddcfb7deb3a583d7b4c2.wandb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ license: apache-2.0
+ base_model: EleutherAI/pythia-410m-deduped
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: pythia410m-sft-tldr
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # pythia410m-sft-tldr
+
+ This model is a fine-tuned version of [EleutherAI/pythia-410m-deduped](https://huggingface.co/EleutherAI/pythia-410m-deduped) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.5805
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-06
+ - train_batch_size: 32
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - num_epochs: 1.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 2.7228        | 0.2   | 183  | 2.6403          |
+ | 2.6244        | 0.4   | 366  | 2.6018          |
+ | 2.5986        | 0.6   | 549  | 2.5865          |
+ | 2.5808        | 0.8   | 732  | 2.5805          |
+
+
+ ### Framework versions
+
+ - Transformers 4.38.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.17.0
+ - Tokenizers 0.15.2
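
The hyperparameters and results table above are internally consistent; a small sketch of the arithmetic (a world size of 1 is an assumption here, since the card says multi-GPU but 32 × 4 already equals the stated total of 128):

```python
# Effective batch size = per-device batch x gradient accumulation x world size.
per_device_batch = 32
grad_accum = 4
world_size = 1  # assumption: 32 * 4 already matches the reported total of 128
total_batch = per_device_batch * grad_accum * world_size
print(total_batch)  # 128, as reported under total_train_batch_size

# One epoch ran 912 optimizer steps, so the train split holds roughly:
steps_per_epoch = 912
print(steps_per_epoch * total_batch)  # 116736 samples (approximate)

# The eval checkpoints at steps 183/366/549/732 sit at 20% epoch intervals:
print([round(s / steps_per_epoch, 2) for s in (183, 366, 549, 732)])  # [0.2, 0.4, 0.6, 0.8]
```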
code/wandb/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/files/config.yaml CHANGED
@@ -81,6 +81,26 @@ _wandb:
       5: 1
       6:
       - 1
+    - 1: train/train_runtime
+      5: 1
+      6:
+      - 1
+    - 1: train/train_samples_per_second
+      5: 1
+      6:
+      - 1
+    - 1: train/train_steps_per_second
+      5: 1
+      6:
+      - 1
+    - 1: train/total_flos
+      5: 1
+      6:
+      - 1
+    - 1: train/train_loss
+      5: 1
+      6:
+      - 1
 return_dict:
   desc: null
   value: true
code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/files/output.log CHANGED
@@ -580,3 +580,482 @@
+  60%|██████    | 549/912 [25:58<15:40, 2.59s/it]
+  98%|█████████▊| 793/806 [00:57<00:00, 13.76it/s]
+  66%|██████▌   | 599/912 [29:06<13:31, 2.59s/it]
+  77%|███████▋  | 699/912 [33:26<09:12, 2.59s/it]
+  80%|████████  | 732/912 [34:51<07:50, 2.61s/it]
+  98%|█████████▊| 793/806 [00:57<00:00, 13.76it/s]
+  88%|████████▊ | 799/912 [38:42<04:55, 2.61s/it]
+  99%|█████████▊| 899/912 [43:00<00:33, 2.55s/it]
+ 100%|█████████▉| 911/912 [43:31<00:02, 2.55s/it]
+ 100%|██████████| 912/912 [43:33<00:00, 2.87s/it]
+ run-a805570edd5eddcfb7deb3a583d7b4c2.wandb: 100%|██████████| 1.45M/1.45M [00:00<00:00, 6.26MB/s]
+ Upload 2 LFS files:  50%|█████     | 1/2 [00:00<00:00, 1.91it/s]
code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 2.5986, "train/grad_norm": 5.898989677429199, "train/learning_rate": 1.2735173894411445e-06, "train/epoch": 0.55, "train/global_step": 500, "_timestamp": 1715277332.9971147, "_runtime": 1416.2355046272278, "_step": 6, "eval/loss": 2.601827383041382, "eval/runtime": 58.2033, "eval/samples_per_second": 110.767, "eval/steps_per_second": 13.848}
+ {"train/loss": 2.5779, "train/grad_norm": 3.9158174991607666, "train/learning_rate": 1.2813624190484708e-09, "train/epoch": 1.0, "train/global_step": 912, "_timestamp": 1715278532.1392817, "_runtime": 2615.3776717185974, "_step": 13, "eval/loss": 2.5804877281188965, "eval/runtime": 58.1953, "eval/samples_per_second": 110.782, "eval/steps_per_second": 13.85, "train/train_runtime": 2617.6291, "train/train_samples_per_second": 44.591, "train/train_steps_per_second": 0.348, "train/total_flos": 1.3292726803182387e+17, "train/train_loss": 2.614081516600492}
code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/logs/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/run-a805570edd5eddcfb7deb3a583d7b4c2.wandb CHANGED
Binary files a/code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/run-a805570edd5eddcfb7deb3a583d7b4c2.wandb and b/code/wandb/run-20240509_173156-a805570edd5eddcfb7deb3a583d7b4c2/run-a805570edd5eddcfb7deb3a583d7b4c2.wandb differ
 
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "transformers_version": "4.38.2"
+ }
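
The new generation_config.json is minimal; a sketch of reading it (plain `json` here to stay self-contained — Transformers itself would load it via `GenerationConfig.from_pretrained`):

```python
import json

# The file added in this commit, inlined for illustration.
raw = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.38.2"
}
"""
cfg = json.loads(raw)
# Pythia's GPT-NeoX tokenizer uses token id 0 for both BOS and EOS.
print(cfg["bos_token_id"], cfg["eos_token_id"])  # 0 0
```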
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:39a41ac637b2dfbe095630e36ca10e98d887b503fc8ccf23318b7be0ad4db577
+ oid sha256:c55280af0a3145842985599424628918297ee98f6d663db70b42df86aa2de775
  size 1621370224
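
The LFS pointer's hash changed but its size did not, and the size is consistent with a ~410M-parameter model stored in fp32. A sketch of that sanity check (treating the file as pure tensor data and ignoring the small safetensors header, so this is an approximation):

```python
# model.safetensors is 1,621,370,224 bytes before and after this commit.
size_bytes = 1_621_370_224
bytes_per_param = 4  # fp32 weights

approx_params_m = size_bytes / bytes_per_param / 1e6
print(round(approx_params_m, 1))  # ~405.3M, in line with pythia-410m
```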