# Training Procedure

#### [Experiment 1](https://github.com/deepanshudashora/ERAV1/tree/master/session13/lightning_version/Experiments)

1. The model is trained on a Tesla T4 (15 GB GPU memory)
2. Training is completed in two phases
3. The first phase runs for 20 epochs and the second phase for another 20 epochs
4. In the first phase the loss drops steadily, but in the second phase it drops more slowly
5. The two training loops are run separately, with no validation other than computing the validation loss

#### [Experiment 2](https://github.com/deepanshudashora/ERAV1/tree/master/session13/lightning_version)

1. The model is trained on 2 Tesla T4 GPUs, with distributed training using PyTorch Lightning
2. For distributed training we use the `ddp_notebook_find_unused_parameters_true` strategy
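The strategy name is Lightning shorthand for wrapping the model in torch's `DistributedDataParallel` with `find_unused_parameters=True`, using a notebook-compatible launcher. A minimal torch-only sketch of that underlying wrapper, run here as a single-process gloo group on CPU so it stays self-contained (the `nn.Linear` is a stand-in, not the repo's model):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group on CPU, just so DDP can construct;
# Lightning's launcher normally sets this up across the 2 GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# find_unused_parameters=True is the part the strategy name encodes:
# DDP tolerates parameters that receive no gradient in a given step.
net = DDP(torch.nn.Linear(8, 2), find_unused_parameters=True)
out = net(torch.randn(4, 8))

dist.destroy_process_group()
```

On the real 2-GPU run, Lightning builds this wrapper once per process; the strategy string only selects the launcher and this flag.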

* Afterwards we evaluate the model and collect the metrics
* Lightning saves checkpoints in `.ckpt` format, so we convert to a plain PyTorch file by saving the state dict in `.pt` format
* For this we use these two lines of code

```
best_model = torch.load(weights_path)
```
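Only the first of those two lines is shown above; a hedged sketch of the full conversion round trip, assuming the standard Lightning checkpoint layout where the weights sit under the `state_dict` key (the file names and the `nn.Linear` stand-in are placeholders, not the repo's):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the repo's actual module

# Fake a Lightning-style .ckpt: weights under "state_dict",
# next to trainer metadata such as the epoch counter.
torch.save({"state_dict": model.state_dict(), "epoch": 39}, "best_model.ckpt")

# The conversion itself: load the checkpoint dict, keep only the weights.
best_model = torch.load("best_model.ckpt", map_location="cpu")
torch.save(best_model["state_dict"], "best_model.pt")

# The .pt file now loads straight into a plain nn.Module.
model.load_state_dict(torch.load("best_model.pt"))
```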

* The model starts overfitting on the dataset after 30 epochs
* Future improvements:
  1. Train the model in one shot instead of two separate phases
  2. Use a larger batch size (basically, earn more money and buy a good GPU)
  3. Data transformation also plays a vital role here
  4. The OneCycle LR range needs to be tuned appropriately for a better learning rate
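For the last point, a sketch of what tuning that range means in plain PyTorch; the `max_lr`, step count, and the tiny model are illustrative values, not the repo's:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# OneCycle ramps the LR up to max_lr over the first pct_start fraction of
# steps, then anneals it far below the starting LR; choosing max_lr
# (e.g. from an LR-range test) is the knob the list item refers to.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,       # illustrative peak LR
    total_steps=1000,  # epochs * steps_per_epoch
    pct_start=0.3,     # 30% of the steps spent ramping up
)

peak = 0.0
for _ in range(1000):
    optimizer.step()   # loss.backward() would precede this in real training
    peak = max(peak, optimizer.param_groups[0]["lr"])
    scheduler.step()
```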

# Data Transformation

# Accuracy Report

```
Class accuracy is:  85.015236%
No obj accuracy is: 98.522491%
Obj accuracy is:    65.760597%

MAP: 0.4661380648612976
```

# [Training Logs](https://github.com/deepanshudashora/ERAV1/blob/master/session13/lightning_version/training_logs/csv_training_logs/lightning_logs/version_0/metrics.csv)

#### For faster execution, the validation step runs only once after the first 20 epochs of training, and then every 5 epochs until epoch 40

```
      lr-Adam   step  train_loss  epoch  val_loss
786       NaN  19499    4.653981   37.0       NaN
787  0.000160  19549         NaN    NaN       NaN
788       NaN  19549    4.864988   37.0       NaN
789  0.000160  19599         NaN    NaN       NaN
790       NaN  19599    5.241925   37.0       NaN
791  0.000160  19649         NaN    NaN       NaN
792       NaN  19649    5.020171   37.0       NaN
793  0.000161  19699         NaN    NaN       NaN
794       NaN  19699    4.245292   38.0       NaN
795  0.000161  19749         NaN    NaN       NaN
796       NaN  19749    4.541957   38.0       NaN
797  0.000161  19799         NaN    NaN       NaN
798       NaN  19799    3.837740   38.0       NaN
799  0.000161  19849         NaN    NaN       NaN
800       NaN  19849    4.239679   38.0       NaN
801  0.000161  19899         NaN    NaN       NaN
802       NaN  19899    4.034101   38.0       NaN
803  0.000161  19949         NaN    NaN       NaN
804       NaN  19949    5.010788   38.0       NaN
805  0.000161  19999         NaN    NaN       NaN
806       NaN  19999    3.980245   38.0       NaN
807  0.000161  20049         NaN    NaN       NaN
808       NaN  20049    4.641729   38.0       NaN
809  0.000161  20099         NaN    NaN       NaN
810       NaN  20099    4.563717   38.0       NaN
811  0.000161  20149         NaN    NaN       NaN
812       NaN  20149    4.422552   38.0       NaN
813  0.000161  20199         NaN    NaN       NaN
814       NaN  20199    4.925357   38.0       NaN
815  0.000161  20249         NaN    NaN       NaN
816       NaN  20249    4.788391   39.0       NaN
817  0.000161  20299         NaN    NaN       NaN
818       NaN  20299    4.478580   39.0       NaN
819  0.000161  20349         NaN    NaN       NaN
820       NaN  20349    4.624731   39.0       NaN
821  0.000161  20399         NaN    NaN       NaN
822       NaN  20399    4.425498   39.0       NaN
823  0.000161  20449         NaN    NaN       NaN
824       NaN  20449    4.361921   39.0       NaN
825  0.000161  20499         NaN    NaN       NaN
826       NaN  20499    4.318252   39.0       NaN
827  0.000161  20549         NaN    NaN       NaN
828       NaN  20549    4.013813   39.0       NaN
829  0.000161  20599         NaN    NaN       NaN
830       NaN  20599    4.476331   39.0       NaN
831  0.000161  20649         NaN    NaN       NaN
832       NaN  20649    4.192605   39.0       NaN
833  0.000161  20699         NaN    NaN       NaN
834       NaN  20699    4.065756   39.0       NaN
835       NaN  20719         NaN   39.0  4.348697
```
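The block above is the tail of Lightning's `CSVLogger` output. A small stdlib sketch of filtering such a `metrics.csv` for the loss columns (the column layout is assumed from the table; the inline sample reuses a few rows from the log, with empty cells where a metric was not logged at that step):

```python
import csv
import io

# A few rows from the log above, in CSVLogger's metrics.csv layout.
metrics_csv = """\
lr-Adam,step,train_loss,epoch,val_loss
,19499,4.653981,37.0,
0.000160,19549,,,
,19549,4.864988,37.0,
,20719,,39.0,4.348697
"""

train_loss = []
val_loss = []
for row in csv.DictReader(io.StringIO(metrics_csv)):
    # Each row logs only some metrics; skip the empty cells.
    if row["train_loss"]:
        train_loss.append((int(row["step"]), float(row["train_loss"])))
    if row["val_loss"]:
        val_loss.append((int(row["step"]), float(row["val_loss"])))

print(train_loss)  # [(19499, 4.653981), (19549, 4.864988)]
print(val_loss)    # [(20719, 4.348697)]
```

Pointing `open("metrics.csv")` at the linked file instead of the inline string gives the full curves.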

# Results

## Loss Curve

![Loss curve](images/accuracy_curve.png)