wgetdd committed on
Commit 5bfda01
1 Parent(s): a451ab8

Update README.md

Files changed (1): README.md (+76 −77)
README.md CHANGED
@@ -16,14 +16,20 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 
  # Training Procedure
 
- 1. The model is trained on Tesla T4 (15GB GPU memory)
- 2. The training is completed in two phases
- 3. The first phase contains 20 epochs and second phase contains another 20 epochs
- 4. In the first training we see loss dropping correctly but in the second training it drops less
- 5. We run our two training loops separately and do not run any kind of validation on them, except for validation loss
- 6. Later we evaluate the model and get the numbers
- 7. The lightning generally saves the model as .ckpt format, so we convert it to torch format by saving state dict as .pt format
- 8. For doing this we use these two lines of code
 
  ```
  best_model = torch.load(weights_path)
@@ -35,12 +41,14 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
  ```
 
 
- 8. The model starts overfitting on the dataset after 30 epochs
- 9. Future Improvements
  1. Train the model in 1 shot instead of two different phases
  2. Keep a better batch size (Basically earn more money and buy a good GPU)
  3. Data transformation also plays a vital role here
  4. OneCycle LR range needs to be appropriately modified for a better LR
 
  # Data Transformation
 
@@ -51,83 +59,74 @@ Along with the transforms mentioned in the [config file](https://github.com/deep
  # Accuracy Report
 
  ```
- Class accuracy is: 82.999725%
- No obj accuracy is: 96.828300%
- Obj accuracy is: 76.898473%
 
- MAP: 0.29939851760864258
 
  ```
 
- # [Training Logs](https://github.com/deepanshudashora/ERAV1/blob/master/session13/lightning_version/merged_logs.csv)
 
  #### For faster execution we run validation only once, after epoch 20, during the first 20 epochs of training, and then every 5 epochs until epoch 40
 
  ```
- Unnamed: 0 lr-Adam step train_loss epoch val_loss
- 6576 6576 NaN 164299 4.186745 39.0 NaN
- 6577 6577 0.000132 164349 NaN NaN NaN
- 6578 6578 NaN 164349 2.936086 39.0 NaN
- 6579 6579 0.000132 164399 NaN NaN NaN
- 6580 6580 NaN 164399 4.777130 39.0 NaN
- 6581 6581 0.000132 164449 NaN NaN NaN
- 6582 6582 NaN 164449 3.139145 39.0 NaN
- 6583 6583 0.000132 164499 NaN NaN NaN
- 6584 6584 NaN 164499 4.596097 39.0 NaN
- 6585 6585 0.000132 164549 NaN NaN NaN
- 6586 6586 NaN 164549 5.587294 39.0 NaN
- 6587 6587 0.000132 164599 NaN NaN NaN
- 6588 6588 NaN 164599 4.592830 39.0 NaN
- 6589 6589 0.000132 164649 NaN NaN NaN
- 6590 6590 NaN 164649 3.914468 39.0 NaN
- 6591 6591 0.000132 164699 NaN NaN NaN
- 6592 6592 NaN 164699 3.180615 39.0 NaN
- 6593 6593 0.000132 164749 NaN NaN NaN
- 6594 6594 NaN 164749 5.772174 39.0 NaN
- 6595 6595 0.000132 164799 NaN NaN NaN
- 6596 6596 NaN 164799 2.894014 39.0 NaN
- 6597 6597 0.000132 164849 NaN NaN NaN
- 6598 6598 NaN 164849 4.473828 39.0 NaN
- 6599 6599 0.000132 164899 NaN NaN NaN
- 6600 6600 NaN 164899 6.397766 39.0 NaN
- 6601 6601 0.000132 164949 NaN NaN NaN
- 6602 6602 NaN 164949 3.789242 39.0 NaN
- 6603 6603 0.000132 164999 NaN NaN NaN
- 6604 6604 NaN 164999 5.182691 39.0 NaN
- 6605 6605 0.000132 165049 NaN NaN NaN
- 6606 6606 NaN 165049 4.845749 39.0 NaN
- 6607 6607 0.000132 165099 NaN NaN NaN
- 6608 6608 NaN 165099 3.672542 39.0 NaN
- 6609 6609 0.000132 165149 NaN NaN NaN
- 6610 6610 NaN 165149 4.230726 39.0 NaN
- 6611 6611 0.000132 165199 NaN NaN NaN
- 6612 6612 NaN 165199 4.625024 39.0 NaN
- 6613 6613 0.000132 165249 NaN NaN NaN
- 6614 6614 NaN 165249 4.549682 39.0 NaN
- 6615 6615 0.000132 165299 NaN NaN NaN
- 6616 6616 NaN 165299 4.040627 39.0 NaN
- 6617 6617 0.000132 165349 NaN NaN NaN
- 6618 6618 NaN 165349 4.857126 39.0 NaN
- 6619 6619 0.000132 165399 NaN NaN NaN
- 6620 6620 NaN 165399 3.081895 39.0 NaN
- 6621 6621 0.000132 165449 NaN NaN NaN
- 6622 6622 NaN 165449 3.945353 39.0 NaN
- 6623 6623 0.000132 165499 NaN NaN NaN
- 6624 6624 NaN 165499 3.203420 39.0 NaN
- 6625 6625 NaN 165519 NaN 39.0 3.081895
-
-
 
  ```
 
  # Results
 
- ## For epochs 0 to 19
- ![train_logs_1.png](images/train_logs_1.png)
-
- ## From 19 to 20
- ![train_logs_2.png](images/train_logs_2.png)
-
- ## Full training logs for loss
-
- ![full_training.png](images/full_training.png)
 
 
  # Training Procedure
 
+ #### [Experiment 1](https://github.com/deepanshudashora/ERAV1/tree/master/session13/lightning_version/Experiments)
+ 1. The model is trained on a Tesla T4 (15 GB GPU memory)
+ 2. The training is completed in two phases
+ 3. The first phase contains 20 epochs and the second phase contains another 20 epochs
+ 4. In the first phase the loss drops steadily, but in the second phase it drops much more slowly
+ 5. We run the two training loops separately and do not run any evaluation during training, apart from computing the validation loss
+
+ #### [Experiment 2](https://github.com/deepanshudashora/ERAV1/tree/master/session13/lightning_version)
+ 1. The model is trained on 2 Tesla T4 GPUs, with distributed training via PyTorch Lightning
+ 2. For distributed training we use the ```ddp_notebook_find_unused_parameters_true``` strategy
+
+ * Afterwards we evaluate the model and report the numbers
+ * Lightning saves checkpoints in .ckpt format, so we convert to plain torch format by saving the state dict as a .pt file
+ * For this we use these two lines of code
 
  ```
  best_model = torch.load(weights_path)
 
  ```
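
The snippet above is truncated by the diff. A self-contained sketch of the same .ckpt → .pt conversion, with a toy `nn.Linear` standing in for the repository's actual network and temporary file paths in place of the real ones:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-in for the trained model; the real architecture lives in the repo
model = nn.Linear(4, 2)

tmp = tempfile.mkdtemp()
ckpt_path = os.path.join(tmp, "best_model.ckpt")
pt_path = os.path.join(tmp, "model.pt")

# A Lightning .ckpt is a dict that wraps the weights under the "state_dict"
# key, alongside training metadata (epoch, optimizer state, ...)
torch.save({"state_dict": model.state_dict(), "epoch": 39}, ckpt_path)

# The two-line conversion: load the checkpoint, re-save only the state dict
best_model = torch.load(ckpt_path)
torch.save(best_model["state_dict"], pt_path)

# The .pt file now loads with plain torch, no Lightning required
weights = torch.load(pt_path)
```

The resulting .pt file can be fed straight to `model.load_state_dict(weights)` on a freshly constructed model.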
 
 
+ * The model starts overfitting on the dataset after 30 epochs
+ * Future Improvements
  1. Train the model in 1 shot instead of two different phases
  2. Keep a better batch size (Basically earn more money and buy a good GPU)
  3. Data transformation also plays a vital role here
  4. OneCycle LR range needs to be appropriately modified for a better LR
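
The last improvement can be prototyped directly with torch's built-in scheduler. A minimal sketch of sweeping a OneCycle LR range; `max_lr`, `total_steps`, and `pct_start` here are illustrative values, not the repository's config:

```python
import torch

# Toy parameter; the real model/optimizer come from the training code
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-5)  # lr is overridden by the scheduler
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=100, pct_start=0.3
)

lrs = []
for _ in range(100):
    optimizer.step()   # normally preceded by forward/backward on a batch
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

# The LR warms up to max_lr around step pct_start * total_steps,
# then anneals far below the starting LR by the final step
```

Plotting `lrs` for a few candidate `max_lr` values is a cheap way to pick the range before committing to a full 40-epoch run.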
50
+
51
+
52
 
53
  # Data Transformation
54
 
 
  # Accuracy Report
 
  ```
+ Class accuracy is: 85.015236%
+ No obj accuracy is: 98.522491%
+ Obj accuracy is: 65.760597%
 
+ MAP: 0.4661380648612976
 
  ```
 
+ # [Training Logs](https://github.com/deepanshudashora/ERAV1/blob/master/session13/lightning_version/training_logs/csv_training_logs/lightning_logs/version_0/metrics.csv)
 
 
  #### For faster execution we run validation only once, after epoch 20, during the first 20 epochs of training, and then every 5 epochs until epoch 40
 
  ```
+ lr-Adam step train_loss epoch val_loss
+ 786 NaN 19499 4.653981 37.0 NaN
+ 787 0.000160 19549 NaN NaN NaN
+ 788 NaN 19549 4.864988 37.0 NaN
+ 789 0.000160 19599 NaN NaN NaN
+ 790 NaN 19599 5.241925 37.0 NaN
+ 791 0.000160 19649 NaN NaN NaN
+ 792 NaN 19649 5.020171 37.0 NaN
+ 793 0.000161 19699 NaN NaN NaN
+ 794 NaN 19699 4.245292 38.0 NaN
+ 795 0.000161 19749 NaN NaN NaN
+ 796 NaN 19749 4.541957 38.0 NaN
+ 797 0.000161 19799 NaN NaN NaN
+ 798 NaN 19799 3.837740 38.0 NaN
+ 799 0.000161 19849 NaN NaN NaN
+ 800 NaN 19849 4.239679 38.0 NaN
+ 801 0.000161 19899 NaN NaN NaN
+ 802 NaN 19899 4.034101 38.0 NaN
+ 803 0.000161 19949 NaN NaN NaN
+ 804 NaN 19949 5.010788 38.0 NaN
+ 805 0.000161 19999 NaN NaN NaN
+ 806 NaN 19999 3.980245 38.0 NaN
+ 807 0.000161 20049 NaN NaN NaN
+ 808 NaN 20049 4.641729 38.0 NaN
+ 809 0.000161 20099 NaN NaN NaN
+ 810 NaN 20099 4.563717 38.0 NaN
+ 811 0.000161 20149 NaN NaN NaN
+ 812 NaN 20149 4.422552 38.0 NaN
+ 813 0.000161 20199 NaN NaN NaN
+ 814 NaN 20199 4.925357 38.0 NaN
+ 815 0.000161 20249 NaN NaN NaN
+ 816 NaN 20249 4.788391 39.0 NaN
+ 817 0.000161 20299 NaN NaN NaN
+ 818 NaN 20299 4.478580 39.0 NaN
+ 819 0.000161 20349 NaN NaN NaN
+ 820 NaN 20349 4.624731 39.0 NaN
+ 821 0.000161 20399 NaN NaN NaN
+ 822 NaN 20399 4.425498 39.0 NaN
+ 823 0.000161 20449 NaN NaN NaN
+ 824 NaN 20449 4.361921 39.0 NaN
+ 825 0.000161 20499 NaN NaN NaN
+ 826 NaN 20499 4.318252 39.0 NaN
+ 827 0.000161 20549 NaN NaN NaN
+ 828 NaN 20549 4.013813 39.0 NaN
+ 829 0.000161 20599 NaN NaN NaN
+ 830 NaN 20599 4.476331 39.0 NaN
+ 831 0.000161 20649 NaN NaN NaN
+ 832 NaN 20649 4.192605 39.0 NaN
+ 833 0.000161 20699 NaN NaN NaN
+ 834 NaN 20699 4.065756 39.0 NaN
+ 835 NaN 20719 NaN 39.0 4.348697
 
  ```
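
Lightning's CSV logger writes one row per logged event, which is why most cells above are NaN. A small sketch of recovering a single metric from such a log; the inline sample mimics the excerpt above, and in practice you would `pd.read_csv` the metrics.csv path linked earlier:

```python
import io

import pandas as pd

# A few rows in the same shape as the metrics.csv excerpt above
sample = io.StringIO(
    "lr-Adam,step,train_loss,epoch,val_loss\n"
    ",19499,4.653981,37.0,\n"
    "0.000160,19549,,,\n"
    ",20719,,39.0,4.348697\n"
)
# In practice: pd.read_csv(".../lightning_logs/version_0/metrics.csv")
df = pd.read_csv(sample)

# Each metric lives in its own rows; drop the NaNs to isolate it
val_loss = df.dropna(subset=["val_loss"])[["epoch", "val_loss"]]
```

The same `dropna` pattern isolates `train_loss` or `lr-Adam` for plotting the curves shown below.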
 
  # Results
 
+ ## Loss Curve
+ ![accuracy_curve.png](images/accuracy_curve.png)