Commit ebd28bb (parent: c1a8f5c)
Update README.md

README.md CHANGED
@@ -60,7 +60,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 - Training regime: Mixed precision training using bf16
-- Number of epochs:
+- Number of epochs: 27
 - Learning rate: 1e-6
 - Batch size: 16
 - Seq length: 512
@@ -75,6 +75,25 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 - Intel Gaudi 2 AI Accelerator
 - Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
 
+
+#### Hardware utilization
+##### Training
+- max_memory_allocated (GB): 94.62
+- memory_allocated (GB): 67.67
+- total_memory_available (GB): 94.62
+- train_loss: 1.321901714310941
+- train_runtime: 9741.6819
+- train_samples_per_second: 15.877
+- train_steps_per_second: 0.995
+
+##### Inference
+- Throughput (including tokenization) = 102.3085449650079 tokens/second
+- Number of HPU graphs = 18
+- Memory allocated = 15.37 GB
+- Max memory allocated = 15.39 GB
+- Total memory available = 94.62 GB
+- Graph compilation duration = 9.98630401911214 seconds
+
 #### Software
 - Pytorch
 - Transformers library