Commit ebd28bb (parent: c1a8f5c)
Update README.md

README.md CHANGED
@@ -60,7 +60,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 - Training regime: Mixed precision training using bf16
-- Number of epochs:
+- Number of epochs: 27
 - Learning rate: 1e-6
 - Batch size: 16
 - Seq length: 512
@@ -75,6 +75,25 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 - Intel Gaudi 2 AI Accelerator
 - Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
 
+
+#### Hardware utilization
+##### Training
+- max_memory_allocated (GB): 94.62
+- memory_allocated (GB): 67.67
+- total_memory_available (GB): 94.62
+- train_loss: 1.321901714310941
+- train_runtime: 9741.6819
+- train_samples_per_second: 15.877
+- train_steps_per_second: 0.995
+
+##### Inference
+- Throughput (including tokenization) = 102.3085449650079 tokens/second
+- Number of HPU graphs = 18
+- Memory allocated = 15.37 GB
+- Max memory allocated = 15.39 GB
+- Total memory available = 94.62 GB
+- Graph compilation duration = 9.98630401911214 seconds
+
 #### Software
 - Pytorch
 - Transformers library