OferB committed on
Commit
841df46
1 Parent(s): c00a8fe

Update README.md

  1. README.md +8 -4
README.md CHANGED
@@ -150,10 +150,14 @@ Below are DeciCoder's pass@1 on MultiPL HumanEval scores
 
 ### Runtime Benchmarks
 
-|Inference Tool/Hardware | A10G (tokens/sec) | A100 (tokens/sec) |
-|:----------|:----------|:----------|
-| HF Inference Endpoints | 1,364.2 | 3,244.4 |
-| Infery LLM | 3,889.3 | 11,676.8 |
+|Inference Tool/Hardware | A10 (tokens/sec) | A10 Latency (ms)| A100 (tokens/sec) | A100 Latency (ms) |
+|:----------|:----------|:----------|:----------|:----------|
+| HF Inference Endpoints | 1,364.2 | 9.03 | 3,244.4 | 8.8 |
+| Infery LLM | 3,889.3 | 3.075 | 11,676.8 | 1.729 |
+
+>**NOTE:**
+>- Latency - Total generation time of batch size 1 (prefill+generate)
+>- Throughput (tokens/sec) - Measured with optimal batchsize per hardware - A10 on BS 128, A100 on BS 512
 
 ## Documentation
 
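
To read the updated table at a glance, the relative speedups can be derived from its figures. The sketch below is only illustrative arithmetic on the numbers in the added rows; the `benchmarks` dictionary and the `speedup` helper are hypothetical names, and the resulting ratios are not stated anywhere in the commit. Per the NOTE, latency was measured at batch size 1 while throughput was measured at batch size 128 (A10) or 512 (A100), so the two metrics are not directly comparable with each other.

```python
# Figures copied from the updated Runtime Benchmarks table in this commit.
# The dictionary layout and helper name are assumptions made for illustration.
benchmarks = {
    "HF Inference Endpoints": {
        "A10": {"tokens_per_sec": 1364.2, "latency_ms": 9.03},
        "A100": {"tokens_per_sec": 3244.4, "latency_ms": 8.8},
    },
    "Infery LLM": {
        "A10": {"tokens_per_sec": 3889.3, "latency_ms": 3.075},
        "A100": {"tokens_per_sec": 11676.8, "latency_ms": 1.729},
    },
}


def speedup(hardware, metric,
            baseline="HF Inference Endpoints", candidate="Infery LLM"):
    """Return how many times better `candidate` is than `baseline`."""
    base = benchmarks[baseline][hardware][metric]
    cand = benchmarks[candidate][hardware][metric]
    if metric == "tokens_per_sec":
        # Throughput: higher is better, so divide candidate by baseline.
        return cand / base
    # Latency: lower is better, so divide baseline by candidate.
    return base / cand


if __name__ == "__main__":
    for hw in ("A10", "A100"):
        for metric in ("tokens_per_sec", "latency_ms"):
            print(f"{hw} {metric}: {speedup(hw, metric):.2f}x")
```

Running the script prints one ratio per hardware and metric pair, which makes it easier to compare the two inference tools without reading the raw columns side by side.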