petermcaughan committed on
Commit 92d47fb
1 Parent(s): 7353d4f

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -30,7 +30,7 @@ See the [usage instructions](#usage-example) for how to inference this model wit

 #### Latency for token generation

- Below is average latency of generating a token using a prompt of varying size using NVIDIA A100-SXM4-80GB GPU:
+ Below is the average latency of generating a token for prompts of varying size on an NVIDIA A100-SXM4-80GB GPU, taken from the [ORT benchmarking script for Mistral](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/README.md#benchmark-mistral):

 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|