youval committed on
Commit
5bff551
·
1 Parent(s): 80c4451

model card update with fp16 info

  1. README.md +19 -8
README.md CHANGED
@@ -34,22 +34,33 @@ The model was trained and tested in the following languages:
 
  ## Inference Time
 
- | GPU Info | Batch size 1 | Batch size 32 |
- |:--------------------------------------------------------------|---------------:|---------------:|
- | NVIDIA A10 | 4 ms | 84 ms |
- | NVIDIA T4 | 15 ms | 362 ms |
+ | GPU Info | Quantization type | Batch size 1 | Batch size 32 |
+ |:------------------------------------------|-------------------|---------------:|---------------:|
+ | NVIDIA A10 | FP16 | 2 ms | 30 ms |
+ | NVIDIA A10 | FP32 | 4 ms | 84 ms |
+ | NVIDIA T4 | FP16 | 3 ms | 65 ms |
+ | NVIDIA T4 | FP32 | 15 ms | 362 ms |
 
  **Note that the Answer Finder models are only used at query time.**
 
- ## Requirements
+ ## GPU Memory usage
 
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 1060 MiB
+ | GPU Info | Quantization type | Memory |
+ |:------------------------------------------|-------------------|-----------:|
+ | NVIDIA A10 | FP16 | 578 MiB |
+ | NVIDIA A10 | FP32 | 1062 MiB |
+ | NVIDIA T4 | FP16 | 547 MiB |
+ | NVIDIA T4 | FP32 | 1060 MiB |
 
- Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
+ Note that GPU memory usage only includes how much GPU memory the actual model consumes those specific GPUs with a batch
  size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
  can be around 0.5 to 1 GiB depending on the used GPU.
 
+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - [Cuda compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
+
  ## Model Details
 
  ### Overview
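
Review note: as a reading aid for the new "Inference Time" table, the sketch below computes how much faster FP16 is than FP32 on each GPU. The latency figures are copied from the added rows of this commit; the `LATENCY_MS` dictionary and the `fp16_speedup` helper are hypothetical names introduced only for illustration and are not part of the model card or of any Sinequa tooling.

```python
# Latencies in milliseconds, copied from the table added in this commit.
# Keys are (GPU, quantization type); values are (batch size 1, batch size 32).
LATENCY_MS = {
    ("NVIDIA A10", "FP16"): (2, 30),
    ("NVIDIA A10", "FP32"): (4, 84),
    ("NVIDIA T4", "FP16"): (3, 65),
    ("NVIDIA T4", "FP32"): (15, 362),
}


def fp16_speedup(gpu):
    """Return (batch-1 speedup, batch-32 speedup) of FP16 over FP32 for one GPU."""
    fp16 = LATENCY_MS[(gpu, "FP16")]
    fp32 = LATENCY_MS[(gpu, "FP32")]
    # Speedup is the FP32 latency divided by the FP16 latency.
    return tuple(slow / fast for slow, fast in zip(fp32, fp16))


for gpu in ("NVIDIA A10", "NVIDIA T4"):
    b1, b32 = fp16_speedup(gpu)
    print(f"{gpu}: {b1:.1f}x at batch size 1, {b32:.1f}x at batch size 32")
```

On these figures the FP16 gain is noticeably larger on the T4 than on the A10, which may be worth stating in the model card alongside the table.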