model card update with fp16 info
README.md

## Inference Time

| GPU Info   | Quantization type | Batch size 1 | Batch size 32 |
|:-----------|:------------------|-------------:|--------------:|
| NVIDIA A10 | FP16              |         2 ms |         30 ms |
| NVIDIA A10 | FP32              |         4 ms |         84 ms |
| NVIDIA T4  | FP16              |         3 ms |         65 ms |
| NVIDIA T4  | FP32              |        15 ms |        362 ms |
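
As a quick illustration of what the batch-size-32 column implies, the per-query cost can be derived by dividing the batch latency by the batch size. The sketch below simply reuses the figures from the table above; no new measurements are introduced.

```python
# Per-query latency implied by the table above (values copied from the table).
# Batching amortizes per-request overhead, so each query in a batch of 32
# costs far less than a single batch-size-1 request.
timings_ms = {
    # (gpu, precision): (batch_size_1_ms, batch_size_32_ms)
    ("NVIDIA A10", "FP16"): (2, 30),
    ("NVIDIA A10", "FP32"): (4, 84),
    ("NVIDIA T4", "FP16"): (3, 65),
    ("NVIDIA T4", "FP32"): (15, 362),
}

for (gpu, prec), (b1, b32) in timings_ms.items():
    per_query = b32 / 32   # ms per query at batch size 32
    print(f"{gpu} {prec}: {per_query:.2f} ms/query at batch size 32 "
          f"(vs. {b1} ms at batch size 1)")
```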

**Note that the Answer Finder models are only used at query time.**

## GPU Memory usage

| GPU Info   | Quantization type |   Memory |
|:-----------|:------------------|---------:|
| NVIDIA A10 | FP16              |  578 MiB |
| NVIDIA A10 | FP32              | 1062 MiB |
| NVIDIA T4  | FP16              |  547 MiB |
| NVIDIA T4  | FP32              | 1060 MiB |

Note that the GPU memory usage only covers how much GPU memory the actual model consumes on those specific GPUs with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
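
To budget total GPU memory, the table values therefore need the runtime overhead added on top. A minimal sketch, using only the figures from the table and the ~0.5 to 1 GiB overhead range stated above:

```python
# Rough total-memory estimate: model memory at batch size 32 (from the
# table above) plus the fixed ONNX Runtime initialization overhead,
# which the note above puts at roughly 0.5 to 1 GiB.
model_mib = {
    ("NVIDIA A10", "FP16"): 578,
    ("NVIDIA A10", "FP32"): 1062,
    ("NVIDIA T4", "FP16"): 547,
    ("NVIDIA T4", "FP32"): 1060,
}
RUNTIME_OVERHEAD_MIB = (512, 1024)  # ~0.5 GiB to ~1 GiB, GPU-dependent

for (gpu, prec), mib in model_mib.items():
    low = mib + RUNTIME_OVERHEAD_MIB[0]
    high = mib + RUNTIME_OVERHEAD_MIB[1]
    print(f"{gpu} {prec}: expect roughly {low}-{high} MiB in total")
```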

## Requirements

- Minimal Sinequa version: 11.10.0
- [Cuda compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
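
The compute-capability rule above can be expressed as a small check. The helper below is hypothetical (not part of any Sinequa or ONNX Runtime API); the example capability values match NVIDIA's published tables (T4 is 7.5, A10 is 8.6).

```python
# Hypothetical helper mirroring the requirement above: any precision needs
# compute capability above 5.0, and FP16 additionally needs above 6.0.
def supported_precisions(compute_capability: float) -> list:
    precisions = []
    if compute_capability > 5.0:
        precisions.append("FP32")
    if compute_capability > 6.0:
        precisions.append("FP16")
    return precisions

print(supported_precisions(7.5))  # NVIDIA T4
print(supported_precisions(8.6))  # NVIDIA A10
```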

## Model Details

### Overview