sinequa/answer-finder-v1-L-multilingual

Question Answering

youval committed on Feb 1, 2024

Commit 5bff551 · 1 Parent(s): 80c4451

model card update with fp16 info

Files changed (1)

README.md +19 -8


@@ -34,22 +34,33 @@ The model was trained and tested in the following languages:
 ## Inference Time
-| GPU Info                                                      |  Batch size 1  |  Batch size 32 |
-|:--------------------------------------------------------------|---------------:|---------------:|
-| NVIDIA A10                                                    |           4 ms |          84 ms |
-| NVIDIA T4                                                     |          15 ms |         362 ms |
 **Note that the Answer Finder models are only used at query time.**
-## Requirements
-- Minimal Sinequa version: 11.10.0
-- GPU memory usage: 1060 MiB
-Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
 size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
 can be around 0.5 to 1 GiB depending on the GPU used.
 ## Model Details
 ### Overview

 ## Inference Time
+| GPU Info                                  | Quantization type |  Batch size 1  |  Batch size 32 |
+|:------------------------------------------|-------------------|---------------:|---------------:|
+| NVIDIA A10                                | FP16              |           2 ms |          30 ms |
+| NVIDIA A10                                | FP32              |           4 ms |          84 ms |
+| NVIDIA T4                                 | FP16              |           3 ms |          65 ms |
+| NVIDIA T4                                 | FP32              |          15 ms |         362 ms |
 **Note that the Answer Finder models are only used at query time.**
+## GPU Memory usage
+| GPU Info                                  | Quantization type |   Memory   |
+|:------------------------------------------|-------------------|-----------:|
+| NVIDIA A10                                | FP16              |    578 MiB |
+| NVIDIA A10                                | FP32              |   1062 MiB |
+| NVIDIA T4                                 | FP16              |    547 MiB |
+| NVIDIA T4                                 | FP32              |   1060 MiB |
+Note that GPU memory usage only includes how much GPU memory the actual model consumes on those specific GPUs with a batch
 size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
 can be around 0.5 to 1 GiB depending on the GPU used.
+## Requirements
+- Minimal Sinequa version: 11.10.0
+- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
 ## Model Details
 ### Overview
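
To make the numbers added in this commit easier to reason about, here is a minimal Python sketch that derives the FP16 speedup, a rough total GPU memory range, and the usable precisions for a given GPU. The figures are copied from the updated tables (batch size 32). The function names are hypothetical, the 0.5 to 1 GiB ONNX Runtime overhead is taken as 512 to 1024 MiB, and "above" in the requirements is read as strictly greater than, which is an assumption; switch the comparisons to `>=` if the thresholds are meant to be inclusive.

```python
# Figures copied from the updated README tables (batch size 32).
LATENCY_MS = {
    ("A10", "FP16"): 30, ("A10", "FP32"): 84,
    ("T4", "FP16"): 65, ("T4", "FP32"): 362,
}
MODEL_MEMORY_MIB = {
    ("A10", "FP16"): 578, ("A10", "FP32"): 1062,
    ("T4", "FP16"): 547, ("T4", "FP32"): 1060,
}
# The README says ONNX Runtime adds "around 0.5 to 1 GiB" on initialization.
RUNTIME_OVERHEAD_MIB = (512, 1024)


def fp16_speedup(gpu):
    """Ratio of FP32 latency to FP16 latency at batch size 32."""
    return LATENCY_MS[(gpu, "FP32")] / LATENCY_MS[(gpu, "FP16")]


def total_memory_range_mib(gpu, precision):
    """Model memory plus the approximate ONNX Runtime overhead, as (low, high)."""
    model = MODEL_MEMORY_MIB[(gpu, precision)]
    low, high = RUNTIME_OVERHEAD_MIB
    return model + low, model + high


def usable_precisions(compute_capability):
    """Precisions allowed for a (major, minor) CUDA compute capability.

    Assumption: "above 5.0" and "above 6.0" are treated as strict bounds.
    """
    precisions = []
    if compute_capability > (5, 0):
        precisions.append("FP32")
    if compute_capability > (6, 0):
        precisions.append("FP16")
    return precisions


if __name__ == "__main__":
    for gpu in ("A10", "T4"):
        print(gpu, "FP16 speedup:", round(fp16_speedup(gpu), 1))
        print(gpu, "FP16 total MiB range:", total_memory_range_mib(gpu, "FP16"))
    # The NVIDIA T4 has compute capability 7.5.
    print("T4 precisions:", usable_precisions((7, 5)))
```

The memory range is only an estimate: the overhead depends on the GPU used, so it is worth measuring on the target hardware before sizing a deployment.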