Update model card with FP16 info
#1 by youval - opened
This is a proposal for a new model card format that takes FP16 values into account for both inference speed and memory utilization.
We now also show memory utilization for both T4 and A10 GPUs.
Nothing in here?
Nice ✅
Love it
Looks good! I know it's a draft, but it looks like an "on" is missing in "consumes those specific GPUs"? (Haven't managed to comment on the code, sorry.)
I updated the model card with the following changes:
- I decided to drop the GPU type from the GPU memory footprint. In my opinion it simply adds redundant information. The memory footprint is also a very unstable value that can vary a lot: memory is not allocated linearly but in chunks whose size depends on many factors, so a difference measured on a specific GPU type is more likely due to one of those factors than to the GPU type itself. Having a single value should mitigate the confusion (a short sketch below illustrates this instability).
- I added a note that, in order to use an NVIDIA L4 with FP16, you must use Sinequa 11.11.0 or above.
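
A minimal sketch of the instability mentioned above, assuming a PyTorch/CUDA environment (the model card itself is framework-agnostic, so this is purely illustrative): the caching allocator reserves memory in chunks, so the reserved footprint is typically larger than what tensors actually occupy, and the gap varies between runs and setups.

```python
import torch

# Illustrative only (assumes PyTorch with a CUDA device available).
device = torch.device("cuda")
x = torch.randn(1024, 1024, device=device)  # ~4 MiB of FP32 tensor data

# Bytes actually occupied by tensors vs. bytes reserved by the caching
# allocator -- reserved memory is grabbed in chunks and is usually larger,
# which is why a measured "footprint" fluctuates across runs and GPUs.
allocated = torch.cuda.memory_allocated(device)
reserved = torch.cuda.memory_reserved(device)
print(f"allocated: {allocated / 2**20:.1f} MiB, reserved: {reserved / 2**20:.1f} MiB")
```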
Not sure how precise we need to be, but you could stay more general with something like "For FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0" — a quick way to check the capability is sketched after this comment.
Otherwise perfect for me, thanks! ✅
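
A hedged sketch of that capability check, assuming PyTorch with CUDA (any CUDA-aware stack would work similarly):

```python
import torch

# Illustrative check only (assumes PyTorch with a CUDA device available).
# Compute capability 8.9 corresponds to Ada Lovelace GPUs such as the
# NVIDIA L4, the case the suggested wording above covers.
major, minor = torch.cuda.get_device_capability()
print(f"CUDA compute capability: {major}.{minor}")
if (major, minor) >= (8, 9):
    # Per the model card note: FP16 on these GPUs needs Sinequa 11.11.0+.
    print("FP16 on this GPU requires Sinequa 11.11.0 or above")
```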
youval changed pull request status to open
youval changed pull request status to merged