Update model card with FP16 info
#1 by youval - opened
This is a proposal for a new model card format that takes FP16 values into account for both inference speed and memory utilization.
We now also show memory utilization for both T4 and A10 GPUs.
Nothing in here?
Nice ✅
Love it
Looks good! I know it's a draft, but it looks like an "on" is missing in "consumes those specific GPUs"? (Haven't managed to comment on the code, sorry.)
I updated the model card with the following changes:
- I decided to drop the GPU type from the GPU memory footprint. In my opinion it simply adds redundant information. The memory footprint is also a very unstable value that can vary a lot: memory is not allocated linearly but in chunks whose size depends on many factors, so a difference measured on a specific GPU type is more likely due to one of those factors than to the GPU type itself. Having a single value should mitigate the confusion (a short sketch below illustrates this instability).
- I added a note that, in order to use an NVIDIA L4 with FP16, you must use Sinequa 11.11.0 or above.
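
A minimal sketch of the instability mentioned above, assuming a PyTorch/CUDA environment (the model card itself is framework-agnostic, so this is purely illustrative): the caching allocator reserves memory in chunks, so the reserved footprint is typically larger than what tensors actually occupy, and the gap varies between runs and setups.

```python
import torch

# Illustrative only (assumes PyTorch with a CUDA device available).
device = torch.device("cuda")
x = torch.randn(1024, 1024, device=device)  # ~4 MiB of FP32 tensor data

# Bytes actually occupied by tensors vs. bytes reserved by the caching
# allocator -- reserved memory is grabbed in chunks and is usually larger,
# which is why a measured "footprint" fluctuates across runs and GPUs.
allocated = torch.cuda.memory_allocated(device)
reserved = torch.cuda.memory_reserved(device)
print(f"allocated: {allocated / 2**20:.1f} MiB, reserved: {reserved / 2**20:.1f} MiB")
```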
Not sure how precise we need to be, but you could stay more general with something like "For FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0" — a quick way to check the capability is sketched after this comment.
Otherwise perfect for me, thanks! ✅
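
A hedged sketch of that capability check, assuming PyTorch with CUDA (any CUDA-aware stack would work similarly):

```python
import torch

# Illustrative check only (assumes PyTorch with a CUDA device available).
# Compute capability 8.9 corresponds to Ada Lovelace GPUs such as the
# NVIDIA L4, the case the suggested wording above covers.
major, minor = torch.cuda.get_device_capability()
print(f"CUDA compute capability: {major}.{minor}")
if (major, minor) >= (8, 9):
    # Per the model card note: FP16 on these GPUs needs Sinequa 11.11.0+.
    print("FP16 on this GPU requires Sinequa 11.11.0 or above")
```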
youval changed pull request status to open
youval changed pull request status to merged