Prompt and performance

#1
by Sciumo - opened

It was unclear what the prompt format should be. Shouldn't model cards include the associated prompt template?
Here is what I used:
```python
template = """{instruct}
USER: {question}
ASSISTANT:
"""
```
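To make the format concrete, here is a minimal sketch of filling that template with Python's `str.format`. The system instruction and question below are hypothetical placeholders, not values from the model card:

```python
template = """{instruct}
USER: {question}
ASSISTANT:
"""

# Hypothetical values for illustration only.
prompt = template.format(
    instruct="You are a helpful assistant.",
    question="What is the capital of France?",
)
print(prompt)
```

The resulting string ends with `ASSISTANT:` so the model continues from the assistant turn.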
The performance was 446.87 ms per token on a TR Pro 3995 with 64 cores and 256 GB RAM, which I classify as slow.
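For reference, converting that latency to throughput shows just how slow it is:

```python
# 446.87 ms per token from the run above.
ms_per_token = 446.87

# Throughput in tokens per second.
tokens_per_sec = 1000.0 / ms_per_token
print(f"{tokens_per_sec:.2f} tokens/s")  # roughly 2.24 tokens/s
```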
Apparently adding CPU cores doesn't really help: on a single NUMA node, the threads just spin waiting on memory access. I'm going to try https://github.com/huggingface/text-generation-inference
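If text-generation-inference works out, requests go to its `/generate` REST endpoint. A sketch of the request body, assuming a server already running on `localhost:8080` (the parameter values here are arbitrary examples):

```python
import json

# Prompt built with the template from the top of this thread.
prompt = (
    "You are a helpful assistant.\n"
    "USER: What is the capital of France?\n"
    "ASSISTANT:\n"
)

# Request body for TGI's /generate endpoint.
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}
body = json.dumps(payload)
print(body)
# To send it: requests.post("http://localhost:8080/generate", json=payload)
```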

![ms per token](mspertoken_1.png)

Here is the CPU utilization:
![CPU utilization](cpu_1.png)
