Prompt and performance

by Sciumo - opened

It was unclear what the prompt should be. Shouldn't model cards have the associated prompts?
Here is what I used:
template = """{instruct}
USER: {question}
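
For anyone trying to reproduce this, here is a minimal sketch of how a template like the one above could be filled in with plain Python string formatting. It uses only the two lines shown in the post; whatever followed them in the original template is not known and is not reconstructed here. The names `TEMPLATE` and `build_prompt`, and the sample strings, are hypothetical and only for illustration.

```python
# Only the two template lines visible in the post are used here.
# Whatever came after them in the original is unknown and left out.
TEMPLATE = """{instruct}
USER: {question}"""


def build_prompt(instruct: str, question: str) -> str:
    """Substitute the instruction and the question into the template."""
    return TEMPLATE.format(instruct=instruct, question=question)


if __name__ == "__main__":
    # Sample values are placeholders, not the ones used in the post.
    print(build_prompt("You are a helpful assistant.", "What is NUMA?"))
```

Keeping the template in one place makes it easy to swap in the exact format once the model card documents it.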
Performance was 446.87 ms per token on a Threadripper Pro 3995 (64 cores, 256 GB RAM), which I'd classify as slow.
Apparently adding CPU cores doesn't really help: with a single NUMA node, the threads mostly spin waiting on memory access. I'm going to try
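
To put the latency figure in more familiar terms, here is a small sketch that converts milliseconds per token into tokens per second. The helper name `tokens_per_second` is made up for this example; the only input taken from the post is the 446.87 ms figure.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


if __name__ == "__main__":
    rate = tokens_per_second(446.87)  # figure reported in the post
    print(f"{rate:.2f} tokens/s")
```

That works out to a little over 2 tokens per second.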


Here is the utilization:
