Llama 3 has a big slowdown when using a grammar

#110
by ThomasOfOz - opened

I've found that Llama 3 runs about 5 to 10 times slower if you include a grammar in the request.
For example, if you include this field in the JSON body:
"grammar":" root ::= fullanswer \n fullanswer ::= "John: " answer \nanswer ::= sentence | "<|im_end|>" | sentence "\n"\nsentence ::= [a-zA-Z0-9.,?!' ]*\n"
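The quotes inside a grammar like the one above have to be escaped to form valid JSON, which is easy to get wrong by hand. A minimal sketch of building the request body in Python and letting `json.dumps` do the escaping follows. The `prompt` and `max_length` field names, the example prompt, and the `build_payload` helper are assumptions for illustration, and the grammar is my reading of the rules in the post, not a verified copy of what was sent.

```python
import json

# Grammar rules from the post, one per line. Raw strings keep the "\n"
# inside the quoted terminal as a backslash escape for the grammar parser
# (an assumption about what the original request contained).
GRAMMAR = "\n".join([
    r'root ::= fullanswer',
    r'fullanswer ::= "John: " answer',
    r'answer ::= sentence | "<|im_end|>" | sentence "\n"',
    r"sentence ::= [a-zA-Z0-9.,?!' ]*",
]) + "\n"


def build_payload(prompt, grammar=None, max_length=80):
    """Hypothetical helper: return the request dict, with or without a grammar."""
    payload = {"prompt": prompt, "max_length": max_length}
    if grammar is not None:
        payload["grammar"] = grammar
    return payload


# json.dumps escapes the inner double quotes and the newlines.
body = json.dumps(build_payload("John, how are you?", GRAMMAR))
print(body)
```

Calling `build_payload` without the `grammar` argument gives the matching request for the "without grammar" run, so the two requests differ only in that one field.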

I did the same comparison on Llama 2, one run with the grammar and one without. The two runs on Llama 2 took roughly the same time.
I was running this through KoboldCpp, submitting the request as JSON, and just removing this line made Llama 3 massively faster.
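For anyone who wants to reproduce the comparison, here is a rough timing sketch. It assumes a local KoboldCpp server at `http://localhost:5001/api/v1/generate` (adjust the host and port to your setup); `post_json`, `time_request`, and `compare` are hypothetical helper names, not part of any library.

```python
import json
import time
import urllib.request

# Assumed local KoboldCpp endpoint; change this to match your server.
URL = "http://localhost:5001/api/v1/generate"


def post_json(payload, url=URL):
    """Send the payload as a JSON POST and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def time_request(payload, send=post_json):
    """Return (elapsed_seconds, response) for a single request."""
    start = time.perf_counter()
    result = send(payload)
    return time.perf_counter() - start, result


def compare(base_payload, grammar, send=post_json):
    """Time the same request without and with the grammar field."""
    without, _ = time_request(base_payload, send)
    with_grammar, _ = time_request({**base_payload, "grammar": grammar}, send)
    ratio = with_grammar / without if without > 0 else float("inf")
    return {"without": without, "with": with_grammar, "ratio": ratio}
```

Something like `compare({"prompt": "John, how are you?", "max_length": 80}, grammar_text)` would then report both timings, where `grammar_text` is your grammar string. A single pair of requests is noisy, so it is worth repeating the runs and keeping the prompt and `max_length` fixed before reading much into the ratio.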

Why is Llama 3 giving late responses?

I noticed the same problem. The only other thing I could find that might be about this is this GitHub discussion:
https://github.com/abetlen/llama-cpp-python/discussions/1376
