Llama responses are broken during conversation

by gusakovskyi - opened

Hello, I have used Llama locally with FastChat and also through the Replicate API, and at some point during a conversation it always breaks, for example:

  • It responds with endless quotes ("""""""""""""""....)
  • It repeats some tokens (youyouyouyouyouyouyou... )
  • It responds with only the first tokens (I AM) and nothing more.
  • Within a single response, it stops generating readable text and returns nonsense.
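The first two symptoms are repetition loops: the model keeps emitting the same short string. To flag them automatically (for example, to cut off or retry a response), a rough heuristic is to check whether the end of the generated text is one short unit repeated back to back. This is a minimal sketch under that assumption; `tail_repeats` and its thresholds are hypothetical names chosen for illustration, not part of FastChat, Replicate, or transformers, and it will not catch the truncated ("I AM") or gibberish cases.

```python
def tail_repeats(text: str, max_period: int = 8, min_repeats: int = 4) -> bool:
    """Heuristic: True if `text` ends with a unit of 1..max_period characters
    repeated at least `min_repeats` times in a row (e.g. 'youyouyouyou')."""
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if len(text) < span:
            continue  # not enough text to hold min_repeats copies of this unit
        unit = text[-period:]  # candidate repeating unit at the very end
        if text[-span:] == unit * min_repeats:
            return True
    return False


if __name__ == "__main__":
    for sample in ["youyouyouyou", '""""""""', "The USA declared independence in 1776."]:
        print(repr(sample), tail_repeats(sample))
```

The thresholds trade off false positives (short legitimate repeats such as "hahahaha" also match) against catching loops early; they would need tuning on real outputs.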

Here is an example:

Here I asked a question about the history of the USA, and at some point it started returning strange text.

```python
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-8B"

# Build a text-generation pipeline in bfloat16, placing layers automatically
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
```


Why does it crash instead of giving a proper response?

