Duplicated responses and <|endoftext|> markers.

#14
by FriendlyVisage - opened

Finally got a working colab. However, I'm getting duplicated answers. For example:

generate(""""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Who was the first president of the United States?


### Response:""")

It will respond with:

George Washington.
<|endoftext|>
George Washington.
<|endoftext|>

Is that expected, and if so, what is the right way to handle it? I could just keep everything that appears before the first <|endoftext|> marker, but that seems like a bit of a hack.
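
If you do want to trim at the marker, here is a minimal sketch in Python that cuts the decoded text at the first <|endoftext|>; output is a hypothetical string holding the raw generation:

output = """George Washington.
<|endoftext|>
George Washington.
<|endoftext|>"""

# Keep only the text that appears before the first end-of-text marker
answer = output.split("<|endoftext|>")[0].strip()
print(answer)  # -> George Washington.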

Never mind. I was printing it twice.

FriendlyVisage changed discussion status to closed

Hi, I was trying to build a Colab implementation and was running into errors. Could you please share your notebook?

@souvik0306 Here is a YouTube video about using it; the author links to his Colab in the description, which is much better than my cobbled-together attempt. A couple of things to note: the Triton/flash-attention build takes a long time, and for some reason loading the model into memory is very slow.

https://www.youtube.com/watch?v=DXpk9K7DgMo&t=950s

@souvik0306 I found a Colab that works better for MPT-7B Instruct. I had to modify it a little. I put it in a Git repo here: https://github.com/curtisshipley/llm-info

I followed your Colab notebook, and it worked fine until the point where I tried to load the model:

import torch
from transformers import StoppingCriteria

# InstructionTextGenerationPipeline is the helper class defined earlier in the notebook.
# Initialize the model and tokenizer
generate = InstructionTextGenerationPipeline(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Token id(s) at which generation should stop
stop_token_ids = generate.tokenizer.convert_tokens_to_ids(["<|endoftext|>"])


# Define a custom stopping criteria that returns True once the most recently
# generated token is one of the stop tokens
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False
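
For context, a stopping criterion like this is normally wired into generation through a StoppingCriteriaList. A minimal sketch, assuming a standard Hugging Face model/tokenizer pair (model and tokenizer are placeholders here; the notebook's pipeline may already do this internally):

from transformers import StoppingCriteriaList

prompt = "### Instruction:\nWho was the first president of the United States?\n\n### Response:\n"

# model and tokenizer are placeholders for the objects the pipeline wraps.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)

# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))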

I should mention that I tried to execute this on the free version of Colab as well as on a Linux system with:
RAM: 24 GB
GPU: NVIDIA GeForce RTX 4090
PyTorch with CUDA 11.7

It's hard to say what the problem is without seeing the exact error. I have Colab Pro and tested it on an A100.
