Duplicated responses and <|endoftext|> markers.

#14
by FriendlyVisage - opened

Finally got a working colab. However, I'm getting duplicated answers. For example:

generate(""""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Who was the first president of the United States?


### Response:""")

It will respond with:

George Washington.
<|endoftext|>
George Washington.
<|endoftext|>

Is that expected, and if so, what is the right way to handle it? I could just keep everything that appears before the first <|endoftext|> marker, but that seems like a bit of a hack.
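
If you do want to trim at the marker, here is a minimal sketch in Python that cuts the decoded text at the first <|endoftext|>; output is a hypothetical string holding the raw generation:

output = """George Washington.
<|endoftext|>
George Washington.
<|endoftext|>"""

# Keep only the text that appears before the first end-of-text marker
answer = output.split("<|endoftext|>")[0].strip()
print(answer)  # -> George Washington.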

Never mind. I was printing it twice.

FriendlyVisage changed discussion status to closed

Hi, I was trying to build a Colab implementation and was running into errors. Could you please share your notebook?

@souvik0306 Here is a YouTube video about using it; the author links to his Colab in the description, which is much better than my cobbled-together attempt. A couple of things to note: the Triton/flash-attention build takes a long time, and for some reason loading the model into memory is very slow.

https://www.youtube.com/watch?v=DXpk9K7DgMo&t=950s

@souvik0306 I found a Colab that works better for MPT-7B Instruct. I had to modify it a little. I put it in a Git repo here: https://github.com/curtisshipley/llm-info

I followed your Colab notebook, and it worked fine until the point where I tried to load the model:

import torch
from transformers import StoppingCriteria

# InstructionTextGenerationPipeline is the helper class defined earlier in the notebook.
# Initialize the model and tokenizer
generate = InstructionTextGenerationPipeline(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Token id(s) at which generation should stop
stop_token_ids = generate.tokenizer.convert_tokens_to_ids(["<|endoftext|>"])


# Define a custom stopping criteria that returns True once the most recently
# generated token is one of the stop tokens
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False
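
For context, a stopping criterion like this is normally wired into generation through a StoppingCriteriaList. A minimal sketch, assuming a standard Hugging Face model/tokenizer pair (model and tokenizer are placeholders here; the notebook's pipeline may already do this internally):

from transformers import StoppingCriteriaList

prompt = "### Instruction:\nWho was the first president of the United States?\n\n### Response:\n"

# model and tokenizer are placeholders for the objects the pipeline wraps.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)

# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))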

I should mention that I tried to execute this on the free version of Colab as well as on a Linux system with:
RAM: 24 GB
GPU: NVIDIA GeForce RTX 4090
PyTorch with CUDA 11.7

It's hard to say what the problem is without seeing the exact error. I have Colab Pro and tested it on an A100.
