Problems with temperature when using the model in Python code

#6
by matchaslime

Hi, I am following the instructions for using the model in Python code, but it always outputs the same response to the same prompt. Changing the temperature does not seem to do anything. What could be the issue here?

Could you provide some code?

I'm just using the example code in the README, from the AutoGPTQ section.
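
For context, the kind of code in question is roughly of this shape (a sketch of the usual AutoGPTQ load-and-generate flow, not the exact README code; the repo id is a placeholder):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Placeholder repo id -- substitute the actual GPTQ model
    model_name_or_path = "some-user/some-model-GPTQ"

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        device="cuda:0",
        use_safetensors=True,
    )

    prompt = "Tell me about AI"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    # Changing this temperature value has no visible effect on the output
    output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
    print(tokenizer.decode(output[0]))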

When I've tested inference before, I've used code to change the seed, like so:

    import random
    import torch

    # ...inside my generation wrapper class:

    @property
    def seed(self):
        return self._current_seed

    @seed.setter
    def seed(self, seed):
        self._seed = int(seed)

    def update_seed(self):
        # A seed of -1 means: pick a fresh random seed for this generation.
        self._current_seed = random.randint(1, 2**31) if self._seed == -1 else self._seed
        # Seed Python's RNG, torch's CPU RNG, and all CUDA devices.
        random.seed(self._current_seed)
        torch.manual_seed(self._current_seed)
        torch.cuda.manual_seed_all(self._current_seed)

    def generate(self, prompt):
        self.update_seed()
        input_ids, len_input_ids = self.encode(prompt)

        with self.do_timing(True) as timing:
            with torch.no_grad():
                tokens = self.model.generate(inputs=input_ids, generation_config=self.generation_config)[0].cuda()
            len_reply = len(tokens) - len_input_ids
            response = self.tokenizer.decode(tokens)
            reply_tokens = tokens[-len_reply:]
            reply = self.tokenizer.decode(reply_tokens)

        result = {
            'response': response,   # The response in full, including prompt
            'reply': reply,         # Just the reply, no prompt
            'len_reply': len_reply, # The length of the reply tokens
            'seed': self.seed,      # The seed used to generate this response
            'time': timing['time']  # The time in seconds to generate the response
        }
        return result

You could try the same approach to get a different seed for each generation.
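
One other thing worth checking: in transformers, temperature only has an effect when sampling is enabled, so the generation config needs do_sample=True; with greedy decoding, neither temperature nor the seed will change the output. Here is a minimal sketch of per-generation seeding, assuming `model` and `tokenizer` are already loaded as in the README example:

    import random
    import torch
    from transformers import GenerationConfig

    generation_config = GenerationConfig(
        do_sample=True,      # without this, temperature is ignored
        temperature=0.7,
        max_new_tokens=128,
    )

    prompt = "Tell me about AI"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

    for _ in range(3):
        seed = random.randint(1, 2**31)  # fresh seed per generation
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        output = model.generate(inputs=input_ids, generation_config=generation_config)
        print(seed, tokenizer.decode(output[0]))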
