Incomplete Output even with max_new_tokens

#27
by vermanic - opened

So the output of my model ends abruptly, and I ideally want it to complete the paragraph/sentence/code it was in the middle of. I have set max_new_tokens = 300 and also ask in the prompt to limit the response to 300 words.

The response is always long and still ends abruptly. Is there any way to get a complete output within the desired number of output tokens?
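
From what I understand, max_new_tokens is a hard cap enforced by generate(), while the 300-word limit in the prompt is only a soft hint the model is free to ignore, so the text is simply cut off mid-sentence whenever the cap is reached first. A minimal, self-contained illustration (gpt2 used here only because it is small to download; the behavior is the same for starchat-beta):

from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "in 10 words" is only a suggestion; max_new_tokens=15 is a hard stop.
inputs = tok.encode("Describe the ocean in 10 words:", return_tensors="pt")
out = model.generate(inputs, max_new_tokens=15, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][inputs.shape[1]:]))  # typically ends mid-sentence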

Code:

checkpoint = "HuggingFaceH4/starchat-beta"
device = "cuda" if torch.cuda.is_available() else "cpu" 
class StarCoderModel:
  def __init__(self):
    print("Running in " + device)
    self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    self.model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map='auto')

  def infer(self, input_text, token_count):
    inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(device)  
    outputs = self.model.generate(inputs,  max_new_tokens=token_count, pad_token_id=self.tokenizer.eos_token_id)
    return self.tokenizer.decode(outputs[0])[len(input_text):]
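
One direction that may help (a sketch I have not verified, assuming starchat-beta's <|end|> dialogue token and the standard transformers generate() API): pass the id of <|end|> as eos_token_id so generation can stop at the end of the assistant turn instead of running into the max_new_tokens cap, and check the last generated token to detect a hard cut:

  def infer(self, input_text, token_count):
    inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(device)
    # starchat-beta ends each chat turn with the special <|end|> token
    # (assumption: the prompt follows the model's <|system|>/<|user|>/
    # <|assistant|> format, so the model actually emits <|end|>).
    end_id = self.tokenizer.convert_tokens_to_ids("<|end|>")
    outputs = self.model.generate(inputs, max_new_tokens=token_count,
                                  eos_token_id=end_id, pad_token_id=end_id)
    new_tokens = outputs[0][inputs.shape[1]:]
    # If the last token is not <|end|>, generation hit the cap mid-thought.
    if new_tokens[-1].item() != end_id:
      print("Warning: output hit max_new_tokens and may be cut off.")
    return self.tokenizer.decode(new_tokens, skip_special_tokens=True)

If the warning fires, one option is to call infer() again with the truncated output appended to the prompt so the model can finish the thought, at the cost of going past the original token budget.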

Sample:

private DataType FuntionName(String someId) {
    // TODO: Replace with implementation that utilizes someId to obtain information
    return DataType.Value;
}


The comment:

- If someId is present in the code, use the getAPI from Client with someId as a parameter to obtain some information.
- If the
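
If a hard cut is unavoidable, one workaround (a post-processing sketch of my own, not part of the code above) is to trim the decoded text back to the last complete line or sentence so the answer at least ends cleanly:

import re

def trim_to_last_complete_unit(text):
    # Heuristic: cut at the last newline or sentence-ending punctuation,
    # keeping everything up to and including that boundary.
    matches = list(re.finditer(r"[.!?]\s|\n", text))
    if matches:
        return text[:matches[-1].end()].rstrip()
    return text  # no boundary found; return unchanged

# Example: the dangling fragment "- If the" is dropped.
print(trim_to_last_complete_unit("First point is done.\n- If the"))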

I have the same question.

I'm having the same problem. Were you able to resolve this?

Any update on this?

Same problem.

Hey, I was not able to resolve this.

I'm having the same issue!
