
Weird output with instruction following

#4
by ndurkee - opened

I'm trying to use this model to follow instructions and give single-word responses. Here is an example prompt:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction:
Is this a valid sentence?
"Biology1544-91731545-7885Public Library of Science San Francisco,"
Answer yes or no and nothing else.
### Response:

The response I'm getting is

[Answer to the question]

[Answer]
No

Occasionally I'll get

[Answer] No

or

[Question edited for clarity: "Is this a valid sentence? 'Biology 1544-9173 1545-7885 Public Library of Science San Francisco,'"]
Answer: No

Model: bagel-dpo-34b-v0.2
Precision: bfloat16
Engine: vLLM API server
Settings:

{"prompt": '',
"stream": False,
"max_tokens": 4096,
"temperature": 0.8,
"top_k": -1,
"top_p": 0.9,
"frequency_penalty": 0.2,
"presence_penalty": 0.1,
}
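
For reference, a request with these settings might look like the sketch below, assuming vLLM's demo API server (python -m vllm.entrypoints.api_server) and its /generate endpoint; the client code, host, and port are illustrations, not details from this thread.

# Sketch: send the settings above to vLLM's demo API server.
# Endpoint, host, and port are assumptions based on vLLM's example client.
import json
import requests

prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    "### Instruction:\n"
    "Is this a valid sentence?\n"
    '"Biology1544-91731545-7885Public Library of Science San Francisco,"\n'
    "Answer yes or no and nothing else.\n"
    "### Response:\n"
)

payload = {
    "prompt": prompt,
    "stream": False,
    "max_tokens": 4096,
    "temperature": 0.8,
    "top_k": -1,
    "top_p": 0.9,
    "frequency_penalty": 0.2,
    "presence_penalty": 0.1,
}

# The demo server returns JSON of the form {"text": [...]}.
resp = requests.post("http://localhost:8000/generate", json=payload)
print(json.dumps(resp.json(), indent=2))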

My best guess is that one of your datasets is set up improperly, so the model isn't actually training on the proper data. I've noticed similar behavior with other queries (which I can't share due to proprietary information). I did notice that if I prepend the prompt with "Answer", it works properly.

I would recommend two things if using the alpaca format for this type of prompt:

  1. remove the system prompt before the instruction
  2. try using a slightly different prompt that targets true/false (since the model's training data includes a boolean-questions dataset)

For example:

### Instruction:
True or false - The following is a valid sentence: "Biology1544-91731545-7885Public Library of Science San Francisco,"
### Response:
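
Building that prompt programmatically could look like the sketch below; the helper name is hypothetical, and it simply omits the system prompt as suggested above.

# Sketch: alpaca-style prompt without a system prompt, per the recommendation above.
# The helper name is hypothetical.
def build_alpaca_prompt(instruction: str) -> str:
    return f"### Instruction:\n{instruction}\n### Response:\n"

prompt = build_alpaca_prompt(
    'True or false - The following is a valid sentence: '
    '"Biology1544-91731545-7885Public Library of Science San Francisco,"'
)
print(prompt)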

Example with the FastChat CLI:

root@428fe25afa00:/workspace# python -m fastchat.serve.cli --model-path ./bagel-dpo-34b-v0.2 --conv-template alpaca --num-gpus 2
Loading checkpoint shards: 100%|██████████| 15/15 [00:23<00:00,  1.57s/it]
### Instruction: True or false - The following is a valid sentence: "Biology1544-91731545-7885Public Library of Science San Francisco,"
### Response: false
### Instruction: True or false - The following is a valid sentence: "Hello, how are you today?"
### Response: true
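
The same prompt can also be built outside the CLI with FastChat's conversation templates; a sketch, assuming the registered "alpaca" template is the one selected by --conv-template alpaca:

# Sketch: build the alpaca-formatted prompt with FastChat's conversation helpers,
# assuming the "alpaca" template used by --conv-template alpaca.
from fastchat.conversation import get_conv_template

conv = get_conv_template("alpaca")
conv.append_message(
    conv.roles[0],
    'True or false - The following is a valid sentence: '
    '"Biology1544-91731545-7885Public Library of Science San Francisco,"',
)
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())  # this string can then be sent to the model server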
