Bug with example code:

#1
by Asaf-Yehudai - opened

Thanks, it looks like an excellent model.
But I ran into a few difficulties running it.

  1. The model name/path needs to be changed to the Nexusflow org, and "use_auth_token=True" must be added since the repo is private:
    reward_model = LlamaForSequenceClassification.from_pretrained("berkeley-nest/Starling-RM-34B",torch_dtype=torch.bfloat16)
    ->
    reward_model = LlamaForSequenceClassification.from_pretrained("Nexusflow/Starling-RM-34B", torch_dtype=torch.bfloat16, use_auth_token=True)
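    As a side note, newer transformers releases deprecate "use_auth_token" in favor of "token"; a minimal sketch, assuming you are already logged in via huggingface-cli:

    import torch
    from transformers import LlamaForSequenceClassification

    # token=True reuses the access token stored by `huggingface-cli login`
    reward_model = LlamaForSequenceClassification.from_pretrained(
        "Nexusflow/Starling-RM-34B", torch_dtype=torch.bfloat16, token=True
    )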

  2. But even after fixing that, loading fails with the next bug, a shape mismatch in the model weights:

"RuntimeError: Error(s) in loading state_dict for LlamaForSequenceClassification:
size mismatch for transformer.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 7168]) from checkpoint, the shape in current model is
torch.Size([7168, 7168])."
...

I await your response,
Best,
Asaf

Nexusflow org

Hi @Asaf-Yehudai ,

Thanks for the bug catch on (1).

Are you still experiencing the issue in (2)? I have not seen this on my end.
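
If it does persist, one thing worth checking (a rough diagnostic sketch, not from the model card) is whether your installed transformers version supports grouped-query attention for Llama-style models: the reported k_proj shape of [1024, 7168] matches num_key_value_heads = 8 with head_dim = 128, whereas an older transformers build that ignores num_key_value_heads would allocate the full [7168, 7168] projection.

import transformers
from transformers import AutoConfig

# Grouped-query attention for Llama-style models needs a recent transformers release
print(transformers.__version__)

# Inspect the checkpoint config; the values in the comments are what the
# Yi-34B backbone is expected to report
cfg = AutoConfig.from_pretrained("Nexusflow/Starling-RM-34B", use_auth_token=True)
print(cfg.hidden_size)          # 7168
print(cfg.num_attention_heads)  # 56
print(cfg.num_key_value_heads)  # 8 -> k_proj shape [8 * 128, 7168] = [1024, 7168]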

Sure.
I'm not sure what went wrong, but I eventually managed to run it on a GPU with some small modifications:

import math
import torch
from transformers import AutoTokenizer, LlamaForSequenceClassification

reward_model = LlamaForSequenceClassification.from_pretrained("Nexusflow/Starling-RM-34B", torch_dtype=torch.bfloat16, device_map='auto', use_auth_token=True)
reward_tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B-Chat")
reward_tokenizer.truncation_side = "left"
reward_model.eval().requires_grad_(False)

Define the reward function

reward_batch_size = 1

def get_reward(samples):
    """samples: List[str]"""
    # Query the device on which the model is
    # (PreTrainedModel exposes .device; it has no get_device() method)
    model_device = reward_model.device

    encodings_dict = reward_tokenizer(
        samples,
        truncation=True,
        max_length=2048,
        padding="max_length",
        return_tensors="pt",
    )
    # Move the tensors to the same device as the model
    input_ids = encodings_dict["input_ids"].to(model_device)
    attention_masks = encodings_dict["attention_mask"].to(model_device)

    # Score the samples in micro-batches of size reward_batch_size
    mbs = reward_batch_size
    out = []
    for i in range(math.ceil(len(samples) / mbs)):
        rewards = reward_model(
            input_ids=input_ids[i * mbs : (i + 1) * mbs],
            attention_mask=attention_masks[i * mbs : (i + 1) * mbs],
        )
        # "scores" is kept from the original snippet; a stock
        # LlamaForSequenceClassification exposes this output as rewards.logits
        out.extend(rewards["scores"])
    return torch.hstack(out)
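
For reference, here is a usage sketch; the chat formatting via apply_chat_template is an assumption (follow the prompt template given on the model card if it differs):

# Hypothetical usage: score a single prompt/response pair.
messages = [
    {"role": "user", "content": "Hello! Who are you?"},
    {"role": "assistant", "content": "Hi! I'm an AI assistant. How can I help you today?"},
]
# apply_chat_template is assumed here; the model card's prompt format takes precedence
sample = reward_tokenizer.apply_chat_template(messages, tokenize=False)
scores = get_reward([sample])
print(scores)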
