---
license: mit
datasets:
- openai/webgpt_comparisons
- openai/summarize_from_feedback
- Anthropic/hh-rlhf
language:
- en
---

# Reward model on deberta-v2-xxlarge (1.5B)

Reward model used in RLHF, trained on the WebGPT comparisons, summarize-from-feedback, and Open Assistant user-ranked datasets.

# Model Details

## Model Description

- **Developed by:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

## Model Sources [optional]

- **Repository:** [Open Assistant](https://github.com/LAION-AI/Open-Assistant)
- **Paper:** [InstructGPT](https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf): we try to replicate it as closely as we can given our hardware and existing datasets
- **Demo [optional]:** [More Information Needed]

# Uses

This model was trained on human feedback comparison examples, which penalize bad or rude sentences with lower scores.

## Direct Use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'theblackcat102/deberta-v2-xxlarge-rm'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "I just got out of prison, any suggestion?"
good_helpful = "I am sorry to hear about it, it must be a hard time inside"
bad_text = "Stay away from me, you scumbag convict"

pos = tokenizer(prompt, good_helpful, return_tensors='pt')
neg = tokenizer(prompt, bad_text, return_tensors='pt')

pos_score = model(**pos).logits[0]
neg_score = model(**neg).logits[0]
print(pos_score, neg_score)
# tensor([-1.3449], grad_fn=<SelectBackward0>) tensor([-2.0942], grad_fn=<SelectBackward0>)
```

The absolute magnitudes of the scores are not calibrated; what matters is that the preferred response receives the higher score.

## Downstream Use [optional]

[More Information Needed]

## Out-of-Scope Use

[More Information Needed]

# Bias, Risks, and Limitations

[More Information Needed]

## Recommendations

How to use it as a rank function:

```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# reward model and tokenizer used for ranking (assumes a CUDA device)
rank_model_name = 'theblackcat102/deberta-v2-xxlarge-rm'
rank_tokenizer = AutoTokenizer.from_pretrained(rank_model_name)
rank_model = AutoModelForSequenceClassification.from_pretrained(rank_model_name).cuda().eval()


def divide_chunks(l, n):
    # yield successive n-sized chunks of l
    for i in range(0, len(l), n):
        yield l[i:i + n]


@torch.no_grad()
def rank_model_fn(samples, **kwargs):
    output_scores = []
    for chunk_samples in divide_chunks(samples, 16):
        is_empty = []
        prefixes, postfixes = [], []
        for sample in chunk_samples:
            # samples are formatted as "prompt[SEP]completion"
            prefix, postfix = sample.split('[SEP]')
            postfix = postfix.strip()
            # flag empty or degenerate completions (3 or fewer distinct characters)
            if len(postfix) == 0 or len(set(postfix)) <= 3:
                is_empty.append(True)
            else:
                is_empty.append(False)
            postfixes.append(postfix)
            prefixes.append(prefix)
        is_empty = np.array(is_empty)
        inputs = rank_tokenizer(prefixes, postfixes, return_tensors="pt", padding=True)
        inputs.pop("token_type_ids", None)
        inputs = {key: tensor.cuda() for key, tensor in inputs.items()}
        scores = rank_model(**inputs).logits[:, 0].detach().cpu()
        # assign a fixed low reward to empty/degenerate completions
        scores[torch.from_numpy(is_empty)] = -4
        output_scores += [s for s in scores]
    return torch.stack(output_scores)
```

## How to Get Started with the Model

Use the code below to get started with the model.
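The snippet below is a minimal sketch mirroring the Direct Use example above: load the model once and treat the single classification logit as a scalar reward. The `reward_fn` helper is our naming for illustration, not part of the released code.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'theblackcat102/deberta-v2-xxlarge-rm'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


def reward_fn(prompt: str, response: str) -> float:
    # the single classification logit is used as the reward score
    inputs = tokenizer(prompt, response, return_tensors='pt')
    return model(**inputs).logits[0, 0].item()


print(reward_fn("I just got out of prison, any suggestion?",
                "I am sorry to hear about it, it must be a hard time inside"))
```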
# Training Details

## Training Procedure

Check out our training repo [here](https://github.com/LAION-AI/Open-Assistant/tree/main/model/reward/instructor).

### Preprocessing [optional]

[More Information Needed]

### Training Hyperparameters

```yaml
model_name: microsoft/deberta-v2-xxlarge
learning_rate: 2e-6
scheduler: cosine
gradient_checkpointing: false
gradient_accumulation_steps: 12
per_device_train_batch_size: 1
per_device_eval_batch_size: 4
warmup_steps: 600
eval_steps: 1000000
save_steps: 1000
max_length: 512
num_train_epochs: 2
datasets:
  - webgpt
  - hfsummary
  - anthropic_rlhf
  - oa_private
```

### Speeds, Sizes, Times [optional]

Trained on 8× A100 80GB GPUs. Since we use the same batching strategy as InstructGPT, a `per_device_train_batch_size` of 1 effectively amounts to an (N−1)-sized batch, where N refers to the number of negative examples, because every completion for a prompt is scored in the same forward pass. This is why I recommend training this model on the GPU with the largest VRAM you can find; a minimal sketch of this pairwise expansion appears at the end of this card.

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

[More Information Needed]

### Factors

[More Information Needed]

### Metrics

[More Information Needed]

## Results

[More Information Needed]

### Summary

# Model Examination [optional]

[More Information Needed]

# Technical Specifications [optional]

## Model Architecture and Objective

[More Information Needed]

## Compute Infrastructure

[More Information Needed]

### Hardware

[More Information Needed]

### Software

[More Information Needed]

# Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

# Glossary [optional]

[More Information Needed]

# More Information [optional]

[More Information Needed]

# Model Card Authors [optional]

[More Information Needed]

# Model Card Contact

[More Information Needed]
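As referenced under Speeds, Sizes, Times: below is a minimal sketch of the InstructGPT-style pairwise ranking objective that drives the batching behaviour. It is our illustration under stated assumptions, not code from the training repo. One prompt with K ranked completions expands into K·(K−1)/2 log-sigmoid comparison terms, all scored in the same forward pass.

```python
import torch
import torch.nn.functional as F


def pairwise_ranking_loss(scores: torch.Tensor) -> torch.Tensor:
    """InstructGPT-style ranking loss for one prompt.

    `scores` holds the reward-model logits for all completions of a single
    prompt, ordered best-first, e.g. [positive, neg_1, ..., neg_{N-1}].
    Each (better, worse) pair contributes -log(sigmoid(better - worse)).
    """
    losses = []
    n = scores.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            losses.append(-F.logsigmoid(scores[i] - scores[j]))
    return torch.stack(losses).mean()


# one prompt with 1 positive and 3 negatives: a nominal "batch of 1"
# already yields 6 pairwise comparisons (K*(K-1)/2 for K=4 completions)
scores = torch.tensor([1.2, 0.3, -0.5, -1.1], requires_grad=True)
print(pairwise_ranking_loss(scores))
```

Because all completions of a prompt must be encoded together, memory grows with the number of comparisons rather than the nominal batch size, which motivates the large-VRAM recommendation above.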