---
base_model: mistralai/Mistral-7B-Instruct-v0.1
datasets:
- generator
- Anthropic/hh-rlhf
library_name: peft
license: apache-2.0
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: Mistral-7B-text-to-RLHF
  results: []
---

# Mistral-7B-text-to-RLHF

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7952

## Model description

[Human-in-the-Loop Fine-tuning of Mistral-7B for Enhanced Text Generation and Text-to-SQL](https://medium.com/@frankmorales_91352/human-in-the-loop-fine-tuning-of-mistral-7b-for-enhanced-text-generation-and-text-to-sql-23b06738af42)

## Training data

[Full code - fine-tuning with Supervised Fine-Tuning (SFT) on GitHub](https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_RLHF.ipynb)

## Evaluation data

[Human-in-the-Loop Fine-tuning of Mistral-7B for Enhanced Text Generation and Text-to-SQL](https://medium.com/@frankmorales_91352/human-in-the-loop-fine-tuning-of-mistral-7b-for-enhanced-text-generation-and-text-to-sql-23b06738af42)

[Full code on GitHub](https://github.com/frank-morales2020/MLxDL/blob/main/EVAL_RLHF.ipynb)

```py
import torch
from accelerate import Accelerator
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig

# Initialize the accelerator
accelerator = Accelerator()

# From my Hugging Face repository
model_id = 'frankmorales2020/Mistral-7B-text-to-RLHF'

# BitsAndBytesConfig int-4 config (if used for your reward model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the reward model (a single-label head that scores each sequence) and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=1,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.padding_side = "right"
if tokenizer.pad_token is None:  # Mistral's tokenizer may not define a pad token
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Test cases: (prompt, chosen response, rejected response)
test_cases = [
    ("What is the capital of France?", "Paris", "London"),
    ("Who painted the Mona Lisa?", "Leonardo da Vinci", "Michelangelo"),
    ("What is the largest planet in our solar system?", "Jupiter", "Mars"),
    ("What would you do if you saw someone drop their wallet?", "Pick it up and return it to them.", "Ignore it."),
    ("What color is the sky?", "Blue", "Green"),
    ("What is the chemical symbol for water?", "H2O", "CO2"),
    # Add more test cases here...
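    # Each tuple is (prompt, chosen_response, rejected_response); the reward model is
    # expected to assign a higher score to the chosen response than to the rejected one.
    # Hypothetical extra pair for illustration only (not part of the original evaluation):
    # ("What gas do plants absorb during photosynthesis?", "Carbon dioxide", "Oxygen"),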
]

def evaluate_example(prompt, chosen, rejected):
    # Score the chosen and rejected completions for the same prompt in a single batch;
    # the reward model returns one scalar logit per sequence.
    inputs = tokenizer(
        [f"{prompt} {chosen}", f"{prompt} {rejected}"],
        return_tensors="pt",
        padding=True,
    ).to(accelerator.device)
    outputs = model(**inputs)
    chosen_score = outputs.logits[0].item()
    rejected_score = outputs.logits[1].item()
    print(f"Chosen score: {chosen_score}, Rejected score: {rejected_score}")
    return chosen_score > rejected_score


correct_predictions = 0
total_reciprocal_rank = 0

for i, (prompt, chosen, rejected) in enumerate(test_cases):
    print("\n")
    print(f"Prompt: {prompt}, Chosen: {chosen}, Rejected: {rejected}")
    print("\n")
    if evaluate_example(prompt, chosen, rejected):
        print("Test passed!")
        correct_predictions += 1
        total_reciprocal_rank += 1  # Correct prediction contributes a reciprocal rank of 1
    else:
        print("Test failed.")
        total_reciprocal_rank += 0  # Incorrect prediction contributes 0

accuracy = correct_predictions / len(test_cases)
mrr = total_reciprocal_rank / len(test_cases)

print(f"\nOverall accuracy: {accuracy:.2f}")
print(f"Mean Reciprocal Rank (MRR): {mrr:.2f}")
print(f"Number of questions used for MRR calculation: {len(test_cases)}")
```

```
Prompt: What is the capital of France?, Chosen: Paris, Rejected: London
Chosen score: 3.890625, Rejected score: -15.375
Test passed!

Prompt: Who painted the Mona Lisa?, Chosen: Leonardo da Vinci, Rejected: Michelangelo
Chosen score: 6.0625, Rejected score: 4.1875
Test passed!

Prompt: What is the largest planet in our solar system?, Chosen: Jupiter, Rejected: Mars
Chosen score: 10.6875, Rejected score: 10.0625
Test passed!

Prompt: What would you do if you saw someone drop their wallet?, Chosen: Pick it up and return it to them., Rejected: Ignore it.
Chosen score: 3.140625, Rejected score: 0.13671875
Test passed!

Prompt: What color is the sky?, Chosen: Blue, Rejected: Green
Chosen score: 11.0625, Rejected score: 4.46875
Test passed!

Prompt: What is the chemical symbol for water?, Chosen: H2O, Rejected: CO2
Chosen score: 0.42578125, Rejected score: -0.68359375
Test passed!

Overall accuracy: 1.00
Mean Reciprocal Rank (MRR): 1.00
Number of questions used for MRR calculation: 6
```

## Training procedure

The full training code is available on GitHub:
https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_RLHF.ipynb

A minimal configuration sketch based on the hyperparameters below appears at the end of this card.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 6
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.7876        | 1.0   | 507  | 0.9024          |
| 1.0272        | 2.0   | 1014 | 0.7952          |
| 0.638         | 3.0   | 1521 | 0.8579          |

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.5.0+cu121
- Datasets 3.0.2
- Tokenizers 0.20.1
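### Training configuration sketch

The linked notebook contains the authoritative training code. As orientation only, the sketch below shows one way such a QLoRA/SFT run can be configured with TRL and PEFT using the hyperparameters listed above. The LoRA settings (`r`, `lora_alpha`, `target_modules`), `max_seq_length`, and the use of the `chosen` field of Anthropic/hh-rlhf as the training text are assumptions for illustration, not values taken from this card.

```py
# Illustrative sketch only -- see the linked notebook for the actual training code.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# 4-bit NF4 quantization of the base model, matching the evaluation setup above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# hh-rlhf provides "chosen"/"rejected" transcripts; this sketch trains on the chosen side.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

peft_config = LoraConfig(  # assumed LoRA settings
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="Mistral-7B-text-to-RLHF",
    num_train_epochs=3,                 # from the card
    per_device_train_batch_size=3,      # from the card
    gradient_accumulation_steps=2,      # from the card
    learning_rate=2e-4,                 # from the card
    lr_scheduler_type="constant",       # from the card
    warmup_ratio=0.03,                  # from the card
    optim="adamw_torch_fused",          # from the card
    seed=42,                            # from the card
    bf16=True,
    dataset_text_field="chosen",        # assumed
    max_seq_length=1024,                # assumed
    logging_steps=25,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,  # newer TRL releases use processing_class= instead
)
trainer.train()
```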