🤖 SmolLM-135M - PPO Fine-Tuned Model

📝 Overview

This is a Proximal Policy Optimization (PPO) fine-tuned version of SmolLM-135M, designed to make its responses more emotional and engaging. The model has been optimized using HumanLLMs/Human-Like-DPO-Dataset.
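
To try the model locally, something like the following sketch may work. It assumes the `transformers` library (with PyTorch) is installed and that the repo id is `tsessk/llm-course-hw2-ppo`, as listed on this model page. The helper name `generate_reply`, the plain-text prompting, and the `max_new_tokens` default are illustrative choices, not settings documented by the author.

```python
MODEL_ID = "tsessk/llm-course-hw2-ppo"  # repo id as listed on this model page


def generate_reply(prompt, model_id=MODEL_ID, max_new_tokens=128):
    """Generate a continuation for `prompt` (hypothetical helper, not the author's code)."""
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumption: plain-text prompting. If the tokenizer ships a chat template,
    # tokenizer.apply_chat_template may match the training format better.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("What's your morning routine like?")` downloads the weights on first use. Outputs may differ from the samples on this page, since the decoding settings used for them are not stated.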

PPO is a reinforcement learning algorithm commonly used in RLHF (Reinforcement Learning from Human Feedback) to optimize model behavior against a reward model. It was introduced in "Proximal Policy Optimization Algorithms" (Schulman et al., 2017).
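
For readers unfamiliar with PPO, its core idea is a clipped surrogate objective that limits how far each update can move the policy away from the one that generated the samples. The sketch below is a generic, dependency-free illustration of that loss, not this model's training code. The function name, argument names, and the `clip_eps=0.2` default are assumptions chosen for illustration.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a quantity to minimize) over a batch.

    new_logprobs: log-probabilities of the sampled actions under the current policy
    old_logprobs: log-probabilities of the same actions under the sampling policy
    advantages:   advantage estimates, e.g. derived from a reward model
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate is maximized while losses are minimized.
    return -total / len(advantages)
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra credit, so the policy has little incentive to drift far from the sampling policy in a single update.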

📊 Training & Dataset

📌 Example of PPO-Enhanced Response

Here's an example of how the PPO model responds to a simple prompt, compared with the base model. Both responses are cut off at the generation length limit.

Prompt:

What's your morning routine like?

Base Model Response:

I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a healthy beverage to start the day.
2. **Brush teeth**: Brush your teeth for 2 minutes, then rinse with water.
3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater.
4. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
5. **Plan your day**: Review your schedule, set goals, and prioritize tasks.
6. **Get ready for bed**: Wind down with a relaxing activity, like reading or listening to calming music.

**Morning Routine (15-30 minutes)**

1. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
2. **Get ready for

PPO Response:

I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine I came up with:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a herbal tea to start the day.
2. **Brush teeth**: Brush your teeth for 2-3 minutes, then rinse with water.
3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater.
4. **Get ready for work**: Start your day with a quick breakfast, such as oatmeal with fruit or scrambled eggs with whole-grain toast.
5. **Take a few deep breaths**: Take a few deep breaths to calm your mind and body.
6. **Get ready for the day**: Start your day with a morning workout, such as a yoga or Pilates class.
7. **Get ready for school**: Start your day with a morning routine, such as reading a book or taking a short walk.
8. **Get ready for bed**: Start your day with a gentle sleep

Model ID: tsessk/llm-course-hw2-ppo. Model size: 135M params. Tensor type: BF16 (Safetensors).