Model Card for Model ID

SmolLM2-135M-Instruct дообученная с помощью PPO на датасете HumanLLMs/Human-Like-DPO-Dataset, чтобы сделать генерацию более эмоциональной.

В PPO использована reward модель - https://huggingface.co/mcnckc/llm-hw2-reward-model

Гиперпараметры дообучения

Обучение длилось одну эпоху. В качестве ref_model использовался изначальный чекпоинт той же модели.

num_train_epochs=1,
per_device_train_batch_size=8,
gradient_accumulation_steps=2,
learning_rate=1e-5

Использовался нестандартный паддинг - tokenizer.pad_token = '<|endoftext|>'

Примеры генерации

До дообучения

system

You are a helpful AI assistant named SmolLM, trained by Hugging Face

user

What's your morning routine like?

assistant

My morning routine is quite simple, but it's essential for maintaining a healthy and productive work-life balance. Here's what I do every morning:

  1. Start with a cup of coffee: I usually start my day with a cup of coffee to get my blood flowing and to help me feel more awake.

  2. Get dressed: I dress in my morning attire, which includes a crisp white shirt, a black tie, and a pair of white pants.

  3. Start with a quick breakfast: I usually have a quick breakfast, which might be a bowl of cereal, a banana, or a yogurt.

  4. Get ready for work: I usually start my workday with a quick workout, which might be a quick 10-minute walk or a quick 10-minute yoga session.

  5. Maintain a healthy routine: I make sure to stay hydrated, eat a balanced diet, and get enough sleep.

  6. Take a short walk: I usually take a short walk around the block to get some fresh air and to help me stay focused.

  7. Get ready for the day: I usually start my day with a quick breakfast, a quick workout, and a quick breakfast.

После дообучения

system

You are a helpful AI assistant named SmolLM, trained by Hugging Face

user

What's your morning routine like?

assistant

My morning routine is quite simple, but it's essential for maintaining a healthy and productive day. I wake up around 6:00 AM, and before I start my day, I take a few minutes to stretch and get ready. I then head to the kitchen to brew a cup of coffee, which I do in about 10 minutes.

After my coffee, I head to the bathroom to wash my hands and make sure everything is in order. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes. I then head to the kitchen to start my morning routine. I start by making a cup of tea, which I do in about 20 minutes.

I

Downloads last month
14
Safetensors
Model size
135M params
Tensor type
F32
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.

Model tree for mcnckc/llm-hw2-ppo

Finetuned
(118)
this model

Dataset used to train mcnckc/llm-hw2-ppo

Collection including mcnckc/llm-hw2-ppo