elswa-dev committed on
Commit
f4d6f51
·
verified ·
1 Parent(s): 1959180

Upload simple_grpo_fine_tune_course.ipynb

  1. simple_grpo_fine_tune_course.ipynb +534 -0
simple_grpo_fine_tune_course.ipynb ADDED
@@ -0,0 +1,534 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "source": [
6
+ "### Simple GRPO Fine-Tuning Course\n",
7
+ "In this simple example, we're going to fine-tune a language model step by step using GRPO (Group Relative Policy Optimization).\n",
8
+ "\n",
9
+ "This notebook is part of the Simple GRPO Fine-Tuning Course, a free course designed for beginners, where you learn to fine-tune language models using the GRPO technique."
10
+ ],
11
+ "metadata": {
12
+ "id": "6S4koaD7fOLo"
13
+ }
14
+ },
15
+ {
16
+ "cell_type": "markdown",
17
+ "source": [
18
+ "![Easy Fine-Tune LLMs with GRPO - li_back.svg]()\n",
19
+ "\n",
20
+ "\n",
21
+ "<small>*Repository: [elswa-dev/notebook-courses](https://huggingface.co/elswa-dev/notebook-courses)*</small>"
22
+ ],
23
+ "metadata": {
24
+ "id": "pZ3PeFCfiE0R"
25
+ }
26
+ },
27
+ {
28
+ "cell_type": "markdown",
29
+ "source": [
30
+ "# **For this course you should already understand basic concepts in large language models, so here's a quick recap in very simple terms.**\n",
31
+ "\n",
32
+ "\n",
33
+ "**What is LLM Training (from scratch)?**\n",
34
+ "\n",
35
+ "\n",
36
+ "\n",
37
+ "Imagine you have a big, empty coloring book. Training an LLM from scratch is like filling in that entire book with colors—starting with a completely blank page. The model begins with no idea of language; it must learn everything by looking at a vast amount of text and figuring out the patterns, grammar, and meaning on its own. This process needs a lot of time, huge amounts of data, and significant computational power, much like teaching a child language from zero.\n",
38
+ "\n",
39
+ "**What is LLM Fine-Tuning?**\n",
40
+ "\n",
41
+ "\n",
42
+ "\n",
43
+ "Now imagine that you already have a coloring book where most of the pictures are nicely colored in, but you want to change a few specific images to better suit your taste. Fine-tuning is like that: you take a pre-trained model (one that already has a good grasp of language) and give it extra, focused lessons to improve its performance on a specific task or style. Instead of starting all over again, you’re just tweaking and refining what the model already knows.\n",
44
+ "\n",
45
+ "**The Key Difference**\n",
46
+ "\n",
47
+ "\n",
48
+ "The big difference is where you start. Training from scratch means beginning with nothing and building all the knowledge from the ground up, while fine-tuning starts with an already knowledgeable model and customizes it for a particular purpose, for example reasoning."
49
+ ],
50
+ "metadata": {
51
+ "id": "jE2wyM7eKaB3"
52
+ }
53
+ },
54
+ {
55
+ "cell_type": "markdown",
56
+ "source": [
57
+ "# Prerequisites\n",
58
+ "\n",
59
+ "**Before you begin, please ensure the following:**\n",
60
+ "\n",
61
+ "Hugging Face Account:\n",
62
+ "Create an account at [Hugging Face](https://huggingface.co/join) if you haven’t already.\n",
63
+ "\n",
64
+ "Once signed in, go to your account settings and generate a personal access token. This token is useful when you need to download models or datasets that require authentication.\n",
65
+ "\n",
66
+ "\n",
67
+ "Weights & Biases (W&B) Account:\n",
68
+ "Sign up at [Weights & Biases](https://wandb.ai/register). After registration, head over to your profile or account settings to obtain your personal API token. This token allows you to log your experiments and track training metrics through W&B."
69
+ ],
70
+ "metadata": {
71
+ "id": "BOUVMas8C_ep"
72
+ }
73
+ },
74
+ {
75
+ "cell_type": "markdown",
76
+ "metadata": {
77
+ "id": "dd4GYb6a0geP"
78
+ },
79
+ "source": [
80
+ "# Fine-tune LLMs with GRPO (e.g., for reasoning)\n",
81
+ "\n",
82
+ "This notebook shows how to fine-tune an LLM with GRPO, using the `trl` library.\n",
83
+ "\n",
84
+ "1. **Introduction**\n",
85
+ "* Imagine you want to teach a language model to write or answer questions better. We do that by starting with a “pre-trained” language model (kind of like a well-read student) and then fine-tuning it so it becomes even better on a specific topic or style. In this course, we will use a method called GRPO (Group Relative Policy Optimization) that trains the model by comparing groups of its answers and rewarding the best ones, much like giving praise to the best drawings in a classroom.\n",
86
+ "\n",
87
+ "2. **Setting Up Your Google Colab Notebook**\n",
88
+ "\n",
89
+ "* Change the runtime to use a GPU for faster training (from the menu, click Runtime > Change runtime type and select GPU).\n",
90
+ "\n",
91
+ "\n",
92
+ "**Why This Is Important:**\n",
93
+ "\n",
94
+ "A Colab notebook is like a digital workbook where you write instructions (code) and explanations side by side. Using a GPU is similar to having a super fast computer helper for heavy tasks.\n",
95
+ "\n",
96
+ "3. **Install the Necessary Dependencies**\n",
97
+ "\n",
98
+ "* In the first code cell, install all the libraries needed to run the training. For our example, we need tools that help manage data, work with language models, efficiently fine-tune with reinforcement learning, and even track our progress with charts.\n",
99
+ "\n",
100
+ "\n",
101
+ "Here’s a simple version of the installation cell:"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "code",
106
+ "execution_count": null,
107
+ "metadata": {
108
+ "id": "l3IstgzN63QW"
109
+ },
110
+ "outputs": [],
111
+ "source": [
112
+ "!pip install -qqq datasets==3.2.0 transformers==4.47.1 trl==0.14.0 peft==0.14.0 accelerate==1.2.1 bitsandbytes==0.45.2 wandb==0.19.7 --progress-bar off\n",
113
+ "!pip install -qqq flash-attn --no-build-isolation --progress-bar off"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "markdown",
118
+ "source": [
119
+ "When installing the dependencies, you might see a message like:\n",
120
+ "\n",
121
+ "> ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.\n",
122
+ "\n",
123
+ "You can safely ignore it; the training and fine-tuning will still work as expected.\n",
124
+ "\n",
125
+ "\n",
126
+ "**Why This Is Important:**\n",
127
+ "Each package is like a tool in our toolbox:\n",
128
+ "\n",
129
+ "* *datasets and transformers* help us work with text data and language models.\n",
130
+ "* *trl* (for reinforcement learning) and *peft* (for efficient training) help us customize the training process.\n",
131
+ "* *wandb* tracks what happens during training, so we can see progress and fix issues if needed."
132
+ ],
133
+ "metadata": {
134
+ "id": "vf4qs2LWnnQD"
135
+ }
136
+ },
137
+ {
138
+ "cell_type": "markdown",
139
+ "metadata": {
140
+ "id": "Q9MjbDWR0geT"
141
+ },
142
+ "source": [
143
+ "## Load Dataset"
144
+ ]
145
+ },
146
+ {
147
+ "cell_type": "markdown",
148
+ "source": [
149
+ "**What You Do:**\n",
150
+ "\n",
151
+ "Load a collection of examples (our dataset) from Hugging Face. In our case, the dataset has paired “prompts” (questions or instructions) and “completions” (answers or responses).\n",
152
+ "\n",
153
+ "Example code:"
154
+ ],
155
+ "metadata": {
156
+ "id": "7FPu2cpACZr-"
157
+ }
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "execution_count": null,
162
+ "metadata": {
163
+ "collapsed": true,
164
+ "id": "5Y-X13wB7UP4"
165
+ },
166
+ "outputs": [],
167
+ "source": [
168
+ "import torch\n",
169
+ "import wandb\n",
170
+ "from datasets import load_dataset\n",
171
+ "from peft import LoraConfig, get_peft_model\n",
172
+ "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
173
+ "from trl import GRPOConfig, GRPOTrainer\n",
174
+ "\n",
175
+ "# Log to Weights & Biases\n",
176
+ "wandb.login()\n",
177
+ "\n",
178
+ "# Load dataset\n",
179
+ "dataset = load_dataset(\"mlabonne/smoltldr\")\n",
180
+ "print(dataset)"
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "markdown",
185
+ "source": [
186
+ "**Why This Is Important:**\n",
187
+ "Think of the dataset like a set of flash cards. The model learns from these examples so that when it gets a new question, it knows what kinds of answers look good."
188
+ ],
189
+ "metadata": {
190
+ "id": "y4HqxagjqkI6"
191
+ }
192
+ },
193
+ {
194
+ "cell_type": "markdown",
195
+ "metadata": {
196
+ "id": "Y1tlrHXB0geU"
197
+ },
198
+ "source": [
199
+ "## Load Model"
200
+ ]
201
+ },
202
+ {
203
+ "cell_type": "markdown",
204
+ "source": [
205
+ "**What to do:**\n",
206
+ "\n",
207
+ "Load a pre-trained language model and its tokenizer. The tokenizer converts text into numbers (tokens) that the model understands. We then add a special “LoRA” module to make the fine-tuning process lighter and faster.\n",
208
+ "\n",
209
+ "Example code:"
210
+ ],
211
+ "metadata": {
212
+ "id": "69fISWv99oBX"
213
+ }
214
+ },
215
+ {
216
+ "cell_type": "code",
217
+ "execution_count": null,
218
+ "metadata": {
219
+ "id": "3tLRvi5i-Qls"
220
+ },
221
+ "outputs": [],
222
+ "source": [
223
+ "# Load model\n",
224
+ "model_id = \"HuggingFaceTB/SmolLM-135M-Instruct\"\n",
225
+ "model = AutoModelForCausalLM.from_pretrained(\n",
226
+ " model_id,\n",
227
+ " torch_dtype=\"auto\",\n",
228
+ " device_map=\"auto\",\n",
229
+ " attn_implementation=\"flash_attention_2\",\n",
230
+ ")\n",
231
+ "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
232
+ "\n",
233
+ "# Load LoRA\n",
234
+ "lora_config = LoraConfig(\n",
235
+ " task_type=\"CAUSAL_LM\",\n",
236
+ " r=16,\n",
237
+ " lora_alpha=32,\n",
238
+ " target_modules=\"all-linear\",\n",
239
+ ")\n",
240
+ "model = get_peft_model(model, lora_config)\n",
241
+ "model.print_trainable_parameters()"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "markdown",
246
+ "source": [
247
+ "**Why This Is Important:**\n",
248
+ "\n",
249
+ "This step loads an already good “student” (the model) and prepares it to learn a little more without re-learning everything from scratch. The LoRA module acts like a shortcut to quickly tune only the important parts."
250
+ ],
251
+ "metadata": {
252
+ "id": "6spl1KEOq3GI"
253
+ }
254
+ },
255
+ {
256
+ "cell_type": "markdown",
257
+ "metadata": {
258
+ "id": "3P4Ww0-f0geV"
259
+ },
260
+ "source": [
261
+ "## Define Reward Function\n",
262
+ "\n",
263
+ "**What to do:**\n",
264
+ "\n",
265
+ "Set up a reward function. In reinforcement learning, the model is given a score (reward) based on how good its answer is. For simplicity, we use a function that rewards answers that are close to 50 characters long: the further a completion's length is from 50, the lower (more negative) its score."
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "code",
270
+ "execution_count": null,
271
+ "metadata": {
272
+ "id": "745L0RC6-XBT"
273
+ },
274
+ "outputs": [],
275
+ "source": [
276
+ "# Reward function\n",
277
+ "def reward_len(completions, **kwargs):\n",
278
+ " return [-abs(50 - len(completion)) for completion in completions]"
279
+ ]
280
+ },
281
+ {
282
+ "cell_type": "markdown",
283
+ "source": [
284
+ "**Why This Is Important:**\n",
285
+ "\n",
286
+ "Imagine you’re playing a game where you get points based on how close you are to a target number. This function acts as the “score keeper” that tells the model whether it did well or needs more practice."
287
+ ],
288
+ "metadata": {
289
+ "id": "Kv3O4TrBrX7i"
290
+ }
291
+ },
292
+ {
293
+ "cell_type": "markdown",
294
+ "metadata": {
295
+ "id": "jFggiY3M0geX"
296
+ },
297
+ "source": [
298
+ "## Define Training Arguments\n",
299
+ "\n",
300
+ "**What You Do:**\n",
301
+ "\n",
302
+ "Set the parameters for training the model. This includes where to save your progress, how many examples to process at once, and how many times to go through the dataset.\n",
303
+ "Here’s a plain-English explanation of some key points:\n",
304
+ "1. output_dir: Where the computer saves its work (checkpoints).\n",
305
+ "2. learning_rate: How quickly the model adjusts its thinking. A small number means the model learns slowly and carefully.\n",
306
+ "3. per_device_train_batch_size: How many examples the model sees at once before making adjustments.\n",
307
+ "4. gradient_accumulation_steps: How many batches of gradients to add up before the model makes one update (like collecting feedback on several drafts before revising), which imitates a larger batch without needing more memory.\n",
308
+ "5. num_train_epochs: How many times the model goes through the entire set of examples.\n"
309
+ ]
310
+ },
311
+ {
312
+ "cell_type": "markdown",
313
+ "source": [
314
+ "## Fine Tune the Model with GRPOTrainer\n",
315
+ "\n",
316
+ "**What You Do:**\n",
317
+ "\n",
318
+ "Combine everything using a GRPO trainer. GRPO (Group Relative Policy Optimization) fine-tunes the model by comparing groups of different answers and rewarding the best ones. We also use Weights & Biases (wandb) to track the training progress.\n",
319
+ "\n",
320
+ "A simplified version of the training cell might look like this:"
321
+ ],
322
+ "metadata": {
323
+ "id": "mQMYQSV5unk_"
324
+ }
325
+ },
326
+ {
327
+ "cell_type": "code",
328
+ "execution_count": null,
329
+ "metadata": {
330
+ "id": "NtiMrN480geX"
331
+ },
332
+ "outputs": [],
333
+ "source": [
334
+ "# Training arguments\n",
335
+ "training_args = GRPOConfig(\n",
336
+ " output_dir=\"GRPO\",\n",
337
+ " learning_rate=2e-5,\n",
338
+ " per_device_train_batch_size=8,\n",
339
+ " gradient_accumulation_steps=2,\n",
340
+ " max_prompt_length=512,\n",
341
+ " max_completion_length=96,\n",
342
+ " num_generations=8,\n",
343
+ " optim=\"adamw_8bit\",\n",
344
+ " num_train_epochs=1,\n",
345
+ " bf16=True,\n",
346
+ " report_to=[\"wandb\"],\n",
347
+ " remove_unused_columns=False,\n",
348
+ " logging_steps=1,\n",
349
+ ")\n",
350
+ "\n",
351
+ "# Trainer\n",
352
+ "trainer = GRPOTrainer(\n",
353
+ " model=model,\n",
354
+ " reward_funcs=[reward_len],\n",
355
+ " args=training_args,\n",
356
+ " train_dataset=dataset[\"train\"],\n",
357
+ ")\n",
358
+ "\n",
359
+ "# Train model\n",
360
+ "wandb.init(project=\"GRPO\")\n",
361
+ "trainer.train()"
362
+ ]
363
+ },
364
+ {
365
+ "cell_type": "markdown",
366
+ "source": [
367
+ "**Why This Is Important:**\n",
368
+ "\n",
369
+ "These configurations are like setting the rules and pace for studying. They ensure the model learns steadily without overwhelming the computer.\n",
370
+ "\n",
371
+ "The trainer goes over each example, checks the group of generated answers, and uses the reward function to figure out which answers are best. With every step, the model “learns” from its mistakes, much like practicing math problems and improving over time."
372
+ ],
373
+ "metadata": {
374
+ "id": "ILhtAr8-sL67"
375
+ }
376
+ },
377
+ {
378
+ "cell_type": "markdown",
379
+ "metadata": {
380
+ "id": "JUqHI2Ah0geX"
381
+ },
382
+ "source": [
383
+ "## Wrap-Up & Push Model to Hub\n",
384
+ "\n",
385
+ "After training, you can save your fine-tuned model so that later you (or your colleagues) can use it to generate better answers. Give the fine-tuned model a new name, or extend the existing name as I did with ...*reason-we*. A code cell for saving might look like this:"
386
+ ]
387
+ },
388
+ {
389
+ "cell_type": "code",
390
+ "execution_count": null,
391
+ "metadata": {
392
+ "id": "oKHhpA4z-sRF"
393
+ },
394
+ "outputs": [],
395
+ "source": [
396
+ "# Save model\n",
397
+ "merged_model = trainer.model.merge_and_unload()\n",
398
+ "merged_model.push_to_hub(\"<your-username>/SmolLM-135M-Instruct-reason-we\", private=False)"
399
+ ]
400
+ },
401
+ {
402
+ "cell_type": "markdown",
403
+ "source": [
404
+ "**Why This Is Important:**\n",
405
+ "\n",
406
+ "Saving the model is like keeping your best project in a folder—you can share it, build upon it, and use it to answer real-world questions."
407
+ ],
408
+ "metadata": {
409
+ "id": "KzVobAX0tNsw"
410
+ }
411
+ },
412
+ {
413
+ "cell_type": "markdown",
414
+ "metadata": {
415
+ "id": "oJcrL4z60geY"
416
+ },
417
+ "source": [
418
+ "## Generate Text\n",
419
+ "\n",
420
+ "In the final section, you use your newly fine-tuned language model to produce written responses from text prompts. Once the model has learned during training, this section shows you how to \"ask\" the model for a response and then translate the model’s internal numerical output back into readable text. In simple terms, it’s the part where you put your model to the test by giving it a sentence or question and letting it complete or respond.\n",
421
+ "\n",
422
+ "**How It Works**\n",
423
+ "\n",
424
+ "Imagine your model is like a child who has just finished learning from lots of flash cards. Now, you give the child a prompt—like “Tell me a story about a brave knight”—and ask them to make up a story. The Generate Text section is where you let the model “speak” by running its built-in text-generation function. The steps typically include:\n",
425
+ "\n",
426
+ "* Preparing the Prompt: You write a prompt (the question or sentence you want the model to complete).\n",
427
+ "* Tokenizing the Prompt: The model doesn’t understand plain text; it turns your words into numbers using a process called tokenization.\n",
428
+ "\n",
429
+ "* Generating Output: The model then uses its learned knowledge to predict what token (number) comes next repeatedly until it creates a complete response.\n",
430
+ "* Decoding the Output: Finally, the sequence of numbers is converted back into human-readable text.\n",
431
+ "\n",
432
+ "\n",
433
+ "**What the Code Looks Like**\n\n",
434
+ "An example code snippet in a Google Colab notebook for the Generate Text section might be:"
435
+ ]
436
+ },
437
+ {
438
+ "cell_type": "code",
439
+ "execution_count": null,
440
+ "metadata": {
441
+ "id": "RsLvPu1z0geY"
442
+ },
443
+ "outputs": [],
444
+ "source": [
445
+ "prompt = \"\"\"\n",
446
+ "# A long document about the Cat\n",
447
+ "\n",
448
+ "The cat (Felis catus), also referred to as the domestic cat or house cat, is a small\n",
449
+ "domesticated carnivorous mammal. It is the only domesticated species of the family Felidae.\n",
450
+ "Advances in archaeology and genetics have shown that the domestication of the cat occurred\n",
451
+ "in the Near East around 7500 BC. It is commonly kept as a pet and farm cat, but also ranges\n",
452
+ "freely as a feral cat avoiding human contact. It is valued by humans for companionship and\n",
453
+ "its ability to kill vermin. Its retractable claws are adapted to killing small prey species\n",
454
+ "such as mice and rats. It has a strong, flexible body, quick reflexes, and sharp teeth,\n",
455
+ "and its night vision and sense of smell are well developed. It is a social species,\n",
456
+ "but a solitary hunter and a crepuscular predator. Cat communication includes\n",
457
+ "vocalizations—including meowing, purring, trilling, hissing, growling, and grunting—as\n",
458
+ "well as body language. It can hear sounds too faint or too high in frequency for human ears,\n",
459
+ "such as those made by small mammals. It secretes and perceives pheromones.\n",
460
+ "\"\"\"\n",
461
+ "\n",
462
+ "messages = [\n",
463
+ " {\"role\": \"user\", \"content\": prompt},\n",
464
+ "]"
465
+ ]
466
+ },
467
+ {
468
+ "cell_type": "code",
469
+ "execution_count": null,
470
+ "metadata": {
471
+ "id": "6jbz8DYd-o7A"
472
+ },
473
+ "outputs": [],
474
+ "source": [
475
+ "# Generate text\n",
476
+ "from transformers import pipeline\n",
477
+ "\n",
478
+ "generator = pipeline(\"text-generation\", model=\"<your-model-id>\")\n",
479
+ "\n",
480
+ "## Or use the model and tokenizer we defined earlier\n",
481
+ "# generator = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n",
482
+ "\n",
483
+ "generate_kwargs = {\n",
484
+ " \"max_new_tokens\": 256,\n",
485
+ " \"do_sample\": True,\n",
486
+ " \"temperature\": 0.5,\n",
487
+ " \"min_p\": 0.1,\n",
488
+ "}\n",
489
+ "\n",
490
+ "generated_text = generator(messages, **generate_kwargs)\n",
491
+ "\n",
492
+ "print(generated_text)"
493
+ ]
494
+ },
495
+ {
496
+ "cell_type": "markdown",
497
+ "source": [
498
+ "**Why This Section Is Important**\n",
499
+ "\n",
500
+ "Imagine if after studying, you never got to answer any questions—you wouldn’t know if you understood the information! The Generate Text section is like a pop quiz for your model, showing you in real time if it learned properly during training. It gives you immediate feedback: you can see if the responses are clear, on-topic, or if further adjustments are needed."
501
+ ],
502
+ "metadata": {
503
+ "id": "9b9swdoo-59d"
504
+ }
505
+ },
506
+ {
507
+ "cell_type": "markdown",
508
+ "source": [
509
+ "### Great! ✨\n",
510
+ "### You have fine-tuned your first model ✅"
511
+ ],
512
+ "metadata": {
513
+ "id": "WfNDbbD_D842"
514
+ }
515
+ }
516
+ ],
517
+ "metadata": {
518
+ "accelerator": "GPU",
519
+ "colab": {
520
+ "gpuType": "A100",
521
+ "machine_shape": "hm",
522
+ "provenance": []
523
+ },
524
+ "kernelspec": {
525
+ "display_name": "Python 3",
526
+ "name": "python3"
527
+ },
528
+ "language_info": {
529
+ "name": "python"
530
+ }
531
+ },
532
+ "nbformat": 4,
533
+ "nbformat_minor": 0
534
+ }
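
The idea behind the "Define Reward Function" and "Fine Tune the Model with GRPOTrainer" cells, scoring a group of completions for one prompt and then comparing each score to the rest of the group, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TRL's implementation: only `reward_len` comes from the notebook, while `group_relative_advantages`, its `eps` default, the use of the population standard deviation, and the sample strings are hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def reward_len(completions, **kwargs):
    """Reward from the notebook: completions closer to 50 characters score higher."""
    return [-abs(50 - len(completion)) for completion in completions]


def group_relative_advantages(rewards, eps=1e-4):
    """Hypothetical helper: normalise rewards within one group.

    Each reward is compared to the group mean and scaled by the group
    standard deviation, so "better than the group" becomes positive and
    "worse than the group" becomes negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# One made-up group of completions sampled for a single prompt.
group = ["a" * 50, "b" * 10, "c" * 120, "d" * 45]

rewards = reward_len(group)
advantages = group_relative_advantages(rewards)

for text, reward, advantage in zip(group, rewards, advantages):
    print(f"length={len(text):>3}  reward={reward:>4}  advantage={advantage:+.2f}")
```

Completions with a positive advantage are the ones the update would make more likely, and those with a negative advantage less likely. Because the scores are relative to the group, only the ranking and spread within a group matter, not the absolute size of the reward.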