{ "cells": [ { "cell_type": "markdown", "id": "8a904d03-8c1a-4eb4-bd77-a26ec985d992", "metadata": {}, "source": [ "## Load the Dataset" ] }, { "cell_type": "code", "execution_count": 1, "id": "f398bea9-d201-43ae-95e8-d4fad1de651c", "metadata": {}, "outputs": [], "source": [ "# Opening modified dataset uploaded to HF\n", "from datasets import load_dataset\n", "\n", "dataset_name = \"RaviNaik/oasst1-chatml\"\n", "dataset = load_dataset(dataset_name, split=\"train\")" ] }, { "cell_type": "markdown", "id": "b74449fb-9f01-4c96-8a7e-6661180be1f3", "metadata": {}, "source": [ "## Load the Model, Tokenizer and configure bnb" ] }, { "cell_type": "code", "execution_count": 2, "id": "ab5a26da-b9e0-424f-b482-034c10f049ce", "metadata": {}, "outputs": [], "source": [ "import torch\n", "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AutoTokenizer" ] }, { "cell_type": "code", "execution_count": 3, "id": "8f5d677d-082a-4d96-9fab-4d66981ecc27", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "88aa19589a2b46f9888c214db3867598", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/2 [00:00\n", " \n", " \n", " [500/500 1:11:36, Epoch 1/2]\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
Step | Training Loss
  10 | 1.720800
  20 | 1.548300
  30 | 1.550700
  40 | 1.543500
  50 | 1.506600
  60 | 1.501400
  70 | 1.559600
  80 | 1.552900
  90 | 1.506300
 100 | 1.471700
 110 | 1.484100
 120 | 1.498100
 130 | 1.517600
 140 | 1.481100
 150 | 1.485100
 160 | 1.516800
 170 | 1.505000
 180 | 1.484600
 190 | 1.495200
 200 | 1.447100
 210 | 1.520700
 220 | 1.444500
 230 | 1.464800
 240 | 1.480200
 250 | 1.444100
 260 | 1.543900
 270 | 1.512700
 280 | 1.441300
 290 | 1.502200
 300 | 1.476900
 310 | 1.478200
 320 | 1.481000
 330 | 1.433600
 340 | 1.404000
 350 | 1.401000
 360 | 1.424400
 370 | 1.429100
 380 | 1.388700
 390 | 1.402600
 400 | 1.417900
 410 | 1.358200
 420 | 1.460700
 430 | 1.417800
 440 | 1.447300
 450 | 1.429200
 460 | 1.388100
 470 | 1.433200
 480 | 1.431600
 490 | 1.491200
 500 | 1.406600
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "/home/ravi.naik/miniconda3/envs/torchenv/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n", " warnings.warn('Was asked to gather along dimension 0, but all '\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a595199a17b14a5b99d4c06f5eda00af", "version_major": 2, "version_minor": 0 }, "text/plain": [ "config.json: 0%| | 0.00/866 [00:00system\n", "You are a helpful assistant who always respond to user queries<|im_end|>\n", "user\n", "{prompt}<|im_end|>\n", "<|im_start|>assistant\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 17, "id": "e3e26a34-acfc-45f7-8a4a-96157e25d1f5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ravi.naik/miniconda3/envs/torchenv/lib/python3.10/site-packages/transformers/generation/utils.py:1518: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "<|im_start|>system\n", "You are a helpful assistant who always respond to user queries<|im_end|>\n", "user\n", "What is a large language model?<|im_end|>\n", "<|im_start|>assistant\n", "A large language model (LLM) is a type of artificial intelligence model that is designed to understand and generate human language. 
LLMs are typically trained on large amounts of text data and are capable of performing a wide range of language-related tasks, such as language translation, text summarization, and text generation.\n", "\n", "LLMs are different from traditional language models, such as recurrent neural networks (RNNs) or transformers, in that they are designed to handle large amounts of text data and are able to learn from this data in a more efficient and effective way. This makes them well-suited for tasks that require a deep understanding of language, such as natural language processing (N\n" ] } ], "source": [ "prompt = \"What is a large language model?\"\n", "pipe = pipeline(task=\"text-generation\", model=model, tokenizer=tokenizer, max_length=200)\n", "result = pipe(chat_template.format(prompt=prompt))\n", "print(result[0]['generated_text'])" ] }, { "cell_type": "code", "execution_count": 18, "id": "7af8f545-e76c-4ce3-a02d-864404dd357c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<|im_start|>system\n", "You are a helpful assistant who always respond to user queries<|im_end|>\n", "user\n", "Write a Python program to print first 50 prime numbers<|im_end|>\n", "<|im_start|>assistant\n", "Here's a Python program that prints the first 50 prime numbers:\n", "\n", "```python\n", "def is_prime(n):\n", " if n < 2:\n", " return False\n", " for i in range(2, int(n**0.5) + 1):\n", " if n % i == 0:\n", " return False\n", " return True\n", "\n", "count = 0\n", "num = 2\n", "while count < 50:\n", " if is_prime(num):\n", " print(num)\n", " count += 1\n", " num += 1\n", "```\n", "\n", "This program uses a helper function `is_prime` to check if a number is prime\n" ] } ], "source": [ "prompt = \"Write a Python program to print first 50 prime numbers\"\n", "pipe = pipeline(task=\"text-generation\", model=model, tokenizer=tokenizer, max_length=200)\n", "result = pipe(chat_template.format(prompt=prompt))\n", "print(result[0]['generated_text'])" ] }, { 
"cell_type": "code", "execution_count": 19, "id": "398c483f-d4d6-4532-aecf-e22c2fc001f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<|im_start|>system\n", "You are a helpful assistant who always respond to user queries<|im_end|>\n", "user\n", "Can you write a short introduction about the relevance of the term monopsony in economics?<|im_end|>\n", "<|im_start|>assistant\n", "Sure, I'd be happy to help!\n", "\n", "Monopsony is a term used in economics to describe a market structure in which there is only one buyer for a particular good or service. This means that the buyer has significant bargaining power and can set the price at which they will purchase the good or service.\n", "\n", "In a monopsony, the buyer has the ability to influence the price of the good or service, which can have significant implications for the market. For example, if the buyer is a large corporation, they may be able to drive down the price of the good or service, which can have negative effects on the\n" ] } ], "source": [ "prompt = \"Can you write a short introduction about the relevance of the term monopsony in economics?\"\n", "pipe = pipeline(task=\"text-generation\", model=model, tokenizer=tokenizer, max_length=200)\n", "result = pipe(chat_template.format(prompt=prompt))\n", "print(result[0]['generated_text'])" ] }, { "cell_type": "code", "execution_count": 20, "id": "23d75586-7177-4c82-9f93-170ee982e319", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "You are using a model of type phi to instantiate a model of type phi-msft. 
This is not supported for all configurations of models and can yield errors.\n" ] } ], "source": [ "trainer.save_model(\"checkpoints\")" ] }, { "cell_type": "code", "execution_count": 21, "id": "7a362585-9064-4961-9160-0e560be70731", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "You are using a model of type phi to instantiate a model of type phi-msft. This is not supported for all configurations of models and can yield errors.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "417e9c7948d7488aa6fc1b8c613d1d75", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Upload 3 LFS files: 0%| | 0/3 [00:00