---
base_model: unsloth/llama-3.2-3b-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
license: apache-2.0
language:
- en
datasets:
- BAAI/Infinity-Instruct
---

# Fine-tune Llama 3.2 3B Using Unsloth and BAAI/Infinity-Instruct Dataset

This model was fine-tuned on the "0625" version of the dataset; a model trained on the "7M" version will follow as well.

## Uploaded Model

- **Developed by:** MateoRov
- **License:** apache-2.0
- **Fine-tuned from model:** unsloth/llama-3.2-3b-instruct-bnb-4bit

## Usage

Check my full repo on GitHub for a better understanding: https://github.com/Mateorovere/FineTuning-LLM-Llama3.2-3b

With the proper dependencies installed, you can run the model with the following code:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the model and tokenizer (adjust max_seq_length as needed)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MateoRov/Llama3.2-3b-SFF-Infinity-MateoRovere",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply the Llama 3.1 chat template to the tokenizer
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Define the input message
messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

# Prepare the inputs
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")

# Generate the output
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=64,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)

# Decode the outputs
result = tokenizer.batch_decode(outputs)
print(result)
```

To stream the generation token by token:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MateoRov/Llama3.2-3b-SFF-Infinity-MateoRovere",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Apply the Llama 3.1 chat template to the tokenizer
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Define the input message
messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

# Prepare the inputs
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")

# Initialize the text streamer, skipping the prompt in the output
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Generate the output token by token
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)
```
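
If you'd rather not install Unsloth, the model can likely also be loaded with plain `transformers`. Here is a minimal sketch, assuming the repo ships merged, transformers-compatible weights (if it only contains LoRA adapters, load the base model and attach them with `peft` instead):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo contains full merged weights loadable by transformers
model_id = "MateoRov/Llama3.2-3b-SFF-Infinity-MateoRovere"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat prompt and generate, mirroring the Unsloth example above
messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```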
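
## Fine-Tuning Sketch

The exact training recipe lives in the GitHub repo linked above. For orientation only, the sketch below shows the standard Unsloth SFT pattern on the "0625" subset; the LoRA settings, the hyperparameters, and the `conversations` column mapping are assumptions, not the settings used for this model:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit base model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-3b-instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained
# (r, lora_alpha, and target_modules here are illustrative defaults)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Apply the Llama 3.1 chat template to the tokenizer
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Load the "0625" subset of Infinity-Instruct
dataset = load_dataset("BAAI/Infinity-Instruct", "0625", split="train")

# Assumption: each row holds a ShareGPT-style "conversations" list with
# "from"/"value" keys; adjust the mapping if the actual schema differs
def to_text(example):
    role_map = {"human": "user", "gpt": "assistant"}
    messages = [
        {"role": role_map.get(turn["from"], turn["from"]),
         "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text)

# Standard SFT loop; hyperparameters are placeholders, not the real recipe
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```

Depending on your `trl` version, `dataset_text_field` and `max_seq_length` may need to go into an `SFTConfig` instead of being passed to `SFTTrainer` directly.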