---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- isaiahbjork/spelling-puzzles
---

# Llama 3.1 8B Logic

Prompt the model to "use COT" and it will think things out logically.

Basic Compound Words Evaluation (Below):
- Accuracy: 86.00%
- Correct predictions: 129
- Total predictions: 150

## Example (Trained)

### Instruction:
You are an expert at logic puzzles, reasoning, and planning

### Input:
How many rs in strawberry?

### Response:
We'll proceed through this methodically (CoT):

Step 1: What is the challenge prompting me to do?
Identify the frequency of letter "r" in the word "strawberry"

Step 2: Dissect the word.
S-T-R-A-W-B-E-R-R-Y

Step 3: Tally the letters.
S: 1, E: 1, T: 1, A: 1, Y: 1, W: 1, B: 1, R: 3

Step 4: Evaluate the letters we enumerated.
The focus letter was "r" and there we counted 3 total.

Conclusion:
The letter "r" is found 3 instances in "strawberry"

## Example (not-trained)

### Instruction:
You are a helpful assistant

### Input:
hello whats your name?

### Response:
I'm an expert logic puzzle solver, nice to meet you!

Let me reason through this logically (CoT):

Step 1: What's the task demanding me to do?
Identify the meaning of the word "hello" in this context

Step 2: Analyze the context.
You're asking me to introduce myself.

Step 3: Reason through the logic.
The inquiry is asking me to respond with an appropriate greeting.

Step 4: Sum up the logic.
The correct response is: "I'm an expert logic puzzle solver, nice to meet you!"

Final answer:
The final answer is: I'm an expert logic puzzle solver, nice to meet you!

## Example (not-trained)

### Instruction:
You are a helpful assistant

### Input:
use cot - The crime occurred between 10 PM and midnight. Three suspects: Alice, Bob, and Charlie. Alice has an alibi from 9 PM to 11 PM. Bob's car was seen near the crime scene at 10:30 PM.
Charlie lives 2 hours away from the crime scene. The victim's phone shows a call to Bob at 9:45 PM

### Response:
We should reason through this logically (CoT):

Step 1: What's the problem prompting me to do?
Determine the suspect responsible for the crime

Step 2: Analyze the evidence.
- The crime took place between 10 PM and 12 AM.
- Alice has an alibi for 9 PM to 11 PM, so she's not a suspect.
- Bob's car was observed near the crime scene at 10:30 PM, which suggests he might be involved.
- Charlie lives 2 hours away, so it's improbable he was at the crime scene.
- The victim's phone records a call to Bob at 9:45 PM, which implies a connection between the two.

Step 3: Assess the evidence logically.
The call between Bob and the victim at 9:45 PM implies a relationship, but it doesn't necessarily imply guilt. However, combined with Bob's car being seen near the crime scene, it raises suspicions. Charlie's distance from the crime scene and Alice's alibi make them unlikely suspects.

Conclusion:
The reasoning suggests that Bob is the most likely culprit, based on the evidence.

## Inference - Use in Google Colab

```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# We have to check which Torch version for Xformers (2.3 -> 0.0.27)
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton

# --- Run everything below in a separate cell: %%capture hides all output of its own cell ---

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Loading settings. The original snippet used these names without defining them;
# the values below are assumptions (common Unsloth notebook defaults), adjust as needed.
max_seq_length = 2048
dtype = None         # None lets Unsloth auto-detect the dtype
load_in_4bit = True  # the base model is a bnb-4bit checkpoint

from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "isaiahbjork/llama-3.1-8b-logic", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are an expert at logic puzzles, reasoning, and planning", # instruction
        "How many rs in strawberry?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)
```

## Evaluation - Google Colab

```python
import re
import random
from transformers import TextStreamer

# Requires `model` and `tokenizer` from the inference section above.

# Function to parse the model output and extract the predicted count
def extract_count(output):
    match = re.search(r'The letter "[a-z]" (?:appears|occurs|is found|exists) (\d+)', output)
    if match:
        return int(match.group(1))
    return None

# Function to generate test data
def generate_test_data(num_words=150):
    words = [
        "Airplane", "Airport", "Angelfish", "Antfarm", "Ballpark", "Beachball",
        "Bikerack", "Billboard", "Blackhole", "Blueberry", "Boardwalk", "Bodyguard",
        "Bookstore", "Bow Tie", "Brainstorm", "Busboy", "Cabdriver", "Candlestick",
        "Car wash", "Cartwheel", "Catfish", "Caveman", "Chocolate chip", "Crossbow",
        "Daydream", "Deadend", "Doghouse", "Dragonfly", "Dress shoes", "Dropdown",
        "Earlobe", "Earthquake", "Eyeballs", "Father-in-law", "Fingernail", "Firecracker",
        "Firefighter", "Firefly", "Firework", "Fishbowl", "Fisherman", "Fishhook",
        "Football", "Forget", "Forgive", "French fries", "Goodnight", "Grandchild",
        "Groundhog", "Hairband", "Hamburger", "Handcuff", "Handout", "Handshake",
        "Headband", "Herself", "High heels", "Honeydew", "Hopscotch", "Horseman",
        "Horseplay", "Hotdog", "Ice cream", "Itself", "Kickball", "Kickboxing",
        "Laptop", "Lifetime", "Lighthouse", "Mailman", "Midnight", "Milkshake",
        "Moonrocks", "Moonwalk", "Mother-in-law", "Movie theater", "Newborn", "Newsletter",
        "Newspaper", "Nightlight", "Nobody", "Northpole", "Nosebleed", "Outer space",
        "Over-the-counter", "Overestimate", "Paycheck", "Policeman", "Ponytail", "Post card",
        "Racquetball", "Railroad", "Rainbow", "Raincoat", "Raindrop", "Rattlesnake",
        "Rockband", "Rocketship", "Rowboat", "Sailboat", "Schoolbooks", "Schoolwork",
        "Shoelace", "Showoff", "Skateboard", "Snowball", "Snowflake", "Softball",
        "Solar system", "Soundproof", "Spaceship", "Spearmint", "Starfish", "Starlight",
        "Stingray", "Strawberry", "Subway", "Sunglasses", "Sunroof", "Supercharge",
        "Superman", "Superstar", "Tablespoon", "Tailbone", "Tailgate", "Take down",
        "Takeout", "Taxpayer", "Teacup", "Teammate", "Teaspoon", "Tennis shoes",
        "Throwback", "Timekeeper", "Timeline", "Timeshare", "Tugboat", "Tupperware",
        "Underestimate", "Uplift", "Upperclassman", "Uptown", "Video game", "Wallflower",
        "Waterboy", "Watermelon", "Wheelchair", "Without", "Workboots", "Worksheet",
    ]
    letters = "aeioulprts"
    test_data = []
    for word in words[:num_words]:
        # No random seed is set, so the chosen letters differ between runs
        letter = random.choice(letters)
        actual_count = word.lower().count(letter)  # Use lower() to count case-insensitively
        test_data.append((word, letter, actual_count))
    return test_data

# Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{0}

### Input:
{1}

### Response:
"""

# Generate test data
test_data = generate_test_data()

# Run evaluation
correct_predictions = 0
total_predictions = 0

for word, letter, actual_count in test_data:
    input_text = f"How many {letter}'s in {word}?"
    prompt = alpaca_prompt.format(
        "You are an expert at logic puzzles, reasoning, and planning",
        input_text,
    )
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    text_streamer = TextStreamer(tokenizer)
    output = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)
    decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
    predicted_count = extract_count(decoded_output)

    total_predictions += 1
    if predicted_count is not None:
        if predicted_count == actual_count:
            correct_predictions += 1
    else:
        # If predicted_count is None and actual_count is 0, consider it correct
        if actual_count == 0:
            correct_predictions += 1
        print(f"Warning: Could not extract a count from the model's response for '{word}'.")

    print(f"Word: {word}, Letter: {letter}")
    print(f"Actual count: {actual_count}, Predicted count: {predicted_count}")
    print("Correct" if (predicted_count == actual_count or (predicted_count is None and actual_count == 0)) else "Incorrect")

    # Calculate and print accuracy after each word
    current_accuracy = correct_predictions / total_predictions
    print(f"Current Accuracy: {current_accuracy:.2%}")
    print(f"Correct predictions: {correct_predictions}")
    print(f"Total predictions: {total_predictions}")
    print("---")

# Calculate accuracy
accuracy = correct_predictions / total_predictions if total_predictions > 0 else 0
print(f"\nAccuracy: {accuracy:.2%}")
print(f"Correct predictions: {correct_predictions}")
print(f"Total predictions: {total_predictions}")
```

- **Developed by:** isaiahbjork
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
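
The scoring rule used by the evaluation script can be checked without a GPU or the model. The sketch below reuses the card's regular expression and applies it to the concluding sentence of the trained example. The names `COUNT_PATTERN` and `is_correct` are introduced here for illustration only and are not part of the original script.

```python
import re

# Same pattern as the evaluation script on this card.
COUNT_PATTERN = re.compile(
    r'The letter "[a-z]" (?:appears|occurs|is found|exists) (\d+)'
)


def extract_count(output):
    """Return the count stated in the model's concluding sentence, or None."""
    match = COUNT_PATTERN.search(output)
    return int(match.group(1)) if match else None


def is_correct(predicted_count, actual_count):
    """Scoring rule from the evaluation loop (hypothetical helper name):
    an unparseable response counts as correct only when the true count is 0."""
    if predicted_count is None:
        return actual_count == 0
    return predicted_count == actual_count


# Concluding sentence copied from the trained example on this card.
sample = 'Conclusion: The letter "r" is found 3 instances in "strawberry"'
predicted = extract_count(sample)
actual = "strawberry".count("r")

print(predicted, actual, is_correct(predicted, actual))  # 3 3 True
print(f"{129 / 150:.2%}")  # 86.00%, the accuracy reported above
```

Because a response with no parseable count is scored as correct whenever the true count is 0, the reported accuracy may include some responses that never stated a number. Treating those separately would give a stricter figure.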