---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
datasets:
- isaiahbjork/spelling-puzzles
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---

# Llama 3.1 8B Logic
Prompt the model to "use CoT" (chain-of-thought) and it will reason through the problem step by step.

Basic compound-word evaluation (script below):
- Accuracy: 92.67%
- Auto-graded correct predictions: 139
- Total predictions: 150
- The evaluation script sometimes fails to parse the model's answer, so accuracy under manual grading would be higher.
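The headline accuracy is the auto-graded correct predictions divided by the total predictions. The same `:.2%` formatting used in the evaluation script reproduces the reported figure:

```python
# Figures reported above for the compound-word evaluation
correct_predictions = 139
total_predictions = 150

accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:.2%}")  # prints Accuracy: 92.67%
```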

## Example (Trained)

### Instruction:
You are an expert at logic puzzles, reasoning, and planning

### Input:
How many rs in strawberry?

### Response:

We'll proceed through this methodically (CoT):

Step 1: What is the challenge prompting me to do?
Identify the frequency of letter "r" in the word "strawberry"

Step 2: Dissect the word.
S-T-R-A-W-B-E-R-R-Y

Step 3: Tally the letters.
S: 1, E: 1, T: 1, A: 1, Y: 1, W: 1, B: 1, R: 3

Step 4: Evaluate the letters we enumerated.
The focus letter was "r" and there we counted 3 total.

Conclusion:
The letter "r" is found 3 instances in "strawberry"
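Outside the model, the tally in Step 3 can be verified deterministically. This minimal sketch uses Python's standard `collections.Counter`; the evaluation script further down computes its ground truth the same way, with `str.count` on the lower-cased word.

```python
from collections import Counter

word = "strawberry"

# Tally every letter case-insensitively (keys are upper-cased to match Step 3)
letter_counts = Counter(word.upper())
print(letter_counts["R"])  # prints 3

# Same ground-truth rule the evaluation script uses: lower-case, then str.count
print(word.lower().count("r"))  # prints 3
```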

## Example (Not Trained)

### Instruction:
You are a helpful assistant

### Input:
hello whats your name?

### Response:
I'm an expert logic puzzle solver, nice to meet you!

Let me reason through this logically (CoT):

Step 1: What's the task demanding me to do?

Identify the meaning of the word "hello" in this context

Step 2: Analyze the context.

You're asking me to introduce myself.

Step 3: Reason through the logic.

The inquiry is asking me to respond with an appropriate greeting.

Step 4: Sum up the logic.

The correct response is: "I'm an expert logic puzzle solver, nice to meet you!"

Final answer:

The final answer is: I'm an expert logic puzzle solver, nice to meet you!


## Example (Not Trained)

### Instruction:
You are a helpful assistant

### Input:
use cot - The crime occurred between 10 PM and midnight. Three suspects: Alice, Bob, and Charlie. Alice has an alibi from 9 PM to 11 PM. Bob's car was seen near the crime scene at 10:30 PM. Charlie lives 2 hours away from the crime scene. The victim's phone shows a call to Bob at 9:45 PM

### Response:
We should reason through this logically (CoT):

Step 1: What's the problem prompting me to do?
Determine the suspect responsible for the crime

Step 2: Analyze the evidence.
- The crime took place between 10 PM and 12 AM.
- Alice has an alibi for 9 PM to 11 PM, so she's not a suspect.
- Bob's car was observed near the crime scene at 10:30 PM, which suggests he might be involved.
- Charlie lives 2 hours away, so it's improbable he was at the crime scene.
- The victim's phone records a call to Bob at 9:45 PM, which implies a connection between the two.

Step 3: Assess the evidence logically.
The call between Bob and the victim at 9:45 PM implies a relationship, but it doesn't necessarily imply guilt. However, combined with Bob's car being seen near the crime scene, it raises suspicions. Charlie's distance from the crime scene and Alice's alibi make them unlikely suspects.

Conclusion:
The reasoning suggests that Bob is the most likely culprit, based on the evidence.
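The timeline facts in this puzzle can also be checked mechanically, which is a useful way to audit a chain-of-thought answer. The sketch below is plain Python. The helper names `uncovered_parts` and `in_window` are hypothetical, and times are encoded as decimal hours on a 24-hour clock. With the stated times, Alice's 9 PM to 11 PM alibi covers only the first hour of the 10 PM to midnight window, so on its own it does not rule her out for 11 PM to midnight.

```python
# Times are hours on a 24-hour clock; midnight is written as 24.0
CRIME_WINDOW = (22.0, 24.0)  # 10 PM to midnight

def uncovered_parts(window, alibi):
    """Return the parts of `window` (start, end) that `alibi` (start, end) does not cover."""
    start, end = window
    alibi_start, alibi_end = alibi
    gaps = []
    if alibi_start > start:
        # Part of the window before the alibi begins
        gaps.append((start, min(alibi_start, end)))
    if alibi_end < end:
        # Part of the window after the alibi ends
        gaps.append((max(alibi_end, start), end))
    return gaps

def in_window(hour, window=CRIME_WINDOW):
    """True if `hour` falls inside the crime window (inclusive)."""
    return window[0] <= hour <= window[1]

alice_alibi = (21.0, 23.0)  # 9 PM to 11 PM
print(uncovered_parts(CRIME_WINDOW, alice_alibi))  # prints [(23.0, 24.0)]
print(in_window(22.5))   # Bob's car seen at 10:30 PM -> prints True
print(in_window(21.75))  # call to Bob at 9:45 PM -> prints False
```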

## Inference - Use in Google Colab

```python
%%capture
# Installs Unsloth, xformers (memory-efficient attention), and all other required packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# Pick the xformers version that matches the installed Torch (Torch 2.3 -> xformers 0.0.27)
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

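from unsloth import FastLanguageModel

# Loading settings (assumed values; adjust as needed)
max_seq_length = 2048  # maximum sequence length in tokens
dtype = None           # None lets Unsloth auto-detect float16 / bfloat16
load_in_4bit = True    # load 4-bit quantized weights to reduce GPU memory

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "isaiahbjork/llama-3.1-8b-logic", # fine-tuned model on the Hugging Face Hub
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference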


inputs = tokenizer(
[
    alpaca_prompt.format(
       "You are an expert at logic puzzles, reasoning, and planning", # instruction
        "How many rs in strawberry?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)
```
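For a decoder-only model, `model.generate` returns the prompt tokens followed by the new tokens, so the decoded text repeats the whole Alpaca prompt. The pure-Python sketch below fills the same template, optionally adds the `use cot - ` prefix seen in the crime-puzzle example, and keeps only the text after `### Response:`. The helper names `build_prompt` and `extract_response` are hypothetical and are not part of Unsloth or Transformers.

```python
# Same Alpaca-style template as the inference cell above
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def build_prompt(instruction: str, user_input: str, use_cot: bool = False) -> str:
    """Fill the template, leaving the response slot empty for generation."""
    if use_cot:
        # Prefix mirrors the "use cot - ..." input in the crime-puzzle example
        user_input = f"use cot - {user_input}"
    return ALPACA_PROMPT.format(instruction, user_input, "")

def extract_response(decoded_text: str) -> str:
    """Return the text after the first '### Response:' marker, or the whole text if absent."""
    _, marker, response = decoded_text.partition("### Response:")
    return response.strip() if marker else decoded_text.strip()
```

To use it with the inference cell, keep the return value of `model.generate` (for example `outputs = model.generate(...)`) and call `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.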

## Evaluation - Google Colab
```python
# Assumes `model` and `tokenizer` were loaded by the inference cell above
import re
import random
from transformers import TextStreamer

# Function to parse the model output and extract the predicted count
def extract_count(output):
    # Flexible pattern, e.g. matches: The letter "r" is found 3 instances
    match = re.search(r'(?:The letter "[a-z]"|\w+\'s) (?:appears?|occurs?|present?|is found|exists?) (\d+)', output, re.IGNORECASE)
    if match:
        return int(match.group(1))
    return None

# Function to generate test data
def generate_test_data(num_words=150):
    words = ["Airplane", "Airport", "Angelfish", "Antfarm", "Ballpark", "Beachball", "Bikerack", "Billboard", "Blackhole", "Blueberry", "Boardwalk", "Bodyguard", "Bookstore", "Bow Tie", "Brainstorm", "Busboy", "Cabdriver", "Candlestick", "Car wash", "Cartwheel", "Catfish", "Caveman", "Chocolate chip", "Crossbow", "Daydream", "Deadend", "Doghouse", "Dragonfly", "Dress shoes", "Dropdown", "Earlobe", "Earthquake", "Eyeballs", "Father-in-law", "Fingernail", "Firecracker", "Firefighter", "Firefly", "Firework", "Fishbowl", "Fisherman", "Fishhook", "Football", "Forget", "Forgive", "French fries", "Goodnight", "Grandchild", "Groundhog", "Hairband", "Hamburger", "Handcuff", "Handout", "Handshake", "Headband", "Herself", "High heels", "Honeydew", "Hopscotch", "Horseman", "Horseplay", "Hotdog", "Ice cream", "Itself", "Kickball", "Kickboxing", "Laptop", "Lifetime", "Lighthouse", "Mailman", "Midnight", "Milkshake", "Moonrocks", "Moonwalk", "Mother-in-law", "Movie theater", "Newborn", "Newsletter", "Newspaper", "Nightlight", "Nobody", "Northpole", "Nosebleed", "Outer space", "Over-the-counter", "Overestimate", "Paycheck", "Policeman", "Ponytail", "Post card", "Racquetball", "Railroad", "Rainbow", "Raincoat", "Raindrop", "Rattlesnake", "Rockband", "Rocketship", "Rowboat", "Sailboat", "Schoolbooks", "Schoolwork", "Shoelace", "Showoff", "Skateboard", "Snowball", "Snowflake", "Softball", "Solar system", "Soundproof", "Spaceship", "Spearmint", "Starfish", "Starlight", "Stingray", "Strawberry", "Subway", "Sunglasses", "Sunroof", "Supercharge", "Superman", "Superstar", "Tablespoon", "Tailbone", "Tailgate", "Take down", "Takeout", "Taxpayer", "Teacup", "Teammate", "Teaspoon", "Tennis shoes", "Throwback", "Timekeeper", "Timeline", "Timeshare", "Tugboat", "Tupperware", "Underestimate", "Uplift", "Upperclassman", "Uptown", "Video game", "Wallflower", "Waterboy", "Watermelon", "Wheelchair", "Without", "Workboots", "Worksheet"]

    letters = "aeioulprts"
    test_data = []
    for word in words[:num_words]:
        letter = random.choice(letters)
        actual_count = word.lower().count(letter)  # Use lower() to count case-insensitively
        test_data.append((word, letter, actual_count))
    return test_data

# Alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{0}

### Input:
{1}

### Response:
"""

# Generate test data
test_data = generate_test_data()


# Run evaluation
correct_predictions = 0
total_predictions = 0

for word, letter, actual_count in test_data:
    input_text = f"How many {letter}'s in {word}?"
    prompt = alpaca_prompt.format(
        "You are an expert at logic puzzles, reasoning, and planning",
        input_text,
    )

    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    text_streamer = TextStreamer(tokenizer)
    output = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)

    # Decode only the newly generated tokens so the prompt text is never parsed as an answer
    prompt_length = inputs["input_ids"].shape[1]
    decoded_output = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
    print(f"Raw model output: {decoded_output}")  # Print raw output for debugging
    predicted_count = extract_count(decoded_output)

    total_predictions += 1

    if predicted_count is not None:
        if predicted_count == actual_count:
            correct_predictions += 1
    else:
        # If predicted_count is None and actual_count is 0, consider it correct
        if actual_count == 0:
            correct_predictions += 1
        print(f"Warning: Could not extract a count from the model's response for '{word}'.")

    print(f"Word: {word}, Letter: {letter}")
    print(f"Actual count: {actual_count}, Predicted count: {predicted_count}")
    print("Correct" if (predicted_count == actual_count or (predicted_count is None and actual_count == 0)) else "Incorrect")

    # Calculate and print accuracy after each word
    current_accuracy = correct_predictions / total_predictions
    print(f"Current Accuracy: {current_accuracy:.2%}")
    print(f"Correct predictions: {correct_predictions}")
    print(f"Total predictions: {total_predictions}")
    print("---")

# Calculate accuracy
accuracy = correct_predictions / total_predictions if total_predictions > 0 else 0
print(f"\nAccuracy: {accuracy:.2%}")
print(f"Correct predictions: {correct_predictions}")
print(f"Total predictions: {total_predictions}")
```
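Running `extract_count` (copied verbatim from the evaluation cell) on a few hand-written responses shows why auto-grading can undercount. These sample strings are illustrative, not real model outputs. Answers phrased as "There are 3 r's" do not match the pattern and return `None`, which the evaluation loop scores as incorrect unless the true count is 0.

```python
import re

# extract_count, copied from the evaluation cell above
def extract_count(output):
    match = re.search(r'(?:The letter "[a-z]"|\w+\'s) (?:appears?|occurs?|present?|is found|exists?) (\d+)', output, re.IGNORECASE)
    if match:
        return int(match.group(1))
    return None

# Hand-written sample responses (illustrative only, not real model outputs)
samples = [
    'The letter "r" is found 3 instances in "strawberry"',
    'The letter "e" appears 2 times in "Teammate".',
    "There are 3 r's in strawberry.",
    'The focus letter was "r" and there we counted 3 total.',
]

for text in samples:
    print(extract_count(text), "<-", text)
# Expected: 3, 2, None, None
```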

- **Developed by:** isaiahbjork
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)