---
license: apache-2.0
datasets:
- allenai/c4
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
tags:
- qwen2
- transformers
- text-generation
---
# Bootstrap LLM

## Introduction
Ever since I released my first Qwen2-based model several weeks ago, I've taken what I learned and attempted to create a new model that is pre-trained more thoroughly and on a more diverse dataset. I settled on the unfiltered version of the English subset of C4, with entries shuffled in batches of 1,000 to avoid long continuous streams of related training data. For fine-tuning, I initially opted for agentlans/multiturn-chat because it has far more examples than databricks/databricks-dolly-15k, but I reverted to dolly-15k because the verbose conversations in multiturn-chat were a poor fit for a short 1024-token context model.
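The exact data-loading code isn't reproduced here, but as a rough sketch, buffered shuffling of the streamed C4 split with the `datasets` library could look something like this. Only the buffer size of 1,000 comes from the description above; everything else is illustrative:

```python
from datasets import load_dataset

# Stream the unfiltered English subset of C4 so the full corpus
# never has to fit on disk (illustrative sketch, not the exact code used).
dataset = load_dataset("allenai/c4", "en.noblocklist", split="train", streaming=True)

# Shuffle with a buffer of 1,000 entries so consecutive examples are less
# likely to come from the same contiguous slice of the corpus.
dataset = dataset.shuffle(seed=42, buffer_size=1000)

for example in dataset.take(3):
    print(example["text"][:80])
```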
## Model Details
- Model Name: Bootstrap LLM
- Architecture: Qwen2-based
- Context: 1024 Tokens
- Vocab Size: 50,262 tokens
- Qwen2 Specific: Hidden size of 768, 6 layers, 6 attention heads (see the config sketch below)
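For orientation, here is a minimal sketch of what these dimensions look like as a transformers `Qwen2Config`. Only the values listed above come from this card; the intermediate size and key/value head count are assumptions, and the actual checkpoint's config.json may differ:

```python
from transformers import Qwen2Config, Qwen2ForCausalLM

# Rough sketch only: the authoritative values are in the checkpoint's config.json.
config = Qwen2Config(
    vocab_size=50262,             # from the model card
    hidden_size=768,              # from the model card
    num_hidden_layers=6,          # from the model card
    num_attention_heads=6,        # from the model card
    num_key_value_heads=6,        # assumption: no grouped-query attention
    intermediate_size=3072,       # assumption: GPT-2-style 4x hidden size
    max_position_embeddings=1024, # 1024-token context
)

model = Qwen2ForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```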
## Training Details
- GPU: NVIDIA GeForce RTX 4070 Laptop GPU
- CUDA: Used during both pre-training and fine-tuning.
- VRAM: 8 GB
As with my previous model, the AllenAI C4 English dataset was used for pre-training, the key difference being that I used the "en.noblocklist" subset for more diversity. Instead of creating my own tokenizer, I used GPT-2's tokenizer, which saved a lot of extra computation and has already proven effective in real-world models. The model was pre-trained for 280 thousand steps with a 1024-token context, a per-device training batch size of 4, and 4 gradient accumulation steps. Pre-training took about 60 hours with the GPU overclocked to its maximum capacity. Post-training consisted of 5 epochs over databricks/databricks-dolly-15k formatted in ChatML.
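The preprocessing script isn't included in this card, but as a rough illustration of how a databricks-dolly-15k record can be rendered into ChatML for fine-tuning: the field names below come from dolly-15k, while the handling of the optional `context` field and the omission of a system turn are assumptions.

```python
def to_chatml(example: dict) -> str:
    """Format one databricks-dolly-15k record as a ChatML string (sketch only)."""
    # dolly-15k fields: "instruction", "context" (may be empty), "response"
    user_turn = example["instruction"]
    if example.get("context"):
        # Assumption: prepend the optional context to the user message.
        user_turn = example["context"] + "\n\n" + user_turn

    return (
        "<|im_start|>user\n" + user_turn + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["response"] + "<|im_end|>\n"
    )

print(to_chatml({
    "instruction": "What is C4?",
    "context": "",
    "response": "C4 is a large web-scraped text corpus released by AllenAI.",
}))
```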
## How to use
Below is a simple Python script you can use. The model can be loaded directly from the Hugging Face Hub through the transformers library, or you can change `model_path` to point to a local directory containing the model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_path = "TheOneWhoWill/Bootstrap-LLM"

tokenizer = AutoTokenizer.from_pretrained(model_path)
stop_token_id = tokenizer.eos_token_id
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

messages = []
temperature = float(input("Enter temperature (e.g., 0.9): ") or 1)
token_limit = 256

while True:
    user_input = input("User: ")

    if user_input.lower() in ["exit", "quit"]:
        print("Exiting the chat.")
        break
    if user_input.lower().startswith("temperature:"):
        # "temperature:0.7" updates the sampling temperature mid-conversation.
        temperature = float(user_input.lower().split("temperature:")[1] or temperature)
        print(f"Temperature set to {temperature}")
        continue
    if user_input.lower().startswith("reset"):
        # "reset" clears the conversation history.
        messages = []
        print("Conversation reset.")
        continue
    if user_input.lower().startswith("tokens:"):
        # "tokens:512" changes the maximum number of new tokens per reply.
        token_limit = int(user_input.lower().split("tokens:")[1] or 1024)
        print(f"Token limit set to {token_limit}")
        continue
    if user_input.lower().startswith("debug"):
        # "debug" reports how many tokens the last message used.
        if messages:
            tokens_in_last_response = tokenizer.tokenize(messages[-1]["content"])
            print("Number of Tokens:", len(tokens_in_last_response))
            for token in tokens_in_last_response:
                if token == "<|im_end|>":
                    print("End of message token found.")
        else:
            print("No messages yet.")
        continue

    messages.append({"role": "user", "content": user_input})

    # Generate and print
    response = pipe(
        messages,
        max_new_tokens=token_limit,
        do_sample=True,
        temperature=temperature,
        top_k=64,
        top_p=0.95,
        eos_token_id=stop_token_id
    )
    response = response[0]["generated_text"][-1]["content"]
    messages.append({"role": "assistant", "content": response})
    print("Assistant:", response)
```