---
license: apache-2.0
datasets:
- allenai/c4
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
tags:
- qwen2
- transformers
- text-generation
---
# Bootstrap LLM

## Introduction
Ever since I released my first Qwen2-based model several weeks ago, I've taken what I learned and attempted to create a new model that is pre-trained more thoroughly and on a more diverse dataset. I settled on the unfiltered version of the English subset of C4, with entries shuffled in batches of 1,000 to avoid long continuous streams of related training data. For fine-tuning, I initially opted for agentlans/multiturn-chat because it has far more examples than databricks/databricks-dolly-15k, but I reverted to dolly-15k because the verbose conversations in multiturn-chat were a poor fit for a short 1024-token context model.
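The exact data-loading code isn't reproduced here, but as a rough sketch, buffered shuffling of the streamed C4 split with the `datasets` library could look something like this. Only the buffer size of 1,000 comes from the description above; everything else is illustrative:

```python
from datasets import load_dataset

# Stream the unfiltered English subset of C4 so the full corpus
# never has to fit on disk (illustrative sketch, not the exact code used).
dataset = load_dataset("allenai/c4", "en.noblocklist", split="train", streaming=True)

# Shuffle with a buffer of 1,000 entries so consecutive examples are less
# likely to come from the same contiguous slice of the corpus.
dataset = dataset.shuffle(seed=42, buffer_size=1000)

for example in dataset.take(3):
    print(example["text"][:80])
```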
## Model Details
- Model Name: Bootstrap LLM
- Architecture: Qwen2-based
- Context: 1024 Tokens
- Vocab Size: 50,262 tokens
- Qwen2 Specific: Hidden size of 768, 6 layers, 6 attention heads (see the config sketch below)
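For orientation, here is a minimal sketch of what these dimensions look like as a transformers `Qwen2Config`. Only the values listed above come from this card; the intermediate size and key/value head count are assumptions, and the actual checkpoint's config.json may differ:

```python
from transformers import Qwen2Config, Qwen2ForCausalLM

# Rough sketch only: the authoritative values are in the checkpoint's config.json.
config = Qwen2Config(
    vocab_size=50262,             # from the model card
    hidden_size=768,              # from the model card
    num_hidden_layers=6,          # from the model card
    num_attention_heads=6,        # from the model card
    num_key_value_heads=6,        # assumption: no grouped-query attention
    intermediate_size=3072,       # assumption: GPT-2-style 4x hidden size
    max_position_embeddings=1024, # 1024-token context
)

model = Qwen2ForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```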
## Training Details
- GPU: NVIDIA GeForce RTX 4070 Laptop GPU
- CUDA: Used during both pre-training and fine-tuning.
- VRAM: 8 GB
As with my previous model, the AllenAI C4 English dataset was used for pre-training, the key difference being that I used the "en.noblocklist" subset for more diversity. Instead of creating my own tokenizer, I used GPT-2's tokenizer, which saved a lot of extra computation and has already proven effective in real-world models. The model was pre-trained for 280 thousand steps with a 1024-token context, a per-device training batch size of 4, and 4 gradient accumulation steps. Pre-training took about 60 hours with the GPU overclocked to its maximum capacity. Post-training consisted of 5 epochs over databricks/databricks-dolly-15k formatted in ChatML.
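The preprocessing script isn't included in this card, but as a rough illustration of how a databricks-dolly-15k record can be rendered into ChatML for fine-tuning: the field names below come from dolly-15k, while the handling of the optional `context` field and the omission of a system turn are assumptions.

```python
def to_chatml(example: dict) -> str:
    """Format one databricks-dolly-15k record as a ChatML string (sketch only)."""
    # dolly-15k fields: "instruction", "context" (may be empty), "response"
    user_turn = example["instruction"]
    if example.get("context"):
        # Assumption: prepend the optional context to the user message.
        user_turn = example["context"] + "\n\n" + user_turn

    return (
        "<|im_start|>user\n" + user_turn + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["response"] + "<|im_end|>\n"
    )

print(to_chatml({
    "instruction": "What is C4?",
    "context": "",
    "response": "C4 is a large web-scraped text corpus released by AllenAI.",
}))
```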
## How to use
Below is a simple Python script you can use. The model can be loaded directly from the Hugging Face Hub through the transformers library, or you can change `model_path` to point to a local directory containing the model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_path = "TheOneWhoWill/Bootstrap-LLM"

tokenizer = AutoTokenizer.from_pretrained(model_path)
stop_token_id = tokenizer.eos_token_id
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

messages = []
temperature = float(input("Enter temperature (e.g., 0.9): ") or 1)
token_limit = 256

while True:
    user_input = input("User: ")

    if user_input.lower() in ["exit", "quit"]:
        print("Exiting the chat.")
        break
    if user_input.lower().startswith("temperature:"):
        # "temperature:0.7" updates the sampling temperature mid-conversation.
        temperature = float(user_input.lower().split("temperature:")[1] or temperature)
        print(f"Temperature set to {temperature}")
        continue
    if user_input.lower().startswith("reset"):
        # "reset" clears the conversation history.
        messages = []
        print("Conversation reset.")
        continue
    if user_input.lower().startswith("tokens:"):
        # "tokens:512" changes the maximum number of new tokens per reply.
        token_limit = int(user_input.lower().split("tokens:")[1] or 1024)
        print(f"Token limit set to {token_limit}")
        continue
    if user_input.lower().startswith("debug"):
        # "debug" reports how many tokens the last message used.
        if messages:
            tokens_in_last_response = tokenizer.tokenize(messages[-1]["content"])
            print("Number of Tokens:", len(tokens_in_last_response))
            for token in tokens_in_last_response:
                if token == "<|im_end|>":
                    print("End of message token found.")
        else:
            print("No messages yet.")
        continue

    messages.append({"role": "user", "content": user_input})

    # Generate and print
    response = pipe(
        messages,
        max_new_tokens=token_limit,
        do_sample=True,
        temperature=temperature,
        top_k=64,
        top_p=0.95,
        eos_token_id=stop_token_id
    )
    response = response[0]["generated_text"][-1]["content"]
    messages.append({"role": "assistant", "content": response})
    print("Assistant:", response)
```