---
library_name: transformers
license: apache-2.0
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
tags:
- generated_from_trainer
datasets:
- instruction_solution_to_thought_dataset.jsonl
- secemp9/instruction_solution_thought
model-index:
- name: outputs_solution_to_thought
  results: []
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/-_THTLhEqxfXjuyh_jaFk.png)

# TraceBack 12b Release

TraceBack is what I came up with when I asked myself: "how can we scale reasoning trace data generation effectively?"

It turns out you do not need to depend on reasoning models (r1, o1, o3, etc.) to create reasoning traces!

It was built with several goals in mind, mainly:

1. enabling faster synthetic reasoning dataset generation: the model is small (much smaller than r1, etc.), so inference is faster and easier to scale
2. distilling on synthetic traces for out-of-domain, non-verifiable problems
3. converting any non-reasoning model output/dataset into a synthetic reasoning dataset when used as input

So far the current proof of concept checks the boxes for 1 and 3, and I plan on scaling it further, since:

- it only uses Mistral Nemo 12b as the base model
- it was only trained for 2 epochs
- only 200k samples were used for finetuning (QLoRA); the dataset is at https://huggingface.co/datasets/secemp9/instruction_solution_thought

So there is still much room for improvement.

The model was trained using both the instruction and the solution as input, with the output being a plausible/matching reasoning trace for that pair. I believe this is the future of reasoning data generation.

Stay tuned for an eval release.

Here is an inference example, using a chatgpt instruction + solution as input:

# Inference Example

Here I use a simple example from chatgpt, passing both the instruction and the solution as input to the model:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/rtuYmWGw8lk09AQi_dpX8.png)

# Dataset Example

The dataset format follows instruction + solution: reasoning trace pairs.

Sample conversation:

```
{
  "messages": [
    {
      "role": "user",
      "content": "Instruction: text_here Solution: text_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
```

which looks like:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65986192b0c5357368bacbf8/GdbZxeLSDsJmZDHJ8SN-g.png)

# Prompt Format

For the prompt format, I really tried not to overengineer anything, but I'm sure there is a better way to format this. For now the user message is just `Instruction:` followed by `Solution:`.

The output of the model doesn't have (for now) any special formatting; it's just the reasoning trace.
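To make the format concrete, here is a minimal sketch of how an instruction/solution pair (for example, taken from an existing non-reasoning dataset) can be turned into the user message used in the code examples below. The helper name and the example pair are just for illustration, not part of the model card:

```python
# Minimal sketch: build the TraceBack user message from an instruction/solution pair.
# `build_user_message` is a hypothetical helper name used only for this example.
def build_user_message(instruction: str, solution: str) -> str:
    # Concatenate the two fields with the plain "Instruction:" / "Solution:" prefixes
    # described in the Prompt Format section above.
    return f"Instruction: {instruction}\nSolution: {solution}"

messages = [
    {"role": "user", "content": build_user_message(
        "how many r in strawberry",
        'There are **three** "r"s in "strawberry."',
    )}
]
print(messages[0]["content"])
```
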
"""} ] # Step 1: Apply chat template to get formatted text as a string formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) # Step 2: Tokenize the formatted text into a dictionary of tensors inputs = tokenizer(formatted_text, return_tensors="pt").to(device) # Generate the response outputs = model.generate(**inputs, max_new_tokens=32000) # Decode and print the output generated_text = tokenizer.decode(outputs[0]) print(generated_text) ``` - unsloth ```python from unsloth import FastLanguageModel # Load the model and tokenizer model, tokenizer = FastLanguageModel.from_pretrained("secemp9/TraceBack-12b") # Define the messages (replace "stuff_here" with your actual input) messages = [ {"role": "user", "content": """Instruction: how many r in strawberry Solution: There are **three** "r"s in "strawberry." """} ] # Step 1: Apply chat template to get formatted text as a string formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) # Step 2: Tokenize the formatted text into a dictionary of tensors inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device) # Generate the response outputs = model.generate(**inputs, max_new_tokens=32000) # Decode and print the output generated_text = tokenizer.decode(outputs[0]) print(generated_text) ``` # Axolotl config For this, I basically tried to convert my unsloth code to an axolotl config file. I also used deepspeed. Configuration below: config.yml ``` # Base model configuration base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit load_in_4bit: true # Dataset configuration datasets: - path: instruction_solution_to_thought_dataset.jsonl type: chat_template # Chat template chat_template: chatml # LoRA adapter configuration adapter: lora lora_r: 16 lora_alpha: 16 lora_dropout: 0 lora_target_modules: - q_proj - k_proj - v_proj - o_proj - gate_proj - up_proj - down_proj # Training hyperparameters max_seq_length: 128000 micro_batch_size: 2 gradient_accumulation_steps: 8 learning_rate: 3e-5 num_epochs: 3 warmup_steps: 100 optimizer: adamw_8bit weight_decay: 0.01 lr_scheduler_type: cosine max_grad_norm: 1.0 output_dir: ./outputs_solution_to_thought seed: 3407 merge_lora: true hf_upload: true hf_repo: secemp9/TraceBack-12b xformers_attention: flash_attention: True bf16: true # Enable BF16 mixed precision # Multi-GPU training with DeepSpeed deepspeed: deepspeed_configs/zero2.json # Optional: Enable gradient checkpointing gradient_checkpointing: true ``` deepspeed_configs/zero2.json ``` { "zero_optimization": { "stage": 2, "allgather_partitions": true, "allgather_bucket_size": 2e8, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2e8, "contiguous_gradients": true }, "bf16": { "enabled": true }, "optimizer": { "type": "AdamW", "params": { "lr": "auto", "weight_decay": "auto", "betas": [0.9, 0.999], "eps": 1e-8 } }, "scheduler": { "type": "WarmupLR", "params": { "warmup_min_lr": 0, "warmup_max_lr": "auto", "warmup_num_steps": "auto" } }, "train_micro_batch_size_per_gpu": "auto", "gradient_accumulation_steps": "auto", "steps_per_print": 10, "wandb": { "enabled": true } } ```