---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
tags:
- llama-3.2
- thought-chain
- instruction-finetuning
- transformers
library_name: transformers
pipeline_tag: text-generation
---

# Thought-Ranked Llama 3.2 3B

## Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B (Base) that has been specially trained to generate high-quality thought processes before producing answers. The model underwent 4 rounds of specialized fine-tuning using a thought-chain ranking approach.

(Weekend project, just a few hundred steps of training)

### Training Process

1. **Initial Generation**: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: `{char}` for each character in `[a-zA-Z0-9]`. Each thought chain is allowed up to 128 tokens.
2. **Answer Generation**: Following each thought chain, the model generates a complete answer with up to 2048 tokens.
3. **Ranking & Selection**: An external LLM ranking system evaluates the quality of answers without seeing the thought processes, creating a ranking of the most effective thought patterns.
4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.

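
The four steps above can be sketched as a small selection loop. This is only an illustration, not the actual training code: `select_best_pair` and the three callables it takes are hypothetical names standing in for the model's thought generation, its answer generation, and the external LLM ranker, and reducing the ranking to a single numeric score per answer is an assumption.

```python
import string
from typing import Callable, List, Tuple

# The 62 single-character thought prefixes described above: a-z, A-Z, 0-9.
THOUGHT_PREFIXES = string.ascii_lowercase + string.ascii_uppercase + string.digits


def select_best_pair(
    question: str,
    generate_thought: Callable[[str, str], str],  # hypothetical: (question, prefix) -> thought
    generate_answer: Callable[[str, str], str],   # hypothetical: (question, thought) -> answer
    score_answer: Callable[[str, str], float],    # hypothetical: (question, answer) -> score
) -> Tuple[str, str]:
    """Return the (thought, answer) pair whose answer scores highest."""
    candidates: List[Tuple[float, str, str]] = []
    for prefix in THOUGHT_PREFIXES:
        thought = generate_thought(question, prefix)  # capped at 128 tokens in the card
        answer = generate_answer(question, thought)   # capped at 2048 tokens in the card
        # The scorer receives only the question and the answer, never the thought.
        candidates.append((score_answer(question, answer), thought, answer))
    # Keep the highest-scoring candidate; ties resolve to the earliest prefix.
    _, best_thought, best_answer = max(candidates, key=lambda c: c[0])
    return best_thought, best_answer
```

Under these assumptions, the pairs returned for each training question would form the supervised fine-tuning set for one round, and the card describes repeating the process for 4 rounds.
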
### Key Features

- **Thought Chain Generation**: The model has learned to generate explicit thought processes before providing answers
- **Greedy Decoding**: Uses greedy decoding for both thought generation and final answers
- **Length Parameters**:
  - Thought chains: Up to 128 tokens
  - Final answers: Up to 2048 tokens

### Model Architecture

- Base model: Llama 3.2 3B (Base)
- Architecture: Transformer-based language model
- Parameters: ~3.2 billion
- Training Strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

## Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

- Problem-solving
- Mathematical reasoning
- Logical deduction
- Step-by-step explanations
- Complex decision making

### Out-of-Scope Uses

- Direct deployment without safety measures
- Applications requiring guaranteed accuracy
- Critical decision-making without human oversight
- Tasks requiring capabilities beyond the base Llama 3.2 3B model

## Training Details

### Training Data

The model was trained using:

- Sample questions paired with multiple thought variations
- Thought chains generated using systematic character prefixes
- Rankings derived from LLM evaluation of answer quality

### Training Procedure

1. **Thought Generation Phase**
   - Generated 62 variations of thoughts per sample (a-z, A-Z, 0-9)
   - Decoded greedily (temperature=0.0)
   - Maximum thought length: 128 tokens
2. **Answer Generation Phase**
   - Generated completions following each thought chain
   - Maximum answer length: 2048 tokens
   - Decoded greedily (temperature=0.0)
3. **Ranking Phase**
   - External LLM evaluated answer quality
   - Ranking performed without access to thought chains
   - Selected highest-performing thought-answer pairs
4. **Final Training Phase**
   - Fine-tuned on best-performing thought-answer combinations
   - 4 complete rounds of training

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,  # end the prompt where the model's reply begins
    return_tensors="pt",
)

# Generate response with thought chain (greedy decoding, matching training)
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=2176,  # 128 thought tokens + 2048 answer tokens
)
response = tokenizer.decode(output[0])
print(response)
```

## Limitations

- Limited to the capabilities of the base Llama 3.2 3B model
- May generate thought chains that are not always optimal
- Performance depends on the quality of the LLM ranking system used during training
- Training process may not capture all possible effective thought patterns
- Limited by the context window of the base model

## Ethical Considerations

- The model inherits biases from the base Llama 3.2 3B model
- Generated thought chains should be reviewed for accuracy and appropriateness
- The model's reasoning process should not be relied upon for critical decisions without human verification
- Users should implement appropriate content filtering and safety measures

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}
```