This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---

## Model details

Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
### Model description

- **Model type:** A model trained on a mix of publicly available, synthetic, and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-DPO

### Model sources

- **Training repository:** https://github.com/allenai/open-instruct
- **Eval repository:** https://github.com/allenai/olmes
- **Paper:** https://arxiv.org/abs/2411.15124
- **Demo:** https://playground.allenai.org/
## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

```python
from transformers import AutoModelForCausalLM

tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
```
### vLLM

As a Llama base model, the model can be easily served with:

```shell
vllm serve allenai/Llama-3.1-Tulu-3-8B
```

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.
### Chat template

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is also embedded within the tokenizer, for `tokenizer.apply_chat_template`.
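To make the template format concrete, here is a minimal plain-Python sketch of how the turns concatenate (the authoritative formatting is the tokenizer's own chat template via `tokenizer.apply_chat_template`; the helper below is illustrative only and avoids downloading the model):

```python
# Illustrative re-implementation of the documented template format;
# not the tokenizer's actual chat-template code.
def render_tulu_chat(messages, eos="<|endoftext|>"):
    """Render a list of {"role", "content"} dicts in the Tulu 3 format."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n")
    # The final turn is closed directly with the EOS token, no trailing newline.
    return "".join(parts).rstrip("\n") + eos

prompt = render_tulu_chat([
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program."},
])
```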
### System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
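In the standard HuggingFace chat format, a system prompt is simply the first message in the list; a minimal sketch (the `with_system_prompt` helper is hypothetical, not part of any Tulu tooling):

```python
# Hypothetical helper: prepend the default Ai2 system prompt to a chat.
SYSTEM_PROMPT = ("You are Tulu 3, a helpful and harmless AI Assistant "
                 "built by the Allen Institute for AI.")

def with_system_prompt(messages):
    """Return a new message list with the system prompt prepended."""
    return [{"role": "system", "content": SYSTEM_PROMPT}] + list(messages)

chat = with_system_prompt([{"role": "user", "content": "How are you doing?"}])
```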
### Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base Llama 3.1 models was; however, it likely included a mix of web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
### Hyperparameters

PPO settings for RLVR:

- Learning rate: 3 × 10⁻⁷
- Discount factor (gamma): 1.0
- Generalized Advantage Estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO update iterations (K): 4
- PPO clipping coefficient (epsilon): 0.2
- Value function coefficient (c1): 0.1
- Gradient norm threshold: 1.0
- Learning rate schedule: linear
- Generation temperature: 1.0
- Batch size (effective): 512
- Max token length: 2,048
- Max prompt token length: 2,048
- Penalty reward value for responses without an EOS token: -10.0
- Response length: 1,024 (but 2,048 for MATH)
- Total episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0
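As an illustration of two of the settings above, here is a minimal sketch of Generalized Advantage Estimation with the listed gamma = 1.0 and lambda = 0.95 (illustrative only; this is not the open-instruct training code):

```python
# Illustrative GAE computation with the listed settings
# (gamma = 1.0, lambda = 0.95); not the actual open-instruct code.
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Compute GAE advantages for one episode.

    `values` has len(rewards) + 1 entries: per-step value estimates plus a
    bootstrap value for the final state (0.0 when the episode terminates).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual for step t
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy episode: sparse reward of 1.0 at the end (RLVR-style verifiable reward).
adv = gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.3, 0.0])
```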
### License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (models were improved using Qwen 2.5).
### Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```bibtex
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and Jacob Morrison and Valentina Pyatkin and
    Shengyi Huang and Hamish Ivison and Faeze Brahman and
    Lester James V. Miranda and Alisa Liu and Nouha Dziri and
    Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and
    Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and
    Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and
    Noah A. Smith and Yizhong Wang and Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {tulu@allenai.org}
}
```

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)