tlab-admin committed · verified · Commit 49955d1 · Parent(s): 7733d1d

Create README.md

Files changed (1): README.md (+222 −0)
---
license: other
license_name: trillion
license_link: LICENSE
tags:
- finetuned
- chat
language:
- en
- ko
- ja
pipeline_tag: text-generation
library_name: transformers
extra_gated_prompt: >-
  **TRILLION LABS AI MODEL LICENSE AGREEMENT**
  Tri- Model Series Version Effective Date: February 1, 2025

  "**Agreement**" means the terms and conditions for use, reproduction, distribution and modification of the Trillion Labs AI Model series set forth herein.

  "**Documentation**" means the specifications, manuals and documentation accompanying the Tri- Model series distributed by Trillion Labs.

  "**Licensee**" or "**you**" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

  "**Model**" means the artificial intelligence model series provided by Licensor ("Tri-" series), including software, algorithms, machine learning models, and related components provided by Licensor, including all updates, enhancements, improvements, bug fixes, patches, or other modifications.

  "**Trillion Labs**" or "**we**" means Trillion Labs, the owner, developer, and provider of the Model, holding all rights, title, and interest in the Model.

  By clicking "I Accept" below or by using or distributing any portion or element of the Model, you agree to be bound by this Agreement.

  1\. **License Grant and Redistribution**.

  a. Grant of Rights. You are granted a limited, non-exclusive, non-transferable, worldwide, revocable license under Trillion Labs' intellectual property or other rights to use, reproduce, distribute, and make modifications to the Model for research purposes.

  b. Redistribution and Use.

  i. If you distribute or make available the Model (or any derivative works thereof), or a product or service that contains any of them, you shall (A) provide a copy of this Agreement with any such Model; and (B) prominently display "Built with Tri-" on a related website, user interface, blogpost, about page, or product documentation. If you use the Model to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include "Tri-" followed by the original Model version at the beginning of any such AI model name.

  ii. You must retain in all copies of the Model that you distribute the following attribution notice within a "Notice" text file distributed as a part of such copies: "Tri- Model Series is licensed under the Trillion Labs AI Model License Agreement, Copyright © Trillion Labs. All Rights Reserved."

  iii. Your use of the Model must comply with applicable laws and regulations (including trade compliance laws and regulations).

  2\. **Additional Commercial Terms**. If the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 1 million monthly active users OR Annual Recurring Revenue is greater than $10 million USD, you must request a commercial license from Trillion Labs, and you are not authorized to exercise any commercial rights under this Agreement unless or until Trillion Labs otherwise expressly grants you such rights.

  3\. **Disclaimer of Warranty**. THE MODEL, DERIVATIVES, AND OUTPUT ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND TRILLION LABS DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.

  4\. **Limitation of Liability**. IN NO EVENT WILL TRILLION LABS BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES.

  5\. **Intellectual Property**.

  a. No trademark licenses are granted under this Agreement, and in connection with the Model, neither Trillion Labs nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Model or as set forth in this Section 5(a).

  b. All rights, title, and interest in the Model, including modifications, Derivatives, and documentation, remain exclusively with Trillion Labs.

  6\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Model and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Trillion Labs may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Model. Sections 3, 4 and 5 shall survive the termination of this Agreement.

  7\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected, stored, processed and shared in accordance with the Trillion Labs Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the Trillion Labs Privacy Policy.
extra_gated_button_content: Submit
extra_gated_heading: "Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate."
---

# Tri-70B-preview-SFT

## Introduction

We introduce **Tri-70B-preview-SFT**, a research preview of our latest and largest flagship language model, which redefines the efficiency frontier in LLM training. By achieving frontier performance for its compute budget (1.5T training tokens from scratch), we demonstrate that exceptional capabilities don't require excessive computational resources.

We are releasing a **minimally post-trained version** to enable open research and community experimentation. This preview version has only undergone supervised fine-tuning, without extensive RLHF, which leaves researchers free to explore RL-based alignment techniques with the model. Stay tuned for the base model release coming soon!
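
As one concrete starting point for such experimentation, here is a minimal, self-contained sketch of the DPO objective (Rafailov et al., 2023), a popular preference-alignment method. The function and argument names are illustrative, not part of this repository:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over summed per-sequence log-probabilities from the
    policy being trained and a frozen reference model (e.g. the SFT
    checkpoint itself)."""
    # Implicit rewards are the beta-scaled log-probability ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```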

### Key Highlights

- **Architecture optimized for long context** (see the sketch after this list)
  - 32k context window
  - Sliding window attention with window size 4096
  - iRoPE: interleaved local (RoPE) and global (temperature-scaled) attention
  - Scalable softmax
- **Multilingual capabilities**: Specially optimized for English, Korean, and Japanese
- **Enhanced reasoning**: Modified training dataset mixture specifically designed for reasoning capabilities, with emphasis on step-by-step problem solving
- **Minimal post-training**: This preview release features only supervised fine-tuning, enabling researchers to explore custom alignment techniques and RLHF/RLVR approaches
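
To make the long-context design concrete, below is a minimal, hypothetical sketch of the layer interleaving and logit scaling described above. It is not the actual Tri-70B implementation; the layer count and global-attention frequency come from the specifications table below, while the scaling constant `s` is purely illustrative:

```python
import math
import torch

NUM_LAYERS = 80        # from the specifications table below
GLOBAL_ATTN_FREQ = 4   # every 4th layer attends globally (no RoPE)
WINDOW_SIZE = 4096     # sliding-window size for the local (RoPE) layers

def layer_plan(num_layers: int = NUM_LAYERS) -> list[str]:
    """Interleave attention types: 'local' layers use RoPE plus a
    4096-token sliding window; 'global' layers see the whole context."""
    return ["global" if (i + 1) % GLOBAL_ATTN_FREQ == 0 else "local"
            for i in range(num_layers)]

def scalable_softmax(scores: torch.Tensor, s: float = 0.43) -> torch.Tensor:
    """One common form of scalable softmax: multiply attention logits by
    s * log(n), where n is the number of keys, so the distribution stays
    peaked as context length grows. The value of s here is illustrative."""
    n = scores.size(-1)
    return torch.softmax(scores * s * math.log(n), dim=-1)
```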

### Model Specifications

#### Tri-70B-preview-SFT

| Specification | Value |
|--------------|-------|
| Type | Causal Language Model |
| Training Stage | Pre-training & Supervised Fine-Tuning |
| Architecture | Transformer Decoder with iRoPE (global attention frequency of 4), SwiGLU, RMSNorm, and GQA |
| Number of Parameters | 70B |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (Query) / 8 (Key, Value) |
| Context Length | 32,768 |
| Number of Tokens Seen | 1.5T |
| Vocab Size | 124,416 |
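
The GQA layout keeps inference memory manageable: with 8 KV heads instead of 64, the KV cache is 8× smaller than under full multi-head attention. A back-of-the-envelope sketch, assuming a head dimension of 128 (not stated in the table) and bf16 caches, and ignoring the further savings from the sliding window on local layers:

```python
# Rough per-token KV-cache size under the assumptions above
# (head_dim = 128 is hypothetical; the table does not state it).
layers, kv_heads, head_dim, bytes_per_val = 80, 8, 128, 2  # bf16
per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V
print(per_token)                   # 327,680 bytes ≈ 320 KiB per token
print(per_token * 32_768 / 2**30)  # ≈ 10 GiB at the full 32k context
```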

## Quickstart

Here is a code snippet using `apply_chat_template` that demonstrates how to load the tokenizer and model and generate text:

### Tri-70B-SFT Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "trillionlabs/Tri-70B-preview-SFT"

# Load the model in bfloat16 and shard it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of central limit theorem in simple terms."
messages = [
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### vLLM and SGLang Deployment
We plan to support Tri-70B-preview-SFT in vLLM and SGLang soon. Stay tuned for updates!

## Evaluation

We evaluated Tri-70B-preview-SFT across a suite of benchmarks assessing general reasoning, knowledge recall, coding abilities, mathematical reasoning, and instruction-following capabilities. We compare our model against state-of-the-art models of similar scale: Qwen-2.5-72B-instruct and Llama-3.1-70B.

<details>
<summary> Full evaluation settings </summary>

#### Benchmark Evaluation Settings

| Benchmark | Language | Evaluation Setting | Metric |
|:----------|:---------|:------------------|:-------|
| HAERAE | Korean | 3-shot | accuracy |
| KMMLU | Korean | 0-shot, CoT | accuracy (exact-match) |
| MMLU | English | 0-shot, CoT | accuracy (exact-match) |
| MMLU-Pro | English | 0-shot, CoT | exact-match |
| HumanEval | English | 0-shot | pass@1 |
| MBPPPlus | English | 0-shot | pass@1 |
| GSM8k | English | 0-shot, CoT | exact-match |
| MATH | English | 0-shot, CoT | exact-match |
| GPQA Diamond | English | 0-shot, CoT | accuracy |
| HRM8k | Korean | 0-shot, CoT | exact-match |
| MT-Bench | English | LLM-as-a-judge (gpt-4o) | LLM score |

*Note that MT-Bench uses a 10-point scale.*

</details>
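
For reference, the pass@1 metric used for the coding benchmarks can be computed with the standard unbiased estimator from Chen et al. (2021); a small sketch with illustrative sample counts:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n generations (c of them correct) passes the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=1))  # ≈ 0.3; with k=1 this reduces to c/n
```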

### Benchmark Results

Models compared:

- **Tri-70B-preview-SFT**: Our flagship 70B parameter model
- **Qwen-2.5-72B-instruct**: Qwen's 72B parameter instruction-tuned model
- **Llama-3.1-70B**: Meta's instruction-tuned 70B model

| Benchmark | Tri-70B-SFT | Qwen-2.5-72B-instruct | Llama-3.1-70B |
| --- | --- | --- | --- |
| HAERAE | 83.96 | 75.44 | 78.09 |
| KMMLU | 62.38 | 65.07 | 54.62 |
| MMLU | 74.42 | 87.29 | 85.47 |
| MMLU-Pro | 62.48 | 69.40 | 62.79 |
| HumanEval | - | 89.02 | 82.93 |
| MBPPPlus | 68.52 | 88.2 | 84.13 |
| GSM8k | 87.37 | 91.51 | 72.48 |
| MATH | 64.40 | 80.80 | 62.40 |
| GPQA-Diamond | - | 54.04 | 44.44 |
| HRM8k | 82.26 | 66.24 | 63.90 |
| MT-Bench | 7.54 | 8.71 | 8.2 |

## Limitations

- **Language Support**: The model is optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
- **Knowledge Cutoff**: The model's knowledge is limited to information available up to February 2025.
- **Minimal Post-Training**: As this is a supervised fine-tuning (SFT) release without RLHF, responses may occasionally lack the polish and safety alignment of fully post-trained models.

## License
This model repository is licensed under the Trillion License.

## Contact
For inquiries, please contact: info@trillionlabs.co