Update README.md
README.md CHANGED
@@ -108,30 +108,29 @@ model-index:
-## Introduction
-
-N3N_gemma-2-9b-it_20241029_1532 is a 10.2 billion parameter open-source model built upon Gemma2-9B-Instruct through additional training. What sets this model apart is its fine-tuning process using a high-quality dataset derived from 1.6 million arXiv papers.
-
-- **High-quality Dataset**: The model has been fine-tuned using a comprehensive dataset compiled from 1.6 million arXiv papers, ensuring robust performance across various real-world applications.
-- **Superior Reasoning Capabilities**: The model demonstrates exceptional performance in mathematical reasoning and complex problem-solving tasks, outperforming comparable models in these areas.
-
-This model represents our commitment to advancing language model capabilities through meticulous dataset preparation and continuous model enhancement.
-
----
-
-- **Finetuned from model :** unsloth/gemma-2-9b-it
@@ -180,25 +179,25 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
-gemma-2-9b-it_24184_20241029_1532_3232_cosine_3_50True_8_645e-05
-
-## Training hyperparameters
-
-The following hyperparameters were used during training:
-- seed: 3407
-- warmup_steps: 50
-- total_train_batch_size: 512
-- total_eval_batch_size: 64
-- learning_rate: 5e-05
-- optimizer: adamw_8bit
-- lr_scheduler_type: cosine
-- num_epochs: 3
-- r: 32
-- lora_alpha: 32
-- rs_lora: True
-- weight_decay: 0.01
@@ -218,15 +217,20 @@ The following hyperparameters were used during training:
-
-If you are interested in customized LLMs for business applications powered by Jikji Labs' advanced infrastructure, we’d love to hear from you! Whether you have feedback, suggestions, or just want to explore collaboration opportunities, we are here to help. Please visit [our website](https://www.n3n.ai/) for more details. Jikji Labs specializes in large-scale data processing and tailored model training solutions to meet your business needs. We value your insights as we strive for continuous improvement and innovation. Your partnership and input are what help drive our mission forward!
+# N3N_gemma-2-9b-it_20241029_1532
+
+## Model Overview
+
+- **Base Model**: unsloth/gemma-2-9b-it
+- **License**: apache-2.0
+- **Parameters**: 10.2B
+- **Language**: English
+- **Training Framework**: [Unsloth](https://github.com/unslothai/unsloth) + Huggingface TRL
+
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+> **Achievement**: #1 Ranking for 9B and 12B LLMs (November 8, 2024)
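The overview above names the base model and the training framework but does not pin down a Hub repository id in this section. As a rough orientation only, the snippet below is a minimal loading-and-generation sketch using plain `transformers`; the repo id `N3N/N3N_gemma-2-9b-it_20241029_1532` and the prompt are placeholder assumptions, not values taken from the card.

```python
# Minimal sketch (not from the card): loading the model with Hugging Face transformers.
# The repo id is a placeholder assumption; substitute the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "N3N/N3N_gemma-2-9b-it_20241029_1532"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",
)

# Gemma-2 instruct models expect the chat template to be applied to prompts.
messages = [{"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```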
+
+## Introduction
+
+N3N_gemma-2-9b-it_20241029_1532 is a 10.2B parameter open-source model built upon Gemma2-9B-Instruct through additional training. What sets this model apart is its fine-tuning process using a high-quality dataset derived from 1.6 million arXiv papers.
+
+### Key Features
+
+- **High-quality Dataset**: The model has been fine-tuned using a comprehensive dataset compiled from 1.6 million arXiv papers, ensuring robust performance across various real-world applications.
+- **Superior Reasoning**: The model demonstrates exceptional performance in mathematical reasoning and complex problem-solving tasks, outperforming comparable models in these areas.
+
+This model represents our commitment to advancing language model capabilities through meticulous dataset preparation and continuous model enhancement.
+
+## Training Details
+
+### Hyperparameters
+
+```python
+{
+    "seed": 3407,
+    "warmup_steps": 50,
+    "total_train_batch_size": 512,
+    "total_eval_batch_size": 64,
+    "learning_rate": 5e-05,
+    "optimizer": "adamw_8bit",
+    "lr_scheduler_type": "cosine",
+    "num_epochs": 3,
+    "r": 32,
+    "lora_alpha": 32,
+    "rs_lora": True,
+    "weight_decay": 0.01
+}
+```
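The dictionary above reads like a rank-stabilized LoRA fine-tune driven by Unsloth and TRL. Purely as an illustration of how such a configuration could be wired together (not the authors' actual training script), here is a sketch: the dataset path, `max_seq_length`, `target_modules`, and the per-device batch / gradient-accumulation split (8 x 8 x 8 GPUs = 512 effective) are assumptions, and `use_rslora=True` is taken as the counterpart of `rs_lora` above.

```python
# Plausible reconstruction of the training setup, NOT the authors' actual script.
# Assumptions: dataset file, max_seq_length, target_modules, and the
# per-device batch / gradient-accumulation split (8 * 8 * 8 GPUs = 512 effective).
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

max_seq_length = 4096  # assumption; not stated in the card

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-9b-it",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# LoRA configuration matching the listed hyperparameters (r, lora_alpha, rs_lora).
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,   # rank-stabilized LoRA: scaling = alpha / sqrt(r)
    random_state=3407,
)

# Placeholder dataset; the actual arXiv-derived corpus is not published here.
dataset = load_dataset("json", data_files="arxiv_sft.jsonl", split="train")

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=8,   # 8 per GPU * 8 grad-accum * 8 GPUs = 512 total (assumed split)
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=8,    # 8 per GPU * 8 GPUs = 64 total eval batch
    warmup_steps=50,
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    weight_decay=0.01,
    seed=3407,
    bf16=True,
)

# Exact SFTTrainer keyword names differ between TRL versions; this follows the
# classic Unsloth notebook style.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=args,
)
trainer.train()
```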
+
+## Business & Collaboration
+
+### Contact
+
+Are you looking for customized LLMs tailored to your business needs? Jikji Labs offers advanced infrastructure, including 8x H100 GPU clusters, for optimal model training and deployment. Our expertise spans:
+
+- Large-scale data processing
+- High-performance GPU computing
+- Custom model development and training
+
+We welcome collaborations and are always eager to hear your feedback or discuss potential partnerships. Visit [our website](https://www.n3n.ai/) to learn how our infrastructure and expertise can drive your AI initiatives forward.
+
+### Collaborations
+
We are actively seeking support and investment to further our development of robust language models, with a focus on building high-quality and specialized datasets to cater to a wide range of applications. Our expertise in dataset generation enables us to create models that are precise and adaptable to specific business requirements. If you are excited by the opportunity to collaborate and navigate future challenges with us, please visit [our website](https://www.n3n.ai/) for more information.

## Acknowledgement
+
+Special thanks to [google](https://huggingface.co/google) for providing the base model to the open-source community.