---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
datasets:
- allenai/tulu-3-sft-mixture
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/Teleut-7b-GGUF
This is a quantized version of [allura-org/Teleut-7b](https://huggingface.co/allura-org/Teleut-7b), created using llama.cpp.
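
To try the quants locally, here is a minimal sketch using `huggingface_hub` and `llama-cpp-python`; the GGUF filename below is a placeholder, so substitute an actual file from this repo's file list:

```python
# Minimal sketch: download one quant from this repo and chat with it via
# llama-cpp-python. The filename is hypothetical; pick a real one.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="QuantFactory/Teleut-7b-GGUF",
    filename="Teleut-7b.Q4_K_M.gguf",  # hypothetical quant name
)

llm = Llama(model_path=model_path, n_ctx=8192)  # matches the 8192 training context
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly, what is supervised fine-tuning?"}]
)
print(out["choices"][0]["message"]["content"])
```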

# Original Model Card


# Teleut 7b

![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/UqIi8eztdptvt52Mak_1K.png)

A replication attempt of Tulu 3 on the Qwen 2.5 base models.

## Evals (so far)

|                        | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported) |
|------------------------|----------------------|--------------------------|---------------------------------|-------------------------|----------------------------|
| BBH (3-shot, CoT)      | *64.4%*              | **67.9%**                | 21.7%                           | 56.2%                   | 47.0%<sup>NLL</sup>        |
| GSM8K (8-shot, CoT)    | 78.5%                | 76.2%                    | **83.8%**                       | *80.0%*                 | xx.x%                      |
| IFEval (prompt loose)  | 66.3%                | *72.8%*                  | **74.7%**                       | 56.4%                   | 53.0%                      |
| MMLU (0-shot, CoT)     | *73.2%*              | 65.9%                    | **76.6%**                       | 68.5%                   | 30.7%<sup>5-shot</sup>     |
| MMLU Pro (0-shot, CoT) | *48.3%*              | 44.3%                    | **56.3%**<sup>Unknown</sup>     | 32.9%<sup>5-shot</sup>  | 30.7%<sup>5-shot</sup>     |
| PopQA (15-shot)        | 18.9%                | **29.3%**                | 18.1%                           | *20.2%*                 | xx.x%                      |
| TruthfulQA             | 47.2%                | 46.8%                    | **63.1%**                       | *55.5%*                 | xx.x%                      |

## Credits

Big thanks to Retis Labs for providing the 8xH100 polycule used to train and test this model!
Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and details on training), and to Alibaba for training the original Qwen 2.5 base model series!

```bibtex
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {tulu@allenai.org}
}
```

## Training procedure

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3.5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: paged_ademamix_8bit (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 370
- num_epochs: 1
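
The total train batch size is the product of the per-device micro-batch size, gradient accumulation steps, and device count (8 × 2 × 8 = 128). A minimal sketch of that arithmetic and of a cosine-with-warmup schedule like this run's, with a stand-in `AdamW` optimizer (the actual run used `paged_ademamix_8bit` from bitsandbytes) and an assumed total step count:

```python
# Sketch of the effective batch size and LR schedule implied by the
# hyperparameters above. AdamW stands in for paged_ademamix_8bit, and
# num_training_steps is an assumption (it depends on the dataset size).
import torch
from transformers import get_cosine_schedule_with_warmup

micro_batch_size = 8   # train_batch_size
grad_accum_steps = 2   # gradient_accumulation_steps
num_devices = 8
assert micro_batch_size * grad_accum_steps * num_devices == 128  # total_train_batch_size

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3.5e-6)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=370,     # lr_scheduler_warmup_steps
    num_training_steps=7000,  # assumed; one epoch over the SFT mixture
)
```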

### Framework versions

- Transformers 4.46.3
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3

### Configuration

<details><summary>See axolotl config</summary>

axolotl version: `0.5.2`
```yaml
base_model: Qwen/Qwen2.5-7B

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

strict: false

chat_template: chatml
datasets:
  - path: allenai/tulu-3-sft-mixture
    type: chat_template
    split: train
    field_messages: messages

dataset_prepared_path: last_run_prepared
#val_set_size: 0.02
output_dir: ./ckpts

sequence_len: 8192
#sample_packing: true
pad_to_sequence_len: true

wandb_project: qwen-2.5-7b-sft
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 8
num_epochs: 1
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 3.5e-6

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

deepspeed: deepspeed_configs/zero3_bf16.json

warmup_steps: 370
#evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 2
debug:
weight_decay: 0.0
```

</details><br>
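
Since the config sets `chat_template: chatml`, the model expects ChatML-formatted prompts at inference time. A sketch of building one with `transformers`, assuming the original repo's tokenizer ships that template (if it doesn't, format the `<|im_start|>`/`<|im_end|>` turns by hand):

```python
# Build a ChatML prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allura-org/Teleut-7b")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Tulu 3 SFT recipe in one sentence."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected shape:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# ...<|im_end|>
# <|im_start|>assistant
```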