afrideva committed
Commit 236bb05
1 Parent(s): a82d329

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md (new file, +151 lines)

README.md (added):
---
base_model: BEE-spoke-data/Meta-Llama-3-8Bee
datasets:
- BEE-spoke-data/bees-internal
inference: true
language:
- en
license: llama3
model-index:
- name: Meta-Llama-3-8Bee
  results: []
model_creator: BEE-spoke-data
model_name: Meta-Llama-3-8Bee
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- axolotl
- generated_from_trainer
- gguf
- ggml
- quantized
---

# Meta-Llama-3-8Bee-GGUF

Quantized GGUF model files for [Meta-Llama-3-8Bee](https://huggingface.co/BEE-spoke-data/Meta-Llama-3-8Bee) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data).

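As a quick orientation, here is a minimal sketch of pulling one of these GGUF files and running it with `llama-cpp-python`. The repo id is inferred from this card's title and quantizer, and the `filename` is a hypothetical placeholder; check the repository's file list for the actual quantization names before running. The 8192 context length matches the training config shown below.

```python
# Minimal sketch (pip install llama-cpp-python huggingface_hub).
# Repo id and GGUF filename below are assumptions -- substitute the real ones from this repo.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="afrideva/Meta-Llama-3-8Bee-GGUF",   # assumed repo id
    filename="meta-llama-3-8bee.q4_k_m.gguf",    # hypothetical filename
)

llm = Llama(model_path=model_path, n_ctx=8192)   # model was trained at 8192 context

# This is a base/completion model: prompt with plain text, not a chat template.
out = llm("Honey bees communicate the location of forage by", max_tokens=64)
print(out["choices"][0]["text"])
```
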
## Original Model Card:

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
strict: false

# dataset
datasets:
  - path: BEE-spoke-data/bees-internal
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.05

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-8bee
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: llama3-8bee-8192
hub_model_id: pszemraj/Meta-Llama-3-8Bee
hub_strategy: every_save

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-5

load_in_8bit: false
load_in_4bit: false
bf16: auto
fp16:
tf32: true

torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
logging_steps: 10
xformers_attention:
flash_attention: true

warmup_steps: 25
# hyperparams for freq of evals, saving, etc
evals_per_epoch: 3
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1 # checkpoints kept at a time
output_dir: ./output-axolotl/output-model-gamma
resume_from_checkpoint:

deepspeed:
weight_decay: 0.0

special_tokens:
  pad_token: <|end_of_text|>
```

</details><br>

# Meta-Llama-3-8Bee

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the `BEE-spoke-data/bees-internal` dataset (continued pretraining).
It achieves the following results on the evaluation set:
- Loss: 2.3319 (see the quick conversion to perplexity below)

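For intuition, and assuming the usual convention that the reported loss is cross-entropy in nats per token, this corresponds to a token-level perplexity of roughly 10.3 on the held-out split:

```python
# Convert the reported eval cross-entropy loss (nats/token) to perplexity.
import math

eval_loss = 2.3319
print(f"perplexity ~ {math.exp(eval_loss):.2f}")  # ~ 10.30
```
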
## Intended uses & limitations

- unveiling knowledge about bees and apiary practice
- needs further tuning before use in 'instruct'-style settings; for now, prompt it as plain-text completion (see the sketch below)

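Because this is a continued-pretraining (completion-format) model rather than an instruct model, a prompt should read as text to be continued. A minimal sketch of completion-style use of the unquantized model with `transformers`; the prompt and generation settings are illustrative assumptions, not recommendations from the model card:

```python
# Completion-style usage of the unquantized model via transformers (accelerate needed for device_map).
# Prompt and sampling settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "BEE-spoke-data/Meta-Llama-3-8Bee"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "A healthy Langstroth hive entering winter should have"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
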
## Training and evaluation data

🐝🍯

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8 (micro batch × gradient accumulation; see the sketch after this list)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1

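The effective batch arithmetic implied by the config above, assuming a single-GPU run (not stated on the card), works out as follows; with `sample_packing: true` each sequence is packed to the full 8192-token context, so the tokens-per-step figure is approximate:

```python
# Effective batch arithmetic implied by the training config (single-GPU run assumed).
micro_batch_size = 1
gradient_accumulation_steps = 8
sequence_len = 8192

total_train_batch_size = micro_batch_size * gradient_accumulation_steps   # 8, matches the card
tokens_per_optimizer_step = total_train_batch_size * sequence_len         # ~65,536 with sample packing
print(total_train_batch_size, tokens_per_optimizer_step)
```
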
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.0   | 1    | 2.5339          |
| 2.3719        | 0.33  | 232  | 2.3658          |
| 2.2914        | 0.67  | 464  | 2.3319          |

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.3.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0