---
base_model: BEE-spoke-data/Meta-Llama-3-8Bee
datasets:
- BEE-spoke-data/bees-internal
inference: true
language:
- en
license: llama3
model-index:
- name: Meta-Llama-3-8Bee
  results: []
model_creator: BEE-spoke-data
model_name: Meta-Llama-3-8Bee
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- axolotl
- generated_from_trainer
- gguf
- ggml
- quantized
---
# Meta-Llama-3-8Bee-GGUF

Quantized GGUF model files for [Meta-Llama-3-8Bee](https://huggingface.co/BEE-spoke-data/Meta-Llama-3-8Bee) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data)

## Original Model Card:

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

See axolotl config

axolotl version: `0.4.0`

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

strict: false

# dataset
datasets:
  - path: BEE-spoke-data/bees-internal
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.05

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-8bee
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: llama3-8bee-8192
hub_model_id: pszemraj/Meta-Llama-3-8Bee
hub_strategy: every_save

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-5

load_in_8bit: false
load_in_4bit: false
bf16: auto
fp16:
tf32: true

torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
logging_steps: 10
xformers_attention:
flash_attention: true

warmup_steps: 25
# hyperparams for freq of evals, saving, etc
evals_per_epoch: 3
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1 # Checkpoints saved at a time
output_dir: ./output-axolotl/output-model-gamma
resume_from_checkpoint:

deepspeed:
weight_decay: 0.0

special_tokens:
  pad_token: <|end_of_text|>
```
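As a rough illustration of what the batch settings in this config imply, the sketch below multiplies them out into sequences and tokens per optimizer step. The variable `num_devices` is an assumption (a single GPU), chosen because it matches the `total_train_batch_size: 8` reported on the card; the token count is an upper bound, since packed sequences may still contain padding.

```python
# Values copied from the axolotl config above.
sequence_len = 8192
micro_batch_size = 1
gradient_accumulation_steps = 8

# Assumption: one device. The config does not state the GPU count.
num_devices = 1

# Sequences consumed per optimizer update.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_devices

# With sample_packing and pad_to_sequence_len enabled, every sequence is
# exactly sequence_len positions long, so this is an upper bound on the
# number of real (non-padding) tokens per update.
tokens_per_step = sequences_per_step * sequence_len

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"token positions per optimizer step: {tokens_per_step}")
```

Raising `gradient_accumulation_steps` scales both numbers linearly without increasing peak activation memory, which is likely why the config pairs it with a `micro_batch_size` of 1 at this sequence length.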

# Meta-Llama-3-8Bee

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the `BEE-spoke-data/bees-internal` dataset (continued pretraining).
It achieves the following results on the evaluation set:
- Loss: 2.3319

## Intended uses & limitations

- unveiling knowledge about bees and apiary practice
- needs further tuning to be used in 'instruct' type settings

## Training and evaluation data

🐝🍯

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.0   | 1    | 2.5339          |
| 2.3719        | 0.33  | 232  | 2.3658          |
| 2.2914        | 0.67  | 464  | 2.3319          |

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.3.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
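To help read the training results table, here is a minimal sketch that converts each validation loss to perplexity and estimates the learning rate at each evaluation step. It assumes the reported loss is mean cross-entropy in nats and that the scheduler is the usual linear-warmup-then-cosine-decay shape. `TOTAL_STEPS = 696` is an assumption extrapolated from step 232 falling at epoch 0.33, and the helper names `cosine_lr` and `perplexity` are illustrative, not taken from the training code.

```python
import math

PEAK_LR = 2e-5      # learning_rate from the card
WARMUP_STEPS = 25   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 696   # assumption: 3 * 232, extrapolated from the results table


def cosine_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
              total_steps=TOTAL_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


# (step, validation loss) pairs copied from the results table.
eval_points = [(1, 2.5339), (232, 2.3658), (464, 2.3319)]

for step, val_loss in eval_points:
    print(f"step {step:>3}: lr ~ {cosine_lr(step):.2e}, "
          f"val loss {val_loss:.4f}, perplexity ~ {perplexity(val_loss):.2f}")
```

Under these assumptions the final validation loss of 2.3319 corresponds to a perplexity of a little over 10 on the held-out split. Perplexities are only comparable between models that share a tokenizer and evaluation set.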