
Built with Axolotl

Axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: meta-llama/Meta-Llama-3-70B
bf16: true
dataset_prepared_path: last_run_prepared
debug: null
early_stopping_patience: null
eval_table_size: null
evals_per_epoch: 0
flash_attention: true
fp16: null
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
group_by_length: false
hub_model_id: minionai/llama3-70b-cache-summary-truncated-clicks-dropped-dd-sorry-sure
hub_strategy: all_checkpoints
learning_rate: 1e-4
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 64
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 64
lora_target_linear: true
lora_target_modules: null
lr_scheduler: cosine
micro_batch_size: 1
model_type: LlamaForCausalLM
num_epochs: 3
optimizer: adamw_torch
output_dir: ./lora-out
pad_to_sequence_len: true
resume_from_checkpoint: null
sample_packing: true
wandb_entity: minionai
wandb_name: cache-summary-truncated-clicks-dropped-dd-sorry-dropped-sure
wandb_project: llama3-70b-azure
saves_per_epoch: 1
sequence_len: 8192
special_tokens:
  pad_token: <|end_of_text|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
val_set_size: 0
warmup_steps: 50
weight_decay: 0.0
datasets:
- path: minionai/cache-summary-truncated-clicks-dropped-dd-sorry-dropped-sure_ift
  type: 
      system_prompt: ""
      system_format: "{system}"
      field_system: system
      field_instruction: instruction
      field_input: input
      field_output: output
      format: |-
        User: {instruction} {input}
        Assistant:
      # 'no_input_format' cannot include {input}
      no_input_format: "### System:\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\nverify(\""
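
For illustration, a minimal Python sketch (not part of this repo; render_prompt is a hypothetical helper) of how the format and no_input_format templates above turn a dataset row into a prompt string:

def render_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        # Mirrors `format`: "User: {instruction} {input}\nAssistant:"
        return f"User: {instruction} {input_text}\nAssistant:"
    # Mirrors `no_input_format`, which cannot include {input}.
    return (
        "### System:\n"
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\nverify(\""
    )

print(render_prompt("Summarize the page.", "<page text>"))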

Training was logged to Weights & Biases (project llama3-70b-azure, run cache-summary-truncated-clicks-dropped-dd-sorry-dropped-sure).

llama3-70b-cache-summary-truncated-clicks-dropped-dd-sorry-sure

This model is a fine-tuned version of meta-llama/Meta-Llama-3-70B on the minionai/cache-summary-truncated-clicks-dropped-dd-sorry-dropped-sure_ift dataset.
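
A minimal sketch, assuming a PEFT + Transformers setup, of how this adapter can be loaded on top of the base model; the model and adapter IDs come from this card, while dtype and device placement are assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-70B"
adapter_id = "minionai/llama3-70b-cache-summary-truncated-clicks-dropped-dd-sorry-sure"

tokenizer = AutoTokenizer.from_pretrained(base_id)
# bf16 matches the training config; device_map="auto" assumes accelerate is installed.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "User: Summarize the page. <page text>\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))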

Model description

This is a LoRA adapter (r=64, alpha=64, dropout 0.05, targeting all linear layers) for meta-llama/Meta-Llama-3-70B, trained with Axolotl 0.4.1 in bf16 with DeepSpeed ZeRO-3 across 8 GPUs.

Intended uses & limitations

More information needed

Training and evaluation data

Training used the minionai/cache-summary-truncated-clicks-dropped-dd-sorry-dropped-sure_ift dataset (see the datasets section of the config above). No validation split was held out (val_set_size: 0), so there is no evaluation data.

Training procedure
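
Training was run with Axolotl using the config above; a run like this is typically launched with accelerate launch -m axolotl.cli.train config.yml (the exact invocation is not recorded on this card), with DeepSpeed ZeRO-3 picked up from the deepspeed field of the config.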

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 3
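
A quick sketch of how the total_train_batch_size above follows from the other values (illustrative variable names, not an Axolotl API):

micro_batch_size = 1             # per-device batch size
gradient_accumulation_steps = 1
num_devices = 8                  # multi-GPU run
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 8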

Training results

No evaluation was run during training (val_set_size: 0, evals_per_epoch: 0), so no results are reported.

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.3
  • Pytorch 2.1.2+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1