Training Mistake, model is ruined.

#1 opened by concedo

It appears you made a mistake when training the LoRA adapter. You added the Llama 2 EOS token </s> at the end of every message; however, it does not tokenize to the actual EOS token, since </s> doesn't exist in the Llama 3 vocab. Instead, it tokenizes into the literal sequence for </s>, which in Llama 3 is </ (4005), s (82), and > (29). This also causes the </s> sequence to appear at the end of every AI response.
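A quick way to see this (a minimal sketch, assuming access to the meta-llama/Meta-Llama-3-8B tokenizer via transformers) is to encode the literal string and compare it with the tokenizer's actual EOS token:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# "</s>" is not a special token in Llama 3, so it is split into ordinary
# subword pieces instead of mapping to a single EOS id.
print(tok.encode("</s>", add_special_tokens=False))

# The actual Llama 3 end-of-text token is a single id in the vocab.
print(tok.eos_token, tok.eos_token_id)
```

The first print shows multiple ids for the literal string, while the second shows the single id the base model actually treats as end-of-sequence.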

oh, thanks for looking at this! Here is the config I used:

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: jondurbin/airoboros-3.2
    type: sharegpt

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: llama3
wandb_entity: saucam
wandb_watch: all
wandb_name: llama3-cosmo-run-1
wandb_log_model: "end"

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

Is it because of the pad_token added at the end?

The output seems to have the <s> token at the start:

```
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100%|████████████████████████████████████████| 2/2 [00:00<00:00,  7.95it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:12<00:00,  6.42s/it]
<s>USER: What is a large language model?
ASSISTANT: A large language model is a type of machine learning model that is trained on a large dataset of text. It is designed to understand the context and meaning of text, and to generate human-like responses to prompts. Some examples of large language models include GPT-3, BERT, and GPT-J.

USER: What is GPT-3?
ASSISTANT: GPT-3, or Generative Pre-trained Transformer 3, is a large language model developed by OpenAI. It is considered the most advanced and powerful language model to date, with 175 billion parameters. GPT-3 can generate human-like text, complete tasks like summarizing and translating, and even write code.

USER: How can GPT-3 be used?
ASSISTANT: GPT-3 can be used for a variety of tasks, including:
- Generating text: GPT-3 can generate human-like text based on a prompt, making it useful for tasks like writing articles, creating ad copy, and more.
- Summarizing text: GPT-3 can summarize long documents or articles into a shorter, more concise version.
- Translating text
```

It could be. I don't think <|end_of_text|> is a valid token either, though. I am not very familiar with dataset setup in axolotl.
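If in doubt, this can be checked directly (again just a sketch against the base meta-llama/Meta-Llama-3-8B tokenizer): a valid special token should map to a single id rather than being split into pieces.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# A valid special token converts to a single, known id instead of being
# broken into several subword pieces.
print(tok.convert_tokens_to_ids("<|end_of_text|>"))
print(tok.encode("<|end_of_text|>", add_special_tokens=False))
print(tok.special_tokens_map)
```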

Other than that, there is no place where I added any other tokens. How did you find that I "added the Llama 2 EOS token at the end of every message"?

I downloaded the model, converted it, ran it, and observed the outputs.
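For anyone wanting to reproduce that kind of check, here is a rough sketch (the model path is a placeholder for the merged checkpoint, and the prompt format simply mirrors the transcript above): generate a response and inspect the raw token ids at its tail.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"  # placeholder for the merged LoRA checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "USER: What is a large language model?\nASSISTANT:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Keep only the newly generated ids. If training appended a literal "</s>"
# string, the tail contains the separate pieces for "</", "s", ">" instead
# of a single EOS id.
new_ids = out[0][inputs["input_ids"].shape[1]:]
print(new_ids.tolist())
print(tok.decode(new_ids))
```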
