Training Mistake, model is ruined.

#1 opened by concedo

It appears you made a mistake when training the LoRA adapter. You added the Llama 2 EOS token </s> at the end of every message; however, it does not tokenize to the actual EOS token, since </s> doesn't exist in the Llama 3 vocab. Instead, it tokenizes into the literal sequence for </s>, which in Llama 3 is </ (4005), s (82), and > (29). This also causes the </s> sequence to appear at the end of every AI response.
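A quick way to see this (a minimal sketch, assuming access to the meta-llama/Meta-Llama-3-8B tokenizer via transformers) is to encode the literal string and compare it with the tokenizer's actual EOS token:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# "</s>" is not a special token in Llama 3, so it is split into ordinary
# subword pieces instead of mapping to a single EOS id.
print(tok.encode("</s>", add_special_tokens=False))

# The actual Llama 3 end-of-text token is a single id in the vocab.
print(tok.eos_token, tok.eos_token_id)
```

The first print shows multiple ids for the literal string, while the second shows the single id the base model actually treats as end-of-sequence.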

oh, thanks for looking at this! Here is the config I used:

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: jondurbin/airoboros-3.2
    type: sharegpt

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: llama3
wandb_entity: saucam
wandb_watch: all
wandb_name: llama3-cosmo-run-1
wandb_log_model: "end"

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

Is it because of the pad_token added at the end?

The output seems to have the <s> token at the start:

```
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100%|████████████████████████████████████████| 2/2 [00:00<00:00,  7.95it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:12<00:00,  6.42s/it]
<s>USER: What is a large language model?
ASSISTANT: A large language model is a type of machine learning model that is trained on a large dataset of text. It is designed to understand the context and meaning of text, and to generate human-like responses to prompts. Some examples of large language models include GPT-3, BERT, and GPT-J.

USER: What is GPT-3?
ASSISTANT: GPT-3, or Generative Pre-trained Transformer 3, is a large language model developed by OpenAI. It is considered the most advanced and powerful language model to date, with 175 billion parameters. GPT-3 can generate human-like text, complete tasks like summarizing and translating, and even write code.

USER: How can GPT-3 be used?
ASSISTANT: GPT-3 can be used for a variety of tasks, including:
- Generating text: GPT-3 can generate human-like text based on a prompt, making it useful for tasks like writing articles, creating ad copy, and more.
- Summarizing text: GPT-3 can summarize long documents or articles into a shorter, more concise version.
- Translating text
```

It could be. I don't think <|end_of_text|> is a valid token either, though. I am not very familiar with dataset setup in axolotl.
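If in doubt, this can be checked directly (again just a sketch against the base meta-llama/Meta-Llama-3-8B tokenizer): a valid special token should map to a single id rather than being split into pieces.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# A valid special token converts to a single, known id instead of being
# broken into several subword pieces.
print(tok.convert_tokens_to_ids("<|end_of_text|>"))
print(tok.encode("<|end_of_text|>", add_special_tokens=False))
print(tok.special_tokens_map)
```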

Other than that, there is no place where I added any other tokens. How did you find that I "added the Llama 2 EOS token at the end of every message"?

I downloaded the model, converted it, ran it, and observed the outputs.
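For anyone wanting to reproduce that kind of check, here is a rough sketch (the model path is a placeholder for the merged checkpoint, and the prompt format simply mirrors the transcript above): generate a response and inspect the raw token ids at its tail.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-model"  # placeholder for the merged LoRA checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "USER: What is a large language model?\nASSISTANT:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Keep only the newly generated ids. If training appended a literal "</s>"
# string, the tail contains the separate pieces for "</", "s", ">" instead
# of a single EOS id.
new_ids = out[0][inputs["input_ids"].shape[1]:]
print(new_ids.tolist())
print(tok.decode(new_ids))
```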
