Excellent model! Asking about training details

#3 opened by nps798

As the title says.
What code did you use to train this model?
I noticed it's QLoRA, and I have viewed the wandb records.
I am doing QLoRA fine-tuning myself but am facing an unstable training loss issue.

What target_modules did you use?
What tokenizer parameters did you use?

My settings, which fail:

Tokenizer initialization

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    f"{path_to_save}/tokenizer",
    model_max_length=512,
    padding_side="left",
    trust_remote_code=True,
    add_eos_token=True,
)

TOKENIZATION

tokenized_full_prompt = tokenizer(
    full_prompt,
    truncation=True,
    max_length=512,
    padding=True,
    return_tensors="pt",
)

In the Trainer I use
data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
which dynamically pads my sequences.
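For context, this is roughly how the collator and tokenized data come together in my Trainer (a simplified sketch; the dataset name, output path, and hyperparameters below are placeholders, not my exact values):

from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Dynamic padding is left to the collator; mlm=False makes it build causal-LM labels
# (a copy of input_ids with padding positions masked out).
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,                      # 4-bit base model with LoRA adapters attached
    train_dataset=tokenized_dataset,  # placeholder name for my tokenized prompts
    args=TrainingArguments(
        output_dir="qlora-out",       # placeholder path
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        num_train_epochs=3,
        logging_steps=10,
    ),
    data_collator=data_collator,
)
trainer.train()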

LORA

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
...
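The remaining LoraConfig fields are omitted above. Roughly, I then attach the adapters with the standard peft calls (a sketch of my flow, assuming model is the 4-bit loaded base model):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Standard QLoRA flow: make the quantized model trainable, then wrap it with the LoRA adapters.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # sanity check: only the LoRA weights should be trainable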

Thanks a lot!
