Hi what did you train this model with, and what were hyperparams?

#1
by teknium - opened

question in title

I guess it's the same procedure as the other Synthia models, but the base model is Mistral, not Llama-2.

see this thread https://www.reddit.com/r/LocalLLaMA/comments/16ur16s/synthia7bv13_trained_on_the_mistral7b_base/

Hey,
This was trained with QLoRA, as with all my models. Learning rate was 3e-4, context length 4096, batch size 64, trained on a single H100.
It used the Synthia-v1.2 dataset, which contains Chain-of-Thought (Orca), Tree-of-Thought and long-form conversation data.
The dataset is super high quality, and not massive (~125K samples).
That’s all I can say for now.
Migel
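
For anyone trying to reproduce a comparable setup, here is a minimal QLoRA sketch with the stated hyperparameters (learning rate 3e-4, 4096-token context, effective batch size 64). The model name, LoRA rank and target modules, epoch count, and the batch/accumulation split are illustrative assumptions, not confirmed details from the reply.

# Minimal QLoRA sketch matching the stated hyperparameters
# (lr 3e-4, 4096 context, effective batch size 64 via accumulation).
# Model name, LoRA rank/targets and num_train_epochs are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"  # assumed Mistral base

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,  # assumption: rank is not stated in the thread
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="synthia-qlora",
    learning_rate=3e-4,              # stated in the reply
    per_device_train_batch_size=4,   # 4 x 16 accumulation = 64 effective
    gradient_accumulation_steps=16,
    bf16=True,
    num_train_epochs=1,              # assumption
    logging_steps=10,
)
# Samples would be tokenized/packed to the stated 4096-token context
# and passed to a Trainer; that part is omitted here.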

migtissera changed discussion status to closed

Did you use artidoro's qlora.py repo, or something else? Thanks!

Hello,
Thanks for all your hard work!
I would like to know what are the target_modules you use?

I use the following config for QLoRA with the Mistral base, but it shows training-loss instability: eventually the train loss becomes zero and the eval loss is NaN.
Is it related to target_modules?

from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
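
For reference, a minimal sketch of how such a config is typically applied to a 4-bit base model; the model name and quantization settings below are assumptions, not details from the post above.

# Rough sketch of how the config above is typically applied to a 4-bit
# base model; the model name and quantization settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, config)  # `config` is the LoraConfig above
model.print_trainable_parameters()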

I've been fine-tuning with just q_proj, k_proj, v_proj, and o_proj, with no instability issues. I ran into the same issues when trying to include any of the other modules.
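
Concretely, a config restricted to the attention projections (all other settings copied from the config posted above) would look like this:

from peft import LoraConfig

# Attention-projection-only target set; other settings copied from the
# config posted above.
stable_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)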

Thanks for the reply! I am now testing with just q, k, v... seems stable so far, will follow up on it.
Meanwhile, I examined the original artidoro qlora GitHub repo. In qlora.py, there is this snippet:

def find_all_linear_names(args, model):
    cls = bnb.nn.Linear4bit if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if 'lm_head' in lora_module_names: # needed for 16-bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)

Not sure if lora_module_names.remove('lm_head') has something to do with what we're observing here?
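
One way to check what that function actually returns for a 4-bit Mistral model is the hypothetical snippet below; it assumes the function above is in scope and that bitsandbytes is installed, and the model name is just an example.

from types import SimpleNamespace
import bitsandbytes as bnb  # required by find_all_linear_names above
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumption: any Mistral base model
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
print(find_all_linear_names(SimpleNamespace(bits=4), model))
# Typically prints the seven per-block projection names
# (q/k/v/o/gate/up/down_proj). lm_head is absent either way: transformers
# usually leaves it unquantized, and the function drops it explicitly.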

Hey, I don't have anything to add here. I attach LoRAs to all layers, but as you're suggesting, the QLoRA repo may remove lm_head.
