Training with LoRA producing wacky output

Hi @TheBloke. Firstly, thanks for the incredible work! It's much appreciated by me and, I'm sure, the rest of the community.

Onto training...
In our tests we get about 80% accuracy with this model on our classification task without any training. We set out to train with the configuration below, inspired by this guide and the prompt you suggest in the model card.
However, after training the model:

  1. Inference takes much longer.
  2. The output is nothing like the clean classification labels we get without training; instead it contains fragments of the prompt template, very long generations, and so on (see the sketch below for how we call the model).
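
For context, this is roughly how we run inference to get the labels. This is only a minimal sketch: the prompt template and label names here are placeholders, not the exact template from the model card.

  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

  # Placeholder template; in practice we use the one from the model card
  prompt = (
    "### Instruction:\nClassify the text below as positive, negative or neutral.\n\n"
    f"### Input:\n{text}\n\n### Response:\n"
  )
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
  # Decode only the newly generated tokens to read off the predicted label
  label = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
  )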

We would like to know if you have any ideas or suggestions on how we can train and get an improved result, rather than the worse one we are seeing now.

Here's the config:

  from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    GPTQConfig,
    Trainer,
    TrainingArguments,
  )
  from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

  # Use the EOS token for padding since the tokenizer has no dedicated pad token
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  tokenizer.pad_token = tokenizer.eos_token
  # GPTQ settings for fine-tuning; exllama kernels are disabled because they only support inference
  quantization_config_loading = GPTQConfig(
    bits=4,
    disable_exllama=True,
    group_size=128,
    desc_act=False,
    dataset="c4",
    tokenizer=tokenizer,
  )
  # Load the 4-bit GPTQ model with the config above
  model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config_loading,
    device_map="auto",
  )

  # Enable gradient checkpointing and prepare the quantized model for k-bit training
  model.gradient_checkpointing_enable()
  model = prepare_model_for_kbit_training(model)

  # LoRA adapters on the attention projection layers
  config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["k_proj","o_proj","q_proj","v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
  )

  model = get_peft_model(model, config)
  # Hyperparameters (weight_decay, batch_size, learning_rate, etc.) are defined earlier in our script
  args = TrainingArguments(
    output_dir="checkpoints",
    overwrite_output_dir=True,
    evaluation_strategy="steps",
    optim="adamw_torch",
    weight_decay=weight_decay,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=eval_batch_size,
    learning_rate=learning_rate,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=enable_gradient_checkpointing,
  )
  # Scheduler settings: decay, warmup_fraction and num_epochs are also defined earlier
  args = args.set_lr_scheduler(
    name=decay,
    warmup_ratio=warmup_fraction,
    num_epochs=num_epochs,
  )
  # Causal LM fine-tuning, so the collator uses mlm=False
  trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    args=args,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
  )
  trainer.train()
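
For completeness, this is roughly how the dataset passed to the Trainer is built. Again just a sketch: build_prompt, the column names, and max_length are placeholders rather than our exact preprocessing.

  # Placeholder preprocessing; the real template matches the one used at inference time
  def build_prompt(example):
    # Append the gold label so the model learns to emit it right after "### Response:"
    return (
      "### Instruction:\nClassify the text below as positive, negative or neutral.\n\n"
      f"### Input:\n{example['text']}\n\n### Response:\n{example['label']}"
    )

  def tokenize(example):
    return tokenizer(build_prompt(example), truncation=True, max_length=512)

  dataset = raw_dataset.map(tokenize, remove_columns=raw_dataset["train"].column_names)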

Thank you in advance @TheBloke 🙌🙌
