---
license: apache-2.0
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: mistralai/Mistral-7B-Instruct-v0.1
model-index:
  - name: witness_reliability_run1_merged
    results: []
---

# witness_reliability_run1_merged

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on the latest version of the [labeled dataset](https://git.enigmalabs.io/data-science-playground/model-data/-/tree/master/models/witness_reliability?ref_type=heads).

## Model description

More information needed

## Intended uses & limitations

### Usage


```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

merged_model_name = "e-labs/witness_reliability_ft_mistral_7b_v0.1_instruct"

tokenizer = AutoTokenizer.from_pretrained(merged_model_name)
model = AutoModelForCausalLM.from_pretrained(merged_model_name)

# The merged model is a causal LM, so it runs under the "text-generation" pipeline task.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False)  # greedy decoding for deterministic labels

# `prompt` is the witness statement formatted with the same template used during fine-tuning.
result = pipe(prompt, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)

# Keep only the newly generated text after the prompt.
answer = result[0]["generated_text"][len(prompt):].strip()
```

Map the answer as follows:

| answer   | inference    |
|----------|--------------|
| a        | average      |
| question | questionable |
| re       | reliable     |
| second   | second-hand  |
| all else | average      |

Since the model is fundamentally an LLM, it may generate text outside the defined set of values ['a', 'question', 're', 'second']. In those cases, default to average, as indicated by the "all else" row in the table above.
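
A minimal post-processing helper implementing this mapping might look like the sketch below; the function name is illustrative and not part of the released code.

```python
def map_answer(answer: str) -> str:
    """Map the raw generated answer to a reliability label, defaulting to 'average'."""
    mapping = {
        "a": "average",
        "question": "questionable",
        "re": "reliable",
        "second": "second-hand",
    }
    # Any answer outside the expected set of values falls back to 'average'.
    return mapping.get(answer, "average")
```

Calling `map_answer(answer)` on the pipeline output above then gives the final reliability label.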

## Training and evaluation data

https://wandb.ai/enigmalabs/witness_reliability_ft_mistral_instruct_v0.1/runs/0skl7iac?nw=nwuserisaaclee

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of a matching trainer setup follows the list):

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
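
The actual training script is not included in this card. The following is a minimal sketch of how the hyperparameters above could map onto a TRL `SFTTrainer` run with a PEFT LoRA adapter; the LoRA settings, dataset file, text field name, and `max_seq_length` are assumptions for illustration, not values taken from this card.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Hyperparameters from the list above; the default optimizer gives Adam betas=(0.9, 0.999), eps=1e-8.
training_args = TrainingArguments(
    output_dir="witness_reliability_run1",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # total train batch size of 2
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    num_train_epochs=3,
    seed=42,
)

# Illustrative LoRA configuration; the card does not specify the adapter settings.
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Hypothetical dataset file with a "text" column holding the formatted prompt + label.
train_dataset = load_dataset("json", data_files="witness_reliability_train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=1024,
)
trainer.train()
```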

### Training results

https://wandb.ai/enigmalabs/witness_reliability_ft_mistral_instruct_v0.1/runs/2etycpye?nw=nwuserisaaclee

#### Accuracy Metrics

- Accuracy: 0.958
- Accuracy for label questionable: 1.000
- Accuracy for label second: 0.941
- Accuracy for label reliable: 0.958
- Accuracy for label average: 0.933

Classification Report:

| label        | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| average      | 0.97      | 0.93   | 0.95     | 30      |
| none         | 0.00      | 0.00   | 0.00     | 0       |
| questionable | 0.97      | 1.00   | 0.98     | 30      |
| reliable     | 0.92      | 0.96   | 0.94     | 24      |
| second       | 1.00      | 0.94   | 0.97     | 34      |
| accuracy     |           |        | 0.96     | 118     |
| macro avg    | 0.77      | 0.77   | 0.77     | 118     |
| weighted avg | 0.97      | 0.96   | 0.96     | 118     |
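
For reference, metrics of this form could be recomputed with scikit-learn roughly as sketched below; the label lists are placeholders standing in for the held-out evaluation data, not the actual data behind the numbers above.

```python
from sklearn.metrics import accuracy_score, classification_report

# Placeholder gold and predicted labels; in practice these come from the labeled
# evaluation split and the mapped pipeline outputs described above.
gold_labels = ["questionable", "second", "reliable", "average"]
predicted_labels = ["questionable", "second", "reliable", "average"]

print("Accuracy:", accuracy_score(gold_labels, predicted_labels))

# Per-label accuracy, i.e. recall restricted to examples with that gold label.
for label in sorted(set(gold_labels)):
    pairs = [(g, p) for g, p in zip(gold_labels, predicted_labels) if g == label]
    correct = sum(g == p for g, p in pairs)
    print(f"Accuracy for label {label}: {correct / len(pairs):.3f}")

print(classification_report(gold_labels, predicted_labels, zero_division=0))
```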

### Framework versions

- PEFT 0.7.2.dev0
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1