---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-Instruct-v0.1
model-index:
- name: witness_reliability_run1_merged
  results: []
---
# witness_reliability_run1_merged
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on the latest version of the [labeled dataset](https://git.enigmalabs.io/data-science-playground/model-data/-/tree/master/models/witness_reliability?ref_type=heads).
## Model description
More information needed
## Intended uses & limitations

### Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

merged_model_name = "e-labs/witness_reliability_ft_mistral_7b_v0.1_instruct"

tokenizer = AutoTokenizer.from_pretrained(merged_model_name)
model = AutoModelForCausalLM.from_pretrained(merged_model_name)

# The merged model is a causal LM; the corresponding pipeline task is "text-generation".
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, temperature=0.0)

# `prompt` is the witness-reliability prompt you want to classify.
result = pipe(prompt, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)
# Keep only the newly generated text by stripping the prompt prefix.
answer = result[0]["generated_text"][len(prompt):].strip()
```
Map the generated answer to a reliability label as follows:

| answer | inference |
|---|---|
| a | average |
| question | questionable |
| re | reliable |
| second | second-hand |
| all else | average |
Since the model is fundamentally an LLM, it may generate text outside the defined set of values `['a', 'question', 're', 'second']`. In those cases, default to `average`, as indicated by the "all else" row in the table above.
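The mapping above, including the `average` fallback, can be sketched as a small helper. The function name and the prefix-matching rule are illustrative assumptions, not part of the released model:

```python
def map_answer(answer: str) -> str:
    """Map the raw generated text to a reliability label."""
    mapping = {
        "a": "average",
        "question": "questionable",
        "re": "reliable",
        "second": "second-hand",
    }
    key = answer.strip().lower()
    # Check longer prefixes first so "re" does not shadow "second" or "question".
    for prefix in sorted(mapping, key=len, reverse=True):
        if key.startswith(prefix):
            return mapping[prefix]
    # Anything outside the defined set defaults to "average".
    return "average"
```

Matching on prefixes rather than exact strings makes the mapping tolerant of the model generating a longer word (e.g. `reliable` instead of `re`).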
## Training and evaluation data

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
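As a sketch, these hyperparameters might be passed to `transformers.TrainingArguments` roughly as follows (argument names follow the Transformers 4.36 API; `output_dir` is a placeholder, and the dataset/trainer wiring is omitted):

```python
from transformers import TrainingArguments

# Hyperparameters from the list above; Adam betas and epsilon are the defaults.
training_args = TrainingArguments(
    output_dir="witness_reliability_run1",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # total train batch size = 1 * 2 = 2
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    num_train_epochs=3,
    seed=42,
)
```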
### Training results

#### Accuracy metrics
- Accuracy: 0.958
- Accuracy for label questionable: 1.000
- Accuracy for label second: 0.941
- Accuracy for label reliable: 0.958
- Accuracy for label average: 0.933
Classification report:

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| average | 0.97 | 0.93 | 0.95 | 30 |
| none | 0.00 | 0.00 | 0.00 | 0 |
| questionable | 0.97 | 1.00 | 0.98 | 30 |
| reliable | 0.92 | 0.96 | 0.94 | 24 |
| second | 1.00 | 0.94 | 0.97 | 34 |
| accuracy | | | 0.96 | 118 |
| macro avg | 0.77 | 0.77 | 0.77 | 118 |
| weighted avg | 0.97 | 0.96 | 0.96 | 118 |
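The per-label accuracies above are per-class recall. A minimal sketch of how such numbers could be recomputed from label/prediction pairs (the toy data below is illustrative, not the actual evaluation set):

```python
from collections import defaultdict

def per_label_accuracy(y_true, y_pred):
    """Fraction of examples of each true label predicted correctly (per-class recall)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / totals[label] for label in totals}

# Toy example: one of the two "reliable" examples is misclassified.
y_true = ["reliable", "reliable", "average", "second"]
y_pred = ["reliable", "average", "average", "second"]
print(per_label_accuracy(y_true, y_pred))
```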
### Framework versions
- PEFT 0.7.2.dev0
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1