---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-Instruct-v0.1
model-index:
- name: witness_reliability_run1_merged
  results: []
---

# witness_reliability_run1_merged

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on the latest [labeled dataset](https://git.enigmalabs.io/data-science-playground/model-data/-/tree/master/models/witness_reliability?ref_type=heads).

## Model description

More information needed

## Intended uses & limitations

### Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

merged_model_name = "e-labs/witness_reliability_ft_mistral_7b_v0.1_instruct"

tokenizer = AutoTokenizer.from_pretrained(merged_model_name)
model = AutoModelForCausalLM.from_pretrained(merged_model_name)

# "CAUSAL_LM" is a PEFT task type, not a pipeline task; causal LMs are served by
# the "text-generation" pipeline. do_sample=False gives deterministic (greedy)
# decoding, which is what temperature=0.0 was intended to achieve.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False)

# `prompt` is the instruction-formatted input string for a single witness record.
result = pipe(prompt, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)
answer = result[0]['generated_text'][len(prompt):].strip()
```

Map the `answer` to a reliability label as follows:

| answer   | inference    |
|----------|--------------|
| a        | average      |
| question | questionable |
| re       | reliable     |
| second   | second-hand  |
| all else | average      |

Because the model is fundamentally an LLM, it may generate text outside the defined set of values `['a', 'question', 're', 'second']`. In those cases, default to `average`, as indicated by the "all else" row in the table above; a minimal mapping sketch is included at the end of this card.

## Training and evaluation data

https://wandb.ai/enigmalabs/witness_reliability_ft_mistral_instruct_v0.1/runs/0skl7iac?nw=nwuserisaaclee

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3

### Training results

https://wandb.ai/enigmalabs/witness_reliability_ft_mistral_instruct_v0.1/runs/2etycpye?nw=nwuserisaaclee

#### Accuracy Metrics

- Accuracy: 0.958
- Accuracy for label questionable: 1.000
- Accuracy for label second: 0.941
- Accuracy for label reliable: 0.958
- Accuracy for label average: 0.933

Classification Report:

| label        | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| average      | 0.97      | 0.93   | 0.95     | 30      |
| none         | 0.00      | 0.00   | 0.00     | 0       |
| questionable | 0.97      | 1.00   | 0.98     | 30      |
| reliable     | 0.92      | 0.96   | 0.94     | 24      |
| second       | 1.00      | 0.94   | 0.97     | 34      |
| accuracy     |           |        | 0.96     | 118     |
| macro avg    | 0.77      | 0.77   | 0.77     | 118     |
| weighted avg | 0.97      | 0.96   | 0.96     | 118     |

### Framework versions

- PEFT 0.7.2.dev0
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
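
## Label mapping sketch

The mapping table in the "Intended uses & limitations" section can be applied with a small lookup that falls back to `average` for anything outside the expected set. This is a minimal sketch; the `ANSWER_TO_LABEL` dict and `map_answer` helper are illustrative names, not part of the released code.

```python
# Mapping from the raw generated answer to a reliability label, as described
# in the "Intended uses & limitations" section of this card.
ANSWER_TO_LABEL = {
    "a": "average",
    "question": "questionable",
    "re": "reliable",
    "second": "second-hand",
}

def map_answer(answer: str) -> str:
    """Map a raw generated answer to a reliability label.

    Any output outside the defined set falls back to "average",
    per the "all else" row in the mapping table.
    """
    return ANSWER_TO_LABEL.get(answer.strip(), "average")
```

For example, `map_answer("second")` returns `"second-hand"`, while any unexpected generation such as `map_answer("unknown")` returns `"average"`.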