---
license: apache-2.0
---
# Model Card for Deita Complexity Scorer
Deita is an open-source project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).

The Deita Complexity Scorer is a tool for automatically annotating the instruction complexity of SFT (supervised fine-tuning) data.
## Model description
- Model type: a model fine-tuned to automatically annotate instruction complexity
- Language(s) (NLP): Primarily English
- Finetuned from model: Llama-1-13b-hf
## Model Sources
- Repository: https://github.com/hkust-nlp/deita
- Model Family: other models and the dataset are available in the Deita collection.
## Usage
Please use the following format:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import numpy as np
from scipy.special import softmax

model_name = "hkust-nlp/Deita-Complexity-Scorer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def infer_complexity(model, tokenizer, input_text):
    # Wrap the query in the prompt template the scorer was trained with.
    complexity_template = ("You are a helpful assistant. Please identify the complexity score of the following user query. \n##Query: {instruction} \n##Complexity: ")
    user_input = complexity_template.format(instruction=input_text)
    input_ids = tokenizer.encode(user_input, return_tensors="pt")
    max_length = 512
    # Generate with per-step scores so the logits of the first generated
    # token (the predicted complexity digit) can be read back.
    outputs = model.generate(input_ids,
                             max_length=max_length,
                             num_return_sequences=1,
                             return_dict_in_generate=True,
                             output_scores=True)
    logprobs_list = outputs.scores[0][0]
    score_logits = []
    # Token ids of the digits "1"-"6" in the Llama tokenizer vocabulary.
    id2score = {
        29896: "1",
        29906: "2",
        29941: "3",
        29946: "4",
        29945: "5",
        29953: "6"
    }
    score_template = np.array([1, 2, 3, 4, 5, 6])
    for k in id2score:
        score_logits.append(logprobs_list[k])
    # Softmax over the six digit logits, then take the probability-weighted
    # average to obtain a continuous complexity score between 1 and 6.
    score_logits = np.array(score_logits)
    score_npy = softmax(score_logits, axis=0)
    score_npy = score_npy * score_template
    score_npy = np.sum(score_npy, axis=0)
    return score_npy

# example input
input_text = "write a performance review for a junior data scientist"
complexity_score = infer_complexity(model, tokenizer, input_text)
print(complexity_score)
```
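The returned value is the probability-weighted average of the candidate scores 1-6, so it is a float between 1 and 6, with higher values indicating more complex instructions. As a minimal sketch of how the scorer could feed into data selection, the snippet below (not part of the deita package; the instruction list is purely illustrative) scores a few instructions with the `infer_complexity` function defined above and sorts them from most to least complex:

```python
# Hypothetical example: rank a small pool of SFT instructions by complexity.
instructions = [
    "say hello",
    "write a performance review for a junior data scientist",
    "prove that there are infinitely many prime numbers",
]

# Score each instruction and sort descending, so the most complex come first.
scored = sorted(
    ((infer_complexity(model, tokenizer, text), text) for text in instructions),
    key=lambda pair: pair[0],
    reverse=True,
)

for score, text in scored:
    print(f"{score:.2f}\t{text}")
```

For large instruction pools, you would likely want to batch the prompts rather than score them one at a time.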