---
base_model:
- google/gemma-2-2b-it
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
datasets:
- prometheus-eval/Preference-Collection
- prometheus-eval/Feedback-Collection
language:
- en
---
# prometheus2-2B

A fine-tuned gemma-2-2b-it version of [prometheus-7b-v2.0](https://huggingface.co/prometheus-eval/prometheus-7b-v2.0), trained with [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) as the base model.

Training hyperparameters:
* 3 epochs
* Learning rate: 1e-5
* Effective batch size: 4
* Cosine annealing learning-rate schedule
* ~5% warmup
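
The exact training script is not included in this repo; as a rough sketch (not the author's actual configuration), the hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows. The output path, the batch/accumulation split, and the precision setting are assumptions:

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above; the actual
# training script for this checkpoint is not published.
training_args = TrainingArguments(
    output_dir="prometheus2-2B",    # assumed output path
    num_train_epochs=3,             # 3 epochs
    learning_rate=1e-5,             # learning rate 1e-5
    per_device_train_batch_size=1,  # assumed: 1 per device with
    gradient_accumulation_steps=4,  #   4-step accumulation = effective batch size 4
    lr_scheduler_type="cosine",     # cosine annealing
    warmup_ratio=0.05,              # ~5% warmup
    bf16=True,                      # assumed mixed precision
)
```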

Supports both feedback (Likert-scale) evaluation and preference evaluation, using the same prompts as prometheus-7b-v2.0 with gemma-2-2b-it. See the examples below.

# Feedback Evaluation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ABSOLUTE_PROMPT = """###Task Description:
An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
{}

###Response to evaluate:
{}

###Reference Answer (Score 5):
{}

###Score Rubrics:
{}

###Feedback: """

device = 'cuda:0'
model = AutoModelForCausalLM.from_pretrained("zli12321/prometheus2-2B").to(device)
tokenizer = AutoTokenizer.from_pretrained("zli12321/prometheus2-2B")

# Define your own instruction, response, reference, and rubric below.
prompt = ABSOLUTE_PROMPT.format(instruction, response, reference, rubric)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
input_length = input_ids.shape[1]
outputs = model.generate(input_ids, output_logits=True, return_dict_in_generate=True, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs.sequences[0][input_length:], skip_special_tokens=True))
```
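
The model ends its output with `[RESULT]` followed by the integer score (or `A`/`B` in the preference case below). A small helper like the following, which is an illustration rather than part of the original card, can pull the verdict out of the decoded text:

```python
import re

def parse_result(output_text):
    # Return the token after [RESULT]: '1'-'5' for feedback, 'A'/'B' for preference.
    match = re.search(r"\[RESULT\]\s*([AB]|[1-5])", output_text)
    return match.group(1) if match else None

# Example: parse_result("Feedback: well grounded ... [RESULT] 4") -> "4"
```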

# Preference Evaluation Template
Follow the same steps as above, substituting the preference evaluation template below; a code sketch follows the template.

```
###Task Description:
An instruction (might include an Input inside it), a response to evaluate, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of two responses strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, choose a better response between Response A and Response B. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (A or B)"
4. Please do not generate any other opening, closing, and explanations.

###Instruction:
{}

###Response A:
{}

###Response B:
{}

###Reference Answer:
{}

###Score Rubric:
{}

###Feedback:
```
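
A minimal sketch of running a preference evaluation, reusing the `model`, `tokenizer`, and `device` from the feedback example above; the `RELATIVE_PROMPT` name and the `instruction`/`response_a`/`response_b`/`reference`/`rubric` variables are illustrative, not from the original card:

```python
# Preference (A/B) evaluation using the template above; reuses the
# model, tokenizer, and device loaded in the feedback example.
RELATIVE_PROMPT = """###Task Description:
An instruction (might include an Input inside it), a response to evaluate, and a score rubric representing a evaluation criteria are given.
1. Write a detailed feedback that assess the quality of two responses strictly based on the given score rubric, not evaluating in general.
2. After writing a feedback, choose a better response between Response A and Response B. You should refer to the score rubric.
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (A or B)"
4. Please do not generate any other opening, closing, and explanations.

###Instruction:
{}

###Response A:
{}

###Response B:
{}

###Reference Answer:
{}

###Score Rubric:
{}

###Feedback: """

# Fill in your own instruction, candidate responses, reference, and rubric.
prompt = RELATIVE_PROMPT.format(instruction, response_a, response_b, reference, rubric)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
input_length = input_ids.shape[1]
outputs = model.generate(input_ids, return_dict_in_generate=True, max_new_tokens=4096)
# The verdict ("A" or "B") follows the [RESULT] marker in the output.
print(tokenizer.decode(outputs.sequences[0][input_length:], skip_special_tokens=True))
```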

# Citations

```bibtex
@misc{kim2023prometheus,
      title={Prometheus: Inducing Fine-grained Evaluation Capability in Language Models},
      author={Seungone Kim and Jamin Shin and Yejin Cho and Joel Jang and Shayne Longpre and Hwaran Lee and Sangdoo Yun and Seongjin Shin and Sungdong Kim and James Thorne and Minjoon Seo},
      year={2023},
      eprint={2310.08491},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@misc{kim2024prometheus,
      title={Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models},
      author={Seungone Kim and Juyoung Suk and Shayne Longpre and Bill Yuchen Lin and Jamin Shin and Sean Welleck and Graham Neubig and Moontae Lee and Kyungjae Lee and Minjoon Seo},
      year={2024},
      eprint={2405.01535},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```