Zekun Wu committed on
Commit
d9ab1da
1 Parent(s): 4d4a56e
Files changed (1)
  1. util/evaluator.py +61 -39
util/evaluator.py CHANGED
@@ -33,8 +33,8 @@ class evaluator:
 
  evaluation_prompt = f"""You are provided with a user's question and the corresponding explanation generated by
  an AI model. Your task is to evaluate the explanation based on the following five principles. Each principle
- should be scored on a scale from 0 to 1, where 0 indicates that the principle is not met at all,
- and 1 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
 
  Question:
  {question}
@@ -119,48 +119,70 @@ class evaluator:
  def evaluate_conversation(self, conversation, context):
  formatted_conversation = self.format_conversation(conversation)
  evaluation_prompt = f"""
- You are provided with a conversation between a user and a chatbot and the context about them. Your task is to evaluate the chatbot explanation in the conversation based on the following five principles. Each principle should be scored on a scale from 0 to 1, where 0 indicates that the principle is not met at all, and 1 indicates that the principle is fully satisfied.
-
- Conversation:
- {formatted_conversation}
-
- Context:
- {context}
-
- Evaluation Criteria:
-
- Factually Correct:
- Definition: The explanation must be accurate and relevant to the question and the subject matter.
- Score: (0-1) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
-
- Useful:
- Definition: The explanation should enable the user to understand the answer better and should facilitate further reasoning or decision-making.
- Score: (0-1) How useful is the explanation in helping the user understand the answer and make informed decisions?
-
- Context Specific:
- Definition: The explanation should be relevant to the specific context or scenario implied by the question.
- Score: (0-1) How well does the explanation address the specific context or scenario of the question?
-
- User Specific:
- Definition: The explanation should cater to the knowledge level and interests of the user, assuming typical or specified user characteristics.
- Score: (0-1) How well does the explanation cater to the needs and knowledge level of the intended user?
-
- Provides Pluralism:
- Definition: The explanation should offer or accommodate multiple viewpoints or interpretations, allowing the user to explore various perspectives.
- Score: (0-1) How well does the explanation provide or support multiple perspectives?
-
- After evaluating the provided conversation based on the context and five principles, please format your scores in a JSON dictionary. Directly provide me with the json without any additional text.
-
- Example JSON format:
-
- Answer: {{"Factually Correct": 0.9, "Useful": 0.85, "Context Specific": 0.8, "User Specific": 0.75, "Provides Pluralism": 0.7}}
-
  Answer:
  """
 
  print(evaluation_prompt)
 
- response = self.model.invoke(evaluation_prompt, temperature=0, max_tokens=500).strip()
  try:
  scores = json.loads(response)
  except json.JSONDecodeError:
 
 
  evaluation_prompt = f"""You are provided with a user's question and the corresponding explanation generated by
  an AI model. Your task is to evaluate the explanation based on the following five principles. Each principle
+ should be scored on a scale from 0 to 10, where 0 indicates that the principle is not met at all,
+ and 10 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
 
  Question:
  {question}
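
Note on the scale change: this hunk moves scoring from 0-1 to 0-10, so scores recorded before this commit are not directly comparable with scores recorded after it. A minimal sketch of one way to bring the new scores back onto the old range is below. `normalize_score` and its `scale_max` parameter are hypothetical names used for illustration; nothing in this diff shows that util/evaluator.py does any rescaling.

```python
def normalize_score(score, scale_max=10.0):
    """Map a score on [0, scale_max] to [0, 1].

    Hypothetical helper, not part of util/evaluator.py.
    Values outside the expected range are clamped rather than rejected.
    """
    value = float(score) / scale_max
    return max(0.0, min(1.0, value))
```

Clamping is a design choice here: a model that returns 11 or -1 still yields a usable number, at the cost of hiding the out-of-range answer. Raising an error instead would be equally reasonable.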
 
  def evaluate_conversation(self, conversation, context):
  formatted_conversation = self.format_conversation(conversation)
  evaluation_prompt = f"""
+ You are provided with a conversation between a user and a chatbot and the context about them. Your task is to evaluate the explanation based on the following five principles. Each principle
+ should be scored on a scale from 0 to 10, where 0 indicates that the principle is not met at all,
+ and 10 indicates that the principle is fully satisfied. Additionally, provide a brief ten words explanation for each score to justify your rating.
+
+ Conversation:
+ {formatted_conversation}
+
+ Context:
+ {context}
+
+ Evaluation Criteria:
+
+ Factually Correct:
+ Definition: The explanation must be accurate and relevant to the question and the subject matter.
+ Score: (0-10) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
+
+ Useful:
+ Definition: The explanation should enable the user to understand the answer better and should facilitate further reasoning or decision-making.
+ Score: (0-10) How useful is the explanation in helping the user understand the answer and make informed decisions?
+
+ Context Specific:
+ Definition: The explanation should be relevant to the specific context or scenario implied by the question.
+ Score: (0-10) How well does the explanation address the specific context or scenario of the question?
+
+ User Specific:
+ Definition: The explanation should cater to the knowledge level and interests of the user, assuming typical or specified user characteristics.
+ Score: (0-10) How well does the explanation cater to the needs and knowledge level of the intended user?
+
+ Provides Pluralism:
+ Definition: The explanation should offer or accommodate multiple viewpoints or interpretations, allowing the user to explore various perspectives.
+ Score: (0-10) How well does the explanation provide or support multiple perspectives?
+
+ After evaluating the provided question and explanation based on the five principles, please format your scores and justifications in a JSON dictionary. Directly provide me with the JSON without any additional text.
+
+ Example JSON format:
+ {{
+ "Factually Correct": {{
+ "Justification": "xxx",
+ "Score": 9
+ }},
+ "Useful": {{
+ "Justification": "xxx",
+ "Score": 8.5
+ }},
+ "Context Specific": {{
+ "Justification": "xxx",
+ "Score": 8
+ }},
+ "User Specific": {{
+ "Justification": "xxx",
+ "Score": 7.5
+ }},
+ "Provides Pluralism": {{
+ "Justification": "xxx",
+ "Score": 7
+ }}
+ }}
+
  Answer:
  """
 
  print(evaluation_prompt)
 
+ response = self.model.invoke(evaluation_prompt, temperature=0, max_tokens=1000).strip()
  try:
  scores = json.loads(response)
  except json.JSONDecodeError:
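
Note on the new response shape: after this commit each principle maps to an object with "Justification" and "Score" rather than to a bare number, so any code that reads `scores` has to unpack one more level. The diff is cut off at the `except` branch, so what the author does on a parse failure is not visible here. The sketch below is one possible way to handle both, assuming a hypothetical standalone function `parse_evaluation`. The `PRINCIPLES` list is copied from the prompt text; the fallback of slicing from the first `{` to the last `}` is an assumption for illustration, not the commit's behavior.

```python
import json

# Principle names as they appear in the prompt's example JSON.
PRINCIPLES = [
    "Factually Correct",
    "Useful",
    "Context Specific",
    "User Specific",
    "Provides Pluralism",
]


def parse_evaluation(response):
    """Parse the nested JSON reply; return None if it cannot be used.

    Hypothetical helper, not part of util/evaluator.py.
    Returns {principle: {"Score": float, "Justification": str}}.
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Assumed fallback: the model wrapped the JSON in extra text,
        # so retry on the span from the first "{" to the last "}".
        start, end = response.find("{"), response.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            data = json.loads(response[start:end + 1])
        except json.JSONDecodeError:
            return None

    if not isinstance(data, dict):
        return None

    result = {}
    for name in PRINCIPLES:
        entry = data.get(name)
        if not isinstance(entry, dict) or "Score" not in entry:
            return None  # a principle is missing or has the old flat shape
        try:
            score = float(entry["Score"])
        except (TypeError, ValueError):
            return None
        result[name] = {
            "Score": score,
            "Justification": str(entry.get("Justification", "")),
        }
    return result
```

A caller could use it as `scores = parse_evaluation(response)` and treat `None` as "no usable evaluation". Note that a reply in the pre-commit flat format (for example `{"Useful": 0.85, ...}`) is rejected rather than silently accepted, since its numbers are on a different scale.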