Spaces:
Sleeping
Sleeping
import json | |
import copy | |
import openai | |
""" | |
We will handle here the code evaluation phase. | |
""" | |
EVAL_ANSWER_NOEVAL = 0 | |
EVAL_ANSWER_POSITIVE = 1 | |
EVAL_ANSWER_NEGATIVE = -1 | |
CODE_AUGMENTATIONS=[ | |
("NO_COMPILE", "The code provided is not a valid C code"), | |
("DRY", "Don't repeat yourself."), | |
("SRP", "Single object[function] responsability"), | |
("MC", "Magic constants."), | |
("NAME", "Meaningful names in the code."), | |
] | |
# GPT prompt, ver 1, OpenAI. | |
# TODO Store prompt version, endpoint, endpoint version, etc. once we can afford to do A/B testing. | |
gpt_teacher_prompt = \ | |
""" | |
You are a helpful teacher that evaluates pieces of C code based on several clean code principles. The principles will | |
be provided below. As output you will provide a list of JSON objects. | |
The language is C. The students will give you a complete C code. | |
You will evaluate its cleaniness according to some criterias given below. A C program can have multiple criterias that are infringed. | |
Each principles has a mnemonic and an explanation. You will have to check if the code fits the principles, | |
based on the explanations. The principles matches if most of the code is like in the explanation. If this is the case, issue the mnemonic. | |
As an **input**, the student will send you a piece of code. | |
As an **answer**, you should generate a list of JSON objects. Each JSON object should contain two parameters: "criteria" and "explanation". | |
The "criteria" should be a programming principle or acronym as above, and the "explanation" should provide a brief list for the student | |
on how their code fail. | |
################# | |
Criterias | |
NO_COMPILE | |
The code is not a standalone compilable C code. There are no ``#include`` directives. There is no main function. The C syntax looks | |
broken. There are wrong string constants. There are illegal operands or keywords. Only flagrant violations would trigger this criteria. | |
This criteria should be checked on "best effort", that is, don't start a compiler and compile the code. As explanations you should return | |
a short description on why the code would not compile. Example: "There is no ``main()`` function in the code. Please provide one." | |
DRY | |
Don't repeat yourself. The code has repetitive blocks that could have been isolated into a function. The code has no functions | |
but some instructions could have been parametrized. The data is stored in flat variables but a struct or an array would have been | |
more natural for that problem. As explanations you should present and a code fragment that is repeated followed by a short list | |
of function names or programming patterns that should be present in the code. | |
SRP | |
Single "object" Responsability but in our case, Single Function Responsability. The code has functions that do more than one thing. | |
For example, reading from a file, then computing something on that data. Allocating, reading, computing and printing in the same function. | |
Or the code is stuffend in main(), without any separation in functions. As explanations you should return one or two function names | |
and a list of their responsabilities. Tell the student that they should separate the functionalities. | |
MC | |
Magic constants. The code has for loops, data allocations, bounds, etc. that are expressed as numeric constants in the place | |
they are used. These constants appear in more than one place. If most of the constants are written directly in the code, then | |
the given piece of code matches the criteria. If most of the constants are declared as symbolic constants then, the code, does | |
not match this criteria. The naming of the constants is not important for this criteria. The naming and its relevance is judged by | |
other topics. For this criteria it matters only if the constant is defined or not. As explanations you will propose | |
some symbolic constants that should be used in the code. | |
NAME | |
Nondescriptive variable/function names. The code has meaningless function and variable names. For example, t, x, y, s, for | |
vectors, structures or tables that store meaningful data. Function names like ``add`` or ``get`` without specifying what will be | |
added, or what will be got. A good function name is ``addStudentInList`` or ``getStudentsSortedByCriteria``. Variable names used | |
in for loops or local to some functions are ok to be less descriptive. If most of the names are bad, then the code matches this criteria. | |
As explanations you will return few pairs of original-name suggested-name where the original-name is the original variable in the code | |
and the suggested-name is something that would be meaningful given what the code does. | |
################# | |
Example | |
``` | |
#include <stdio.h> | |
#include <stdlib.h> | |
#define SIZE 10 | |
void func(int* arr) { | |
arr[0] = 0 * 0; printf("%d ", arr[0]); | |
arr[1] = 1 * 1; printf("%d ", arr[1]); | |
arr[2] = 2 * 2; printf("%d ", arr[2]); | |
arr[3] = 3 * 3; printf("%d ", arr[3]); | |
arr[4] = 4 * 4; printf("%d ", arr[4]); | |
arr[5] = 5 * 5; printf("%d ", arr[5]); | |
arr[6] = 6 * 6; printf("%d ", arr[6]); | |
arr[7] = 7 * 7; printf("%d ", arr[7]); | |
arr[8] = 8 * 8; printf("%d ", arr[8]); | |
arr[9] = 9 * 9; printf("%d ", arr[9]); | |
printf("\\n"); | |
} | |
int main() { | |
int* arr = (int*)malloc(sizeof(int) * SIZE); | |
func(arr); | |
free(arr); | |
return 0; | |
} | |
``` | |
Your output: | |
[ | |
{ | |
"criteria": "DRY", | |
"explanation": "The ``arr[3] = 3 * 3; printf("%d ", arr[3]);`` code repeats a lot. Consider creating functions like ``computeArray``, ``displayArray``. Consider using ``for`` loops." | |
}, | |
{ | |
"criteria": "SRP", | |
"explanation": "The function ``func`` handles the array computations and output. You should separate the responsabilites." | |
}, | |
{ | |
"criteria": "NAME", | |
"explanation": "``func`` should be called ``computeAndPrint``, ``arr`` should be called ``dataArray``." | |
} | |
] | |
""" | |
CODE_EVAL_EXAMPLES = [ | |
{"name":"Hello World", | |
"code":""" | |
#include <stdio.h> | |
int main() { | |
printf("Hello, World!"); | |
return 0; | |
} | |
""", | |
"eval":"[]"}, | |
{"name":"Bookstore DB", | |
"code":""" | |
#include <stdio.h> | |
#include <stdlib.h> | |
#include <string.h> | |
#define A 100 | |
#define B 50 | |
int main() { | |
char x1[A] = "The Great Gatsby"; | |
char y1[B] = "F. Scott Fitzgerald"; | |
int z1 = 1925; | |
char* p1 = (char*)malloc(A * sizeof(char)); | |
char* p2 = (char*)malloc(B * sizeof(char)); | |
int* p3 = (int*)malloc(sizeof(int)); | |
if (p1 != NULL && p2 != NULL && p3 != NULL) { | |
strncpy(p1, x1, A); | |
strncpy(p2, y1, B); | |
*p3 = z1; | |
printf("Title: %s\\n", p1); | |
printf("Author: %s\\n", p2); | |
printf("Year: %d\\n", *p3); | |
} | |
free(p1); | |
free(p2); | |
free(p3); | |
return 0; | |
}""", | |
"eval":"""[ | |
{ | |
"criteria": "DRY", | |
"explanation": "The memory allocation and initialization for ``p1``, ``p2``, and ``p3`` are repetitive. Consider creating a function like ``allocateMemory`` to handle this." | |
}, | |
{ | |
"criteria": "SRP", | |
"explanation": "The ``main`` function handles memory allocation, data copying, and printing. You should separate these responsibilities into different functions like ``allocateMemory``, ``copyData``, and ``printData``." | |
}, | |
{ | |
"criteria": "NAME", | |
"explanation": "``x1`` should be called ``title``, ``y1`` should be called ``author``, ``z1`` should be called ``year``, ``p1`` should be called ``titlePtr``, ``p2`` should be called ``authorPtr``, and ``p3`` should be called ``yearPtr``." | |
} | |
]"""}, | |
] | |
def get_enhanced_sample_example(example_id): | |
example_id = min(example_id, len(CODE_EVAL_EXAMPLES)-1) | |
example = copy.deepcopy(CODE_EVAL_EXAMPLES[example_id]) | |
gpt_eval = json.loads(example["eval"]) | |
enhanced_eval = add_evaluation_fields_on_js_answer(gpt_eval) | |
example["eval"] = enhanced_eval | |
return example | |
def get_the_openai_client(openai_key): | |
if openai_key is None or openai_key == "" or openai_key == "(none)": | |
return None | |
client = openai.Client(api_key=openai_key) | |
return client | |
def clean_prompt_answer(answer): | |
""" | |
Chatgpt4 is ok, does not pollute the code but 3.5 encloses it in ``` | |
:param answer: | |
:return: | |
""" | |
cleaned = [] | |
for l in answer.split("\n"): | |
if l.startswith("```"): | |
continue | |
else: | |
cleaned.append(l) | |
return "\n".join(cleaned) | |
def parse_chatgpt_answer(ans_text): | |
""" | |
Minimal parsing of chatGPT answer. | |
TODO: Ensure no extra fields, ensure format, etc. (cough pydantic) | |
:param ans_text: | |
:return: | |
""" | |
try: | |
ans_text = clean_prompt_answer(ans_text) | |
js_answer = json.loads(ans_text) | |
except: | |
# for now we dump the error in console | |
import traceback | |
exception = traceback.format_exc() | |
print(exception) | |
return {f"ChatPGT answered: {ans_text}.\nError":exception} | |
return js_answer | |
def make_user_prompt(code_fragment): | |
""" | |
Formats the code for user prompt | |
:param code_fragment: | |
:return: | |
""" | |
user_prompt = f"```\n{code_fragment}```" | |
return user_prompt | |
def call_openai(client, system_prompt, user_prompt, model="gpt-3.5-turbo", temperature = 0.1, force_json=False, | |
get_full_response=False): | |
""" | |
Constructs a prompt for chatGPT and sends it. | |
Possible models: | |
- gpt-3.5-turbo | |
- gpt-4-turbo | |
- gpt-4o | |
- gpt-4o-mini | |
Don't use force_json, will screw common use cases like list of json objects. | |
:param client: | |
:param system_prompt: | |
:param user_prompt: | |
:param model: | |
:param temperature: | |
:return: | |
""" | |
messages = [ | |
{"role": "system", "content": system_prompt}, | |
{"role": "user", "content": user_prompt} | |
] | |
optional_args = {} | |
if force_json: | |
optional_args["response_format"] = { "type": "json_object" } | |
response = client.chat.completions.create( | |
# model="gpt-4-turbo", # Update model name as necessary | |
model=model, | |
messages=messages, | |
temperature = temperature, | |
**optional_args | |
) | |
if get_full_response: | |
return response | |
else: | |
output_content = response.choices[0].message.content | |
return output_content | |
def eval_code_by_chatgpt(openai_client, ccode): | |
""" | |
Will evaluate a piece of code using our heavily tuned prompt! | |
:param openai_client: | |
:param ccode: | |
:return: | |
""" | |
# time.sleep(3) | |
try: | |
# return """[ | |
# { | |
# "criteria": "DRY", | |
# "explanation": "The memory allocation and initialization for ``p1``, ``p2``, and ``p3`` are repetitive. Consider creating a function like ``allocateAndInitializeMemory``." | |
# }, | |
# { | |
# "criteria": "DRY", | |
# "explanation": "Tne second DRY failure, because this is the observed ChatGPT behaviour." | |
# }, | |
# | |
# { | |
# "criteria": "SRP", | |
# "explanation": "The ``main`` function handles memory allocation, initialization, and printing. You should separate these responsibilities into different functions like ``allocateMemory``, ``initializeData``, and ``printData``." | |
# }, | |
# { | |
# "criteria": "NAME", | |
# "explanation": "``x1`` should be called ``title``, ``y1`` should be called ``author``, ``z1`` should be called ``year``, ``p1`` should be called ``titlePtr``, ``p2`` should be called ``authorPtr``, ``p3`` should be called ``yearPtr``." | |
# } | |
# ]""" | |
assert openai_client is not None | |
user_prompt = make_user_prompt(ccode) | |
chatgpt_answer = call_openai(openai_client, system_prompt=gpt_teacher_prompt, user_prompt=user_prompt, | |
model="gpt-4o", | |
temperature=0, force_json=False, get_full_response=False) | |
return chatgpt_answer | |
except: | |
import traceback | |
traceback.print_exc() | |
return {"error":"There was an error while getting the ChatGPT answer. Maybe ChatGPT is overloaded?"} | |
def add_evaluation_fields_on_js_answer(json_answer, all_criterias = None): | |
""" | |
Adds some JSON fields to store the human feedback. | |
The textual human feedback will always be in the 0 position. | |
:param json_answer: | |
:return: | |
""" | |
if all_criterias is None: | |
all_criterias = CODE_AUGMENTATIONS | |
enhanced_answer = [] | |
overall_feedback = { | |
"criteria":"HUMAN_FEEDBACK", | |
"explanation":"", | |
"EVAL": EVAL_ANSWER_NOEVAL | |
} | |
if all_criterias is not None: | |
existing = {c["criteria"] for c in json_answer} | |
for criteria in all_criterias: | |
if criteria[0] not in existing: | |
json_answer.append({"criteria":criteria[0], "explanation":"Not infringed"}) | |
enhanced_answer.append(overall_feedback) | |
for ans in json_answer: | |
ans = copy.deepcopy(ans) | |
ans["EVAL"] = EVAL_ANSWER_NOEVAL | |
enhanced_answer.append(ans) | |
return enhanced_answer | |
def eval_the_piece_of_c_code(openai_client, ccode): | |
""" | |
Main entrypoint to this module. Will be called from backend. Will block so multithreading pls. | |
Will return a proprer json and will have EVAL fields, too. | |
:param ccode: | |
:return: | |
""" | |
enhanced_answer = {"error":"Not processed"} | |
try: | |
chatgpt_ans = eval_code_by_chatgpt(openai_client, ccode) | |
except: | |
import traceback | |
traceback.print_exc() | |
enhanced_answer = {"error": "There was an error while calling chatGPT."} | |
return enhanced_answer | |
if "error" in chatgpt_ans: | |
# we forward it to caller | |
enhanced_answer = chatgpt_ans | |
pass | |
else: | |
try: | |
chatgpt_js = parse_chatgpt_answer(chatgpt_ans) | |
enhanced_answer = add_evaluation_fields_on_js_answer(chatgpt_js) | |
except: | |
import traceback | |
traceback.print_exc() | |
enhanced_answer = {"error": "There was an error while parsing the answer."} | |
return enhanced_answer | |