"""
This module handles the code evaluation phase.
"""
import json
import copy

import openai

EVAL_ANSWER_NOEVAL = 0
EVAL_ANSWER_POSITIVE = 1
EVAL_ANSWER_NEGATIVE = -1

CODE_AUGMENTATIONS = [
    ("NO_COMPILE", "The code provided is not valid C code."),
    ("DRY", "Don't repeat yourself."),
    ("SRP", "Single object [function] responsibility."),
    ("MC", "Magic constants."),
    ("NAME", "Meaningful names in the code."),
]

# GPT prompt, ver 1, OpenAI.
# TODO Store prompt version, endpoint, endpoint version, etc. once we can afford to do A/B testing.
gpt_teacher_prompt = \
"""
You are a helpful teacher that evaluates pieces of C code based on several clean code principles. The principles will
be provided below. As output you will provide a list of JSON objects.
The language is C. The students will give you a complete C code.
You will evaluate its cleanliness according to the criteria given below. A C program can infringe multiple criteria.
Each principle has a mnemonic and an explanation. You will have to check whether the code fits the principles,
based on the explanations. A principle matches if most of the code is like in the explanation. If this is the case, issue the mnemonic.
As an **input**, the student will send you a piece of code.
As an **answer**, you should generate a list of JSON objects. Each JSON object should contain two parameters: "criteria" and "explanation".
The "criteria" should be a programming principle or acronym as above, and the "explanation" should provide a brief list for the student
on how their code fails.
#################
Criteria

NO_COMPILE
The code is not standalone compilable C code. There are no ``#include`` directives. There is no main function. The C syntax looks
broken. There are wrong string constants. There are illegal operands or keywords. Only flagrant violations should trigger this criterion.
This criterion should be checked on a "best effort" basis, that is, don't start a compiler and compile the code. As explanations you should return
a short description of why the code would not compile. Example: "There is no ``main()`` function in the code. Please provide one."

DRY
Don't repeat yourself. The code has repetitive blocks that could have been isolated into a function. The code has no functions
but some instructions could have been parametrized. The data is stored in flat variables but a struct or an array would have been
more natural for that problem. As explanations you should present a code fragment that is repeated, followed by a short list
of function names or programming patterns that should be present in the code.
SRP
Single "object" Responsibility, but in our case Single Function Responsibility. The code has functions that do more than one thing.
For example, reading from a file, then computing something on that data. Allocating, reading, computing and printing in the same function.
Or the code is stuffed into main(), without any separation into functions. As explanations you should return one or two function names
and a list of their responsibilities. Tell the student that they should separate the functionalities.

MC
Magic constants. The code has for loops, data allocations, bounds, etc. that are expressed as numeric constants in the place
they are used. These constants appear in more than one place. If most of the constants are written directly in the code, then
the given piece of code matches the criterion. If most of the constants are declared as symbolic constants, then the code does
not match this criterion. The naming of the constants is not important for this criterion; naming and its relevance are judged by
other topics. For this criterion it matters only whether the constant is defined or not. As explanations you will propose
some symbolic constants that should be used in the code.

NAME
Nondescriptive variable/function names. The code has meaningless function and variable names. For example, t, x, y, s, for
vectors, structures or tables that store meaningful data. Function names like ``add`` or ``get`` without specifying what will be
added, or what will be got. A good function name is ``addStudentInList`` or ``getStudentsSortedByCriteria``. Variable names used
in for loops or local to some functions are OK to be less descriptive. If most of the names are bad, then the code matches this criterion.
As explanations you will return a few pairs of (original-name, suggested-name), where the original-name is the original variable in the code
and the suggested-name is something that would be meaningful given what the code does.
#################
Example
```
#include <stdio.h>
#include <stdlib.h>

#define SIZE 10

void func(int* arr) {
    arr[0] = 0 * 0; printf("%d ", arr[0]);
    arr[1] = 1 * 1; printf("%d ", arr[1]);
    arr[2] = 2 * 2; printf("%d ", arr[2]);
    arr[3] = 3 * 3; printf("%d ", arr[3]);
    arr[4] = 4 * 4; printf("%d ", arr[4]);
    arr[5] = 5 * 5; printf("%d ", arr[5]);
    arr[6] = 6 * 6; printf("%d ", arr[6]);
    arr[7] = 7 * 7; printf("%d ", arr[7]);
    arr[8] = 8 * 8; printf("%d ", arr[8]);
    arr[9] = 9 * 9; printf("%d ", arr[9]);
    printf("\\n");
}

int main() {
    int* arr = (int*)malloc(sizeof(int) * SIZE);
    func(arr);
    free(arr);
    return 0;
}
```
Your output:
[
    {
        "criteria": "DRY",
        "explanation": "The ``arr[3] = 3 * 3; printf(\\"%d \\", arr[3]);`` code repeats a lot. Consider creating functions like ``computeArray``, ``displayArray``. Consider using ``for`` loops."
    },
    {
        "criteria": "SRP",
        "explanation": "The function ``func`` handles the array computations and output. You should separate the responsibilities."
    },
    {
        "criteria": "NAME",
        "explanation": "``func`` should be called ``computeAndPrint``, ``arr`` should be called ``dataArray``."
    }
]
"""
CODE_EVAL_EXAMPLES = [
    {"name": "Hello World",
     "code": """
#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}
""",
     "eval": "[]"},
    {"name": "Bookstore DB",
     "code": """
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define A 100
#define B 50

int main() {
    char x1[A] = "The Great Gatsby";
    char y1[B] = "F. Scott Fitzgerald";
    int z1 = 1925;
    char* p1 = (char*)malloc(A * sizeof(char));
    char* p2 = (char*)malloc(B * sizeof(char));
    int* p3 = (int*)malloc(sizeof(int));
    if (p1 != NULL && p2 != NULL && p3 != NULL) {
        strncpy(p1, x1, A);
        strncpy(p2, y1, B);
        *p3 = z1;
        printf("Title: %s\\n", p1);
        printf("Author: %s\\n", p2);
        printf("Year: %d\\n", *p3);
    }
    free(p1);
    free(p2);
    free(p3);
    return 0;
}""",
     "eval": """[
    {
        "criteria": "DRY",
        "explanation": "The memory allocation and initialization for ``p1``, ``p2``, and ``p3`` are repetitive. Consider creating a function like ``allocateMemory`` to handle this."
    },
    {
        "criteria": "SRP",
        "explanation": "The ``main`` function handles memory allocation, data copying, and printing. You should separate these responsibilities into different functions like ``allocateMemory``, ``copyData``, and ``printData``."
    },
    {
        "criteria": "NAME",
        "explanation": "``x1`` should be called ``title``, ``y1`` should be called ``author``, ``z1`` should be called ``year``, ``p1`` should be called ``titlePtr``, ``p2`` should be called ``authorPtr``, and ``p3`` should be called ``yearPtr``."
    }
]"""},
]


def get_enhanced_sample_example(example_id):
    example_id = min(example_id, len(CODE_EVAL_EXAMPLES) - 1)
    example = copy.deepcopy(CODE_EVAL_EXAMPLES[example_id])
    gpt_eval = json.loads(example["eval"])
    enhanced_eval = add_evaluation_fields_on_js_answer(gpt_eval)
    example["eval"] = enhanced_eval
    return example


def get_the_openai_client(openai_key):
    if openai_key is None or openai_key == "" or openai_key == "(none)":
        return None
    client = openai.Client(api_key=openai_key)
    return client


def clean_prompt_answer(answer):
    """
    GPT-4 is OK and does not pollute the answer, but GPT-3.5 tends to enclose it in ``` fences.
    This strips any fence lines.
    :param answer:
    :return:
    """
    cleaned = []
    for line in answer.split("\n"):
        if line.startswith("```"):
            continue
        cleaned.append(line)
    return "\n".join(cleaned)


def parse_chatgpt_answer(ans_text):
    """
    Minimal parsing of the ChatGPT answer.
    TODO: Ensure no extra fields, ensure format, etc. (cough pydantic)
    :param ans_text:
    :return:
    """
    try:
        ans_text = clean_prompt_answer(ans_text)
        js_answer = json.loads(ans_text)
    except Exception:
        # for now we dump the error to the console
        import traceback
        exception = traceback.format_exc()
        print(exception)
        return {f"ChatGPT answered: {ans_text}.\nError": exception}
    return js_answer
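

# A hedged, self-contained sketch of the round-trip performed by
# clean_prompt_answer() + parse_chatgpt_answer(). The fenced wrapper below
# mimics the observed GPT-3.5 habit of wrapping JSON in ``` fences; the
# sample answer itself is hypothetical.
def _demo_parse_fenced_answer():
    import json as _json
    raw = '```json\n[{"criteria": "DRY", "explanation": "Repeated printf calls."}]\n```'
    # Same fence-stripping rule as clean_prompt_answer(): drop lines starting with ```.
    cleaned = "\n".join(l for l in raw.split("\n") if not l.startswith("```"))
    return _json.loads(cleaned)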


def make_user_prompt(code_fragment):
    """
    Formats the code for the user prompt.
    :param code_fragment:
    :return:
    """
    # The trailing newline ensures the closing fence sits on its own line.
    user_prompt = f"```\n{code_fragment}\n```"
    return user_prompt


def call_openai(client, system_prompt, user_prompt, model="gpt-3.5-turbo", temperature=0.1, force_json=False,
                get_full_response=False):
    """
    Constructs a prompt for ChatGPT and sends it.
    Possible models:
        - gpt-3.5-turbo
        - gpt-4-turbo
        - gpt-4o
        - gpt-4o-mini
    Don't use force_json; it breaks common use cases like a list of JSON objects.
    :param client:
    :param system_prompt:
    :param user_prompt:
    :param model:
    :param temperature:
    :param force_json:
    :param get_full_response:
    :return:
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    optional_args = {}
    if force_json:
        optional_args["response_format"] = {"type": "json_object"}
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        **optional_args
    )
    if get_full_response:
        return response
    output_content = response.choices[0].message.content
    return output_content


def eval_code_by_chatgpt(openai_client, ccode):
    """
    Will evaluate a piece of code using our heavily tuned prompt!
    :param openai_client:
    :param ccode:
    :return:
    """
    # time.sleep(3)
    try:
        # Canned answer, kept for offline debugging:
        # return """[
        #     {
        #         "criteria": "DRY",
        #         "explanation": "The memory allocation and initialization for ``p1``, ``p2``, and ``p3`` are repetitive. Consider creating a function like ``allocateAndInitializeMemory``."
        #     },
        #     {
        #         "criteria": "DRY",
        #         "explanation": "The second DRY failure, because this is the observed ChatGPT behaviour."
        #     },
        #     {
        #         "criteria": "SRP",
        #         "explanation": "The ``main`` function handles memory allocation, initialization, and printing. You should separate these responsibilities into different functions like ``allocateMemory``, ``initializeData``, and ``printData``."
        #     },
        #     {
        #         "criteria": "NAME",
        #         "explanation": "``x1`` should be called ``title``, ``y1`` should be called ``author``, ``z1`` should be called ``year``, ``p1`` should be called ``titlePtr``, ``p2`` should be called ``authorPtr``, ``p3`` should be called ``yearPtr``."
        #     }
        # ]"""
        assert openai_client is not None
        user_prompt = make_user_prompt(ccode)
        chatgpt_answer = call_openai(openai_client, system_prompt=gpt_teacher_prompt, user_prompt=user_prompt,
                                     model="gpt-4o",
                                     temperature=0, force_json=False, get_full_response=False)
        return chatgpt_answer
    except Exception:
        import traceback
        traceback.print_exc()
        return {"error": "There was an error while getting the ChatGPT answer. Maybe ChatGPT is overloaded?"}


def add_evaluation_fields_on_js_answer(json_answer, all_criterias=None):
    """
    Adds some JSON fields to store the human feedback.
    The textual human feedback will always be in position 0.
    :param json_answer:
    :param all_criterias:
    :return:
    """
    if all_criterias is None:
        all_criterias = CODE_AUGMENTATIONS
    enhanced_answer = []
    overall_feedback = {
        "criteria": "HUMAN_FEEDBACK",
        "explanation": "",
        "EVAL": EVAL_ANSWER_NOEVAL
    }
    # Append a "Not infringed" entry for every criterion that was not flagged.
    existing = {c["criteria"] for c in json_answer}
    for criteria in all_criterias:
        if criteria[0] not in existing:
            json_answer.append({"criteria": criteria[0], "explanation": "Not infringed"})
    enhanced_answer.append(overall_feedback)
    for ans in json_answer:
        ans = copy.deepcopy(ans)
        ans["EVAL"] = EVAL_ANSWER_NOEVAL
        enhanced_answer.append(ans)
    return enhanced_answer
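

# A hedged, self-contained illustration of the shape that
# add_evaluation_fields_on_js_answer() produces: the HUMAN_FEEDBACK entry
# comes first, every entry carries an "EVAL" field initialised to
# EVAL_ANSWER_NOEVAL (0), and non-flagged criteria are appended as
# "Not infringed". The sample values are hypothetical.
def _demo_enhanced_answer_shape():
    answer = [{"criteria": "DRY", "explanation": "Repeated printf calls."}]
    criterias = [("DRY", ""), ("MC", "")]
    enhanced = [{"criteria": "HUMAN_FEEDBACK", "explanation": "", "EVAL": 0}]
    flagged = {a["criteria"] for a in answer}
    for mnemonic, _ in criterias:
        if mnemonic not in flagged:
            answer.append({"criteria": mnemonic, "explanation": "Not infringed"})
    for a in answer:
        enhanced.append({**a, "EVAL": 0})
    return enhanced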


def eval_the_piece_of_c_code(openai_client, ccode):
    """
    Main entrypoint to this module. Will be called from the backend.
    Note that the call blocks, so run it on a worker thread.
    Will return a proper JSON structure that also carries the EVAL fields.
    :param openai_client:
    :param ccode:
    :return:
    """
    enhanced_answer = {"error": "Not processed"}
    try:
        chatgpt_ans = eval_code_by_chatgpt(openai_client, ccode)
    except Exception:
        import traceback
        traceback.print_exc()
        enhanced_answer = {"error": "There was an error while calling ChatGPT."}
        return enhanced_answer
    # eval_code_by_chatgpt returns a dict only on failure; the isinstance check
    # avoids a substring match on a normal string answer containing "error".
    if isinstance(chatgpt_ans, dict) and "error" in chatgpt_ans:
        # we forward it to the caller
        enhanced_answer = chatgpt_ans
    else:
        try:
            chatgpt_js = parse_chatgpt_answer(chatgpt_ans)
            enhanced_answer = add_evaluation_fields_on_js_answer(chatgpt_js)
        except Exception:
            import traceback
            traceback.print_exc()
            enhanced_answer = {"error": "There was an error while parsing the answer."}
    return enhanced_answer