initial commit
app.py
CHANGED
@@ -16,10 +16,8 @@ openai.api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
 deployment_id = os.environ.get("AZURE_OPENAI_DEP_ID")
 gpt_model = deployment_id
 
-
-
-print(os.environ.get("AZURE_OPENAI_API_VERSION"))
-print(gpt_model)
+
+
 
 prompt = """Compare the ground truth and prediction from AI models, to give a correctness score for the prediction. <AND> in the ground truth means it is totally right only when all elements in the ground truth are present in the prediction, and <OR> means it is totally right when any one element in the ground truth is present in the prediction. The correctness score is 0.0 (totally wrong), 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0 (totally right). Just complete the last space of the correctness score.
 
@@ -300,7 +298,7 @@ markdown = """
 
 In this demo, we offer MM-Vet LLM-based (GPT-4) evaluator to grade open-ended outputs from your models.
 
-Plese upload your json file of your model results containing `\{v1_0\: ..., v1_1\: ..., \}
+Plese upload your json file of your model results containing `\{v1_0\: ..., v1_1\: ..., \}`like
 
 The grading may last 5 minutes. Sine we only support 1 queue, the grading time may be longer when you need to wait for other users' grading to finish.
 
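The prompt in the first hunk defines the rubric the GPT-4 judge follows: elements joined by `<AND>` must all appear in the prediction, while `<OR>` requires only one. As a rough illustration of that semantics (not the actual judge — the demo sends the prompt to GPT-4, which can also award partial credit), an exact-match version could look like this; the function name and matching rule are assumptions for the sketch:

```python
def exact_score(ground_truth: str, prediction: str) -> float:
    """Illustrative exact-match reading of the <AND>/<OR> rubric.

    <AND>-joined elements must all be present in the prediction;
    <OR>-joined elements need only one match. Returns 1.0 or 0.0,
    whereas the GPT-4 judge can return intermediate scores (0.1..0.9).
    """
    pred = prediction.lower()
    if "<AND>" in ground_truth:
        parts = [p.strip().lower() for p in ground_truth.split("<AND>")]
        return 1.0 if all(p in pred for p in parts) else 0.0
    parts = [p.strip().lower() for p in ground_truth.split("<OR>")]
    return 1.0 if any(p in pred for p in parts) else 0.0
```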
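The second hunk's markdown asks users to upload a JSON file of model results keyed `v1_0`, `v1_1`, and so on. A minimal sketch of producing such a file is below; the answer strings and the output filename are placeholders, not real MM-Vet predictions:

```python
import json

# Hypothetical model outputs keyed by MM-Vet question IDs (v1_0, v1_1, ...).
# The answer strings are placeholders for illustration only.
results = {
    "v1_0": "The image shows a red apple on a wooden table.",
    "v1_1": "There are three people in the photo.",
}

# Write the results in the shape the demo expects to upload.
with open("my_model_results.json", "w") as f:
    json.dump(results, f, indent=2)
```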