cistine committed on
Commit 54a9c47
1 Parent(s): a911e23

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -42,8 +42,8 @@ This is the repository for the base 2.7B version finetuned based on [phi-2](http
 
 | Model Size | Base Model |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
-
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B |[opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1)
 
 
 ## Model Eval
@@ -55,7 +55,7 @@ It is impratical for us to manually set specific configurations for each fine-tu
 Therefore, OpenCSG racked their brains to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 To simplify the comparison, we chosed the Pass@1 metric for the Python language, but our fine-tuning dataset includes samples in multiple languages.
 
-**For fairness, we evaluated the original and fine-tuned CodeLlama models based only on the prompts from the original cases, without including any other instructions.**
+**For fairness, we evaluated the original and fine-tuned phi-2 models based only on the prompts from the original cases, without including any other instructions.**
 
 **Besides, we use the greedy decoding method for each model during evaluation.**
 
@@ -146,7 +146,8 @@ opencsg-phi-2-v0.1是是一系列基于phi-2的通过全参数微调方法进行
 
 | 模型大小 | 基座模型 |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B |[opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1)
 
 
 ## 模型评估
@@ -158,7 +159,7 @@ HumanEval 是评估模型在代码生成方面性能的最常见的基准,尤
 因此,OpenCSG 提供了一个相对公平的方法来在 HumanEval 基准上比较各微调模型。
 方便起见,我们选择了Python语言Pass@1指标,但要注意的是,我们的微调数据集是包含多种编程语言。
 
-**为了公平起见,我们仅根据原始问题的提示来评估原始和微调过的 CodeLlama 模型,不包含任何其他说明。**
+**为了公平起见,我们仅根据原始问题的提示来评估原始和微调过的 phi-2 模型,不包含任何其他说明。**
 
 **除此之外,我们在评估过程中对每个模型都使用贪婪解码方法。**
 
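
For readers who want to see what the evaluation protocol described in the hunks above amounts to in practice, here is a minimal sketch using the Hugging Face `transformers` API. It assumes the `opencsg/opencsg-phi-2-v0.1` checkpoint linked in the table; the `run_tests` argument is a hypothetical stand-in for HumanEval's unit-test checker, and this is not the authors' actual harness. With a single greedy completion per task, Pass@1 reduces to the fraction of tasks whose completion passes the tests.

```python
# Minimal sketch (not the authors' harness): greedy decoding on raw HumanEval
# prompts, with Pass@1 scored as the fraction of tasks whose single completion
# passes the task's unit tests.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "opencsg/opencsg-phi-2-v0.1"  # repo linked in the README table

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)


def complete(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate one completion with greedy decoding (no sampling), as the README states."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


def pass_at_1(tasks, run_tests) -> float:
    """Pass@1 with one greedy sample per task; `run_tests` is a placeholder checker."""
    passed = sum(run_tests(task, complete(task["prompt"])) for task in tasks)
    return passed / len(tasks)
```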