Update README.md
</div>
OpenCSG stands for Converged resources, Software refined, and Generative LM. The 'C' represents Converged resources, indicating the integration and full utilization of hybrid resources. The 'S' stands for Software refined, signifying software that is refined by large models. The 'G' represents Generative LM, which denotes widespread, inclusive, and democratized generative large models.
The vision of OpenCSG is to empower every industry, every company, and every individual to own their models. We adhere to the principles of openness and open source, making OpenCSG's large-model software stack available to the community. We welcome everyone to use it, give feedback, and contribute.
## Model Eval
HumanEval is the most common benchmark for evaluating the code-generation performance of models, especially on the completion of code exercises.
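As a rough sketch of how a HumanEval-style task is scored (this is a hypothetical toy problem, not one of the benchmark's 164 actual tasks, and not OpenCSG's evaluation harness):

```python
# Hypothetical HumanEval-style task: a function signature with a docstring
# (the prompt), a model-generated body (the completion), and unit tests.
prompt = (
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
)
completion = "    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

# A completion "passes" the task if the assembled program runs all of the
# task's unit tests without raising.
program = prompt + completion + tests
try:
    exec(program, {})
    passed = True
except Exception:
    passed = False
print(passed)
```

The real benchmark runs each generated completion against hidden unit tests in a sandbox and records pass/fail per sample.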
Model evaluation is, to some extent, an inexact science: different models have different sensitivities to decoding methods, parameters, and instructions.
It is impractical for us to manually tune a specific configuration for each fine-tuned model, because a truly capable LLM should retain its general abilities regardless of how users set those parameters.
Therefore, OpenCSG strove to provide a relatively fair method for comparing the fine-tuned models on the HumanEval benchmark.
To simplify the comparison, we chose the Pass@1 metric on the Python language, even though our fine-tuning dataset includes samples in multiple languages.
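For reference, pass@k is usually computed with the unbiased estimator from the original HumanEval paper; pass@1 reduces to the fraction of correct samples. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn from n generations of which c are correct, passes.
    For k=1 this reduces to c/n."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 5 correct out of 10 generations -> pass@1 = 0.5
print(pass_at_k(10, 5, 1))
```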
**For fairness, we evaluated both the original and the fine-tuned models using only the prompts from the original test cases, without any additional instructions.**
**In addition, we used greedy decoding for every model during evaluation.**
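Greedy decoding simply picks the highest-probability token at every generation step instead of sampling, which makes results deterministic and comparable across models. A toy illustration (hypothetical per-step distributions, not a real LLM's logits):

```python
def greedy_decode(step_probs):
    """Greedy decoding over a toy model: step_probs is a list of
    {token: probability} dicts, one per generation step; at each step
    we take the argmax token rather than sampling."""
    return [max(probs, key=probs.get) for probs in step_probs]

steps = [
    {"def": 0.7, "class": 0.3},
    {"add": 0.6, "sub": 0.4},
]
print(greedy_decode(steps))  # ['def', 'add']
```

With a Hugging Face model this typically corresponds to calling `generate` with sampling disabled (`do_sample=False`).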
| Model | HumanEval python pass@1 |
| --- | --- |
| opencsg-starcoder-v0.1 | **39.02%** |
**TODO**
- We will provide more benchmark scores on fine-tuned models in the future.
- We will provide different practical problems to evaluate the performance of fine-tuned models in the field of software engineering.