lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese


ptrdvn committed 26 days ago

Commit 56633c4 · verified · 1 Parent(s): 29d69ec

Update README.md

Files changed (1)

README.md +2 -2


@@ -96,7 +96,7 @@ for output in outputs:
 # Evaluation
-We evaluated this model for output accuracy and the percentage of valid Japanese `<think>` sections using the first 50 rows of the (SakanaAI/gsm8k-ja-test_250-1319)[https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319] dataset.
 We compare this to the original R1 model and test in both regimes where repetition penalty is 1.0 and 1.1:
@@ -110,7 +110,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
-We further use the first 50 prompts from (DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja] to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
 This benchmark contains more varied and complex prompts, meaning this is a more realistic evaluation of how reliably this model can output Japanese.
 |                                                | Repetition Penalty | Valid Japanese `<think>` (%) |

 # Evaluation
+We evaluated this model for output accuracy and the percentage of valid Japanese `<think>` sections using the first 50 rows of the [SakanaAI/gsm8k-ja-test_250-1319](https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319) dataset.
 We compare this to the original R1 model and test in both regimes where repetition penalty is 1.0 and 1.1:
 Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
+We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
 This benchmark contains more varied and complex prompts, meaning this is a more realistic evaluation of how reliably this model can output Japanese.
 |                                                | Repetition Penalty | Valid Japanese `<think>` (%) |
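
The README text in this diff reports a "percentage of valid Japanese `<think>` sections" but the hunk does not show how validity is decided. The sketch below is one possible way to compute such a figure, not the authors' evaluation code (that is linked from the README). Everything in it is an assumption made for illustration: the function names `is_valid_japanese_think` and `valid_think_percentage`, the `min_ratio` threshold, and the rule that a section counts as Japanese only if it contains at least one kana character and most of its letters fall in the hiragana, katakana, or CJK ideograph ranges.

```python
import re

# Assumption: a reasoning section is the text between <think> and </think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Hiragana, katakana, and CJK unified ideographs.
JA_CHAR_RE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")
# Kana only; used to tell Japanese apart from Chinese, which shares kanji.
KANA_RE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]")


def is_valid_japanese_think(response: str, min_ratio: float = 0.5) -> bool:
    """Heuristic check (hypothetical, not the README's definition)."""
    match = THINK_RE.search(response)
    if match is None:
        return False  # no complete <think>...</think> section
    body = match.group(1)
    letters = [ch for ch in body if ch.isalpha()]
    if not letters or KANA_RE.search(body) is None:
        return False  # empty section, or no kana at all
    japanese = sum(1 for ch in letters if JA_CHAR_RE.match(ch))
    return japanese / len(letters) >= min_ratio


def valid_think_percentage(responses: list[str], min_ratio: float = 0.5) -> float:
    """Percentage of responses whose <think> section passes the heuristic."""
    if not responses:
        return 0.0
    valid = sum(is_valid_japanese_think(r, min_ratio) for r in responses)
    return 100.0 * valid / len(responses)


if __name__ == "__main__":
    sample = [
        "<think>まず、合計を計算します。</think>答えは5です。",
        "<think>First, compute the total.</think>The answer is 5.",
        "<think>首先，计算总数。</think>答案是5。",
    ]
    print(f"{valid_think_percentage(sample):.1f}%")
```

The kana requirement is a deliberate design choice in this sketch: kanji alone cannot separate Japanese from Chinese reasoning, so a section written only in ideographs is counted as invalid. A stricter evaluation might use a language-identification model instead.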