Update README.md
README.md
CHANGED
@@ -24,12 +24,14 @@ We also ensured that the model’s math and reasoning abilities remained intact
 Evaluations on multiple benchmarks showed that our post-trained model performed on par with the base R1 model,
 indicating that the decensoring had no impact on its core reasoning capabilities.
 
-| Benchmark | R1-Distill-
+| Benchmark | R1-Distill-Llama-70B | R1-1776-Distill-Llama-70B |
 | --- | --- | --- |
 | China Censorship | 80.53 | 0.2 |
-| Internal Benchmarks (avg) |
+| Internal Benchmarks (avg) | 47.64 | 48.4 |
 | AIME 2024 | 70 | 70 |
 | MATH-500 | 94.5 | 94.8 |
-| MMLU | 88.52 | 88.20 |
-| DROP | 84.55 | 84.83 |
-| GPQA | 65.2 | 65.05 |
+| MMLU | 88.52 * | 88.20 |
+| DROP | 84.55 * | 84.83 |
+| GPQA | 65.2 | 65.05 |
+
+\* Evaluated by Perplexity AI since they were not reported in the [paper](https://arxiv.org/abs/2501.12948).