princeton-nlp committed
Commit 418efde
Parent(s): 3a7a74a
Update README.md
README.md
CHANGED
@@ -35,7 +35,8 @@ Contact: `{tianyug, awettig}@princeton.edu`
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/PPSuEMsUWIyrmrOV_88Xf.png)
 
 
-You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing).
+You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing). In these detailed results, we show that our model retains the original Llama-3's general LM performance (on tasks selected by the [HF Open LLM Leaderboard v1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard)). This is non-trivial in long-context fine-tuning and requires careful selection of the fine-tuning data mixture and training configuration.
+
 
 Understanding long-context performance is tricky, as there is no consensus on what constitutes effective long-context evaluation or how well existing benchmarks reflect real-world use cases. In this work, we curate a combination of existing and new tasks across both synthetic and natural datasets to demonstrate the strength of our model.
 