princeton-nlp commited on
Commit
418efde
1 Parent(s): 3a7a74a

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -1
README.md CHANGED
@@ -35,7 +35,8 @@ Contact: `{tianyug, awettig}@princeton.edu`
35
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/PPSuEMsUWIyrmrOV_88Xf.png)
36
 
37
 
38
- You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing).
 
39
 
40
  Understanding long-context performance is tricky, as there is no consensus on what’s effective long-context evaluation or how well those existing benchmarks reflect real-world use case. In this work, we curate a combination of existing and new tasks across both synthetic and natural datasets to demonstrate the strength of our model.
41
 
 
35
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/PPSuEMsUWIyrmrOV_88Xf.png)
36
 
37
 
38
+ You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing). In this detailed results, we show that our model can retain the original Llama-3's general LM performance (on tasks selected by the [HF Open LLM Leaderboard v1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard)). This is non-trivial in long-context fine-tuning and requires a careful selection of the fine-tuning data mixture and the training configurations.
39
+
40
 
41
  Understanding long-context performance is tricky, as there is no consensus on what’s effective long-context evaluation or how well those existing benchmarks reflect real-world use case. In this work, we curate a combination of existing and new tasks across both synthetic and natural datasets to demonstrate the strength of our model.
42