princeton-nlp
/

Llama-3-8B-ProLong-64k-Instruct

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

princeton-nlp commited on Jul 22

Commit

418efde

•

1 Parent(s): 3a7a74a

Update README.md

Files changed (1) hide show

README.md +2 -1

README.md CHANGED Viewed

@@ -35,7 +35,8 @@ Contact: `{tianyug, awettig}@princeton.edu`
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/PPSuEMsUWIyrmrOV_88Xf.png)
-You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing).
 Understanding long-context performance is tricky, as there is no consensus on what’s effective long-context evaluation or how well those existing benchmarks reflect real-world use case. In this work, we curate a combination of existing and new tasks across both synthetic and natural datasets to demonstrate the strength of our model.

 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/607f846419a5af0183d7bfb9/PPSuEMsUWIyrmrOV_88Xf.png)
+You can find results for more tasks and models in this [spreadsheet](https://docs.google.com/spreadsheets/d/1qGzimBE8F896p1m7_yWHnjyGX7kpEAeyaT1h2iTbNzE/edit?usp=sharing). In this detailed results, we show that our model can retain the original Llama-3's general LM performance (on tasks selected by the [HF Open LLM Leaderboard v1](https://huggingface.co/spaces/open-llm-leaderboard-old/open_llm_leaderboard)). This is non-trivial in long-context fine-tuning and requires a careful selection of the fine-tuning data mixture and the training configurations.
 Understanding long-context performance is tricky, as there is no consensus on what’s effective long-context evaluation or how well those existing benchmarks reflect real-world use case. In this work, we curate a combination of existing and new tasks across both synthetic and natural datasets to demonstrate the strength of our model.