Update README.md
README.md CHANGED
@@ -106,7 +106,7 @@ This repository contains the research preview of **LongLLaMA, a large language m
 
 LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
 
-LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning
+LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning**.
 
 <p align="center" width="100%">
 <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
@@ -129,8 +129,8 @@ with three layers used for context extension. **Crucially, LongLLaMA is able to
 |----------------|----------|----------|-----------|
 | Source model | [OpenLLaMA-3B](https://huggingface.co/openlm-research/open_llama_3b_easylm) | [OpenLLaMA-3Bv2](https://huggingface.co/openlm-research/open_llama_3b_v2_easylm) | [CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) |
 | Source model tokens | 1T | 1 T | 2T + 0.5 T |
-| Fine-tuning context | 8K | 32K | 32K |
-| Fine-tuning tokens | 10B | 5B | 35B |
+| Fine-tuning context | 8K | **32K** | **32K** |
+| Fine-tuning tokens | 10B | 5B | **35B** |
 | Memory layers | 6, 12, 18 | 6, 12, 18 | 8, 16, 24 |
 
 </div>
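The memory layers listed in the table are what give LongLLaMA its extended context: at those layers the model attends to a cache of (key, value) pairs from earlier chunks of the input. As a minimal sketch of how such a checkpoint is loaded with Hugging Face transformers, assuming the checkpoint id `syzymon/long_llama_code_7b` and default loading arguments (both are assumptions here; check the model card in the repository for the exact invocation):

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Assumed checkpoint id; verify the exact name on Hugging Face.
MODEL_ID = "syzymon/long_llama_code_7b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code=True pulls in the repository's custom modeling code,
# which implements the memory layers (8, 16, 24 for the 7B model above).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the memory cache sits outside the local attention window, inputs longer than the 32K fine-tuning context can still be processed by feeding the text in chunks, which is the context-extrapolation property the hunk header above refers to.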