Update README.md
README.md CHANGED
@@ -106,7 +106,7 @@ This repository contains the research preview of **LongLLaMA, a large language m
 
 LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
 
-LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning
+LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning**.
 
 <p align="center" width="100%">
 <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
@@ -129,8 +129,8 @@ with three layers used for context extension. **Crucially, LongLLaMA is able to
 |----------------|----------|----------|-----------|
 | Source model | [OpenLLaMA-3B](https://huggingface.co/openlm-research/open_llama_3b_easylm) | [OpenLLaMA-3Bv2](https://huggingface.co/openlm-research/open_llama_3b_v2_easylm) | [CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) |
 | Source model tokens | 1T | 1 T | 2T + 0.5 T |
-| Fine-tuning context | 8K | 32K | 32K |
-| Fine-tuning tokens | 10B | 5B | 35B |
+| Fine-tuning context | 8K | **32K** | **32K** |
+| Fine-tuning tokens | 10B | 5B | **35B** |
 | Memory layers | 6, 12, 18 | 6, 12, 18 | 8, 16, 24 |
 
 </div>
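The memory layers listed in the table are what give LongLLaMA its extended context: at those layers the model attends to a cache of (key, value) pairs from earlier chunks of the input. As a minimal sketch of how such a checkpoint is loaded with Hugging Face transformers, assuming the checkpoint id `syzymon/long_llama_code_7b` and default loading arguments (both are assumptions here; check the model card in the repository for the exact invocation):

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Assumed checkpoint id; verify the exact name on Hugging Face.
MODEL_ID = "syzymon/long_llama_code_7b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code=True pulls in the repository's custom modeling code,
# which implements the memory layers (8, 16, 24 for the 7B model above).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the memory cache sits outside the local attention window, inputs longer than the 32K fine-tuning context can still be processed by feeding the text in chunks, which is the context-extrapolation property the hunk header above refers to.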