namespace-Pt
committed on
Commit
•
39a72f9
1
Parent(s):
992b551
Upload folder using huggingface_hub
Browse files- README.md +10 -5
- modeling_utils.py +0 -4
README.md
CHANGED
@@ -1,3 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
1 |
<div align="center">
|
2 |
<h1>Activation Beacon for Mistral</h1>
|
3 |
|
@@ -34,7 +39,7 @@ We evaluate the model on LongBench using 32K context length.
|
|
34 |
|:-:|:-:|:-:|:-:|
|
35 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
|
36 |
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
|
37 |
-
|Activation-Beacon-Mistral|39.14|43.27|29.52|
|
38 |
|
39 |
## [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf)
|
40 |
We evaluate the model on InfiniteBench using 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf). For [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), we use 32K context length.
|
@@ -43,7 +48,7 @@ We evaluate the model on InfiniteBench using 128K context length. The results of
|
|
43 |
|:-:|:-:|:-:|
|
44 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|13.14||
|
45 |
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
|
46 |
-
|Activation-Beacon-Mistral|26.81|12.49|
|
47 |
|
48 |
## [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)
|
49 |
We evaluate the model on Topic Retrieval task with `[5,10,20,30,40,50,60,70]` topics.
|
@@ -52,13 +57,13 @@ We evaluate the model on Topic Retrieval task with `[5,10,20,30,40,50,60,70]` to
|
|
52 |
|
53 |
|
54 |
## [PG19 Perplexity](https://arxiv.org/abs/2309.12307)
|
55 |
-
We evaluate the sliding window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable flash-attention-2 and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.
|
56 |
|
57 |
|Model|Perplexity|Latency (s)|Memory (GB)|
|
58 |
|:-:|:-:|:-:|:-:|
|
59 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
|
60 |
-
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)
|
61 |
-
|Activation-Beacon-Mistral|8.16|3.06|27.4|
|
62 |
|
63 |
|
64 |
## [Passkey Retrieval](https://arxiv.org/abs/2309.12307)
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
pipeline_tag: text-generation
|
4 |
+
---
|
5 |
+
|
6 |
<div align="center">
|
7 |
<h1>Activation Beacon for Mistral</h1>
|
8 |
|
|
|
39 |
|:-:|:-:|:-:|:-:|
|
40 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|32.70|25.87|27.42|
|
41 |
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|33.71|36.08|23.47|
|
42 |
+
|Activation-Beacon-Mistral-7B|39.14|43.27|29.52|
|
43 |
|
44 |
## [InfiniteBench](https://arxiv.org/pdf/2402.13718.pdf)
|
45 |
We evaluate the model on InfiniteBench using 128K context length. The results of Yarn-Mistral-128K are copied from the [paper](https://arxiv.org/pdf/2402.13718.pdf). For [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), we use 32K context length.
|
|
|
48 |
|:-:|:-:|:-:|
|
49 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|13.14||
|
50 |
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|9.55|9.09|
|
51 |
+
|Activation-Beacon-Mistral-7B|26.81|12.49|
|
52 |
|
53 |
## [Topic Retrieval](https://lmsys.org/blog/2023-06-29-longchat/)
|
54 |
We evaluate the model on Topic Retrieval task with `[5,10,20,30,40,50,60,70]` topics.
|
|
|
57 |
|
58 |
|
59 |
## [PG19 Perplexity](https://arxiv.org/abs/2309.12307)
|
60 |
+
We evaluate the sliding window perplexity on the PG19 test set with window size 100K and stride 32K. We also report the latency and the GPU memory usage. For full-attention models, we enable [flash-attention-2](https://github.com/Dao-AILab/flash-attention) and [tensor parallel](https://github.com/BlackSamorez/tensor_parallel). The evaluation is run on an 8xA800 machine.
|
61 |
|
62 |
|Model|Perplexity|Latency (s)|Memory (GB)|
|
63 |
|:-:|:-:|:-:|:-:|
|
64 |
|[Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|8.83|14.02|525.6 (cannot run on a single GPU)|
|
65 |
+
|[Yarn-Mistral-128K](https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k)|7.66|14.56|525.6 (cannot run on a single GPU)|
|
66 |
+
|Activation-Beacon-Mistral-7B|8.16|3.06|27.4|
|
67 |
|
68 |
|
69 |
## [Passkey Retrieval](https://arxiv.org/abs/2309.12307)
|
modeling_utils.py
CHANGED
@@ -70,10 +70,6 @@ def evaluate_perplexity(model, dataloader, accelerator:Optional[Accelerator]=Non
|
|
70 |
# if the dataloader has been prepared, we shall not prepare it twice, especially in case of deepspeed
|
71 |
dataloader = accelerator.prepare(dataloader)
|
72 |
|
73 |
-
# if accelerator.process_index == 0:
|
74 |
-
# for name, x in model.named_parameters():
|
75 |
-
# print(f"{name: ^80} {x.dtype}")
|
76 |
-
|
77 |
all_loss = defaultdict(list)
|
78 |
for i, x in enumerate(tqdm(dataloader, desc="Computing Perplexity")):
|
79 |
# NOTE: important to reset memory for every batch
|
|
|
70 |
# if the dataloader has been prepared, we shall not prepare it twice, especially in case of deepspeed
|
71 |
dataloader = accelerator.prepare(dataloader)
|
72 |
|
|
|
|
|
|
|
|
|
73 |
all_loss = defaultdict(list)
|
74 |
for i, x in enumerate(tqdm(dataloader, desc="Computing Perplexity")):
|
75 |
# NOTE: important to reset memory for every batch
|