hamel committed in f1de29d (1 parent: 7fabc4d)

Respect sequence_len in config for `type: llama2_chat` (#926)


* Respect sequence_len in config for `type: llama2_chat`

It was hardcoded to `4096`, and I am not sure why. This updates it to pull from the config instead (a sketch of why that works follows this list).

cc: @winglian



* Update llama2_chat.py

* apply black formatting

* fix tokenizer

* update test data

* lint fixtures
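
To make the fix concrete, here is a minimal sketch of why deleting the hardcoded assignment is enough. The base-class signature below is an assumption for illustration (a simplified stand-in for axolotl's `PromptTokenizingStrategy`, not the real class); the point is that the parent `__init__` already stores the configured `sequence_len`, and the subclass was overwriting it:

    # Minimal sketch, not the real axolotl classes; the base-class
    # signature is assumed for illustration.

    class PromptTokenizingStrategy:
        """Simplified stand-in for axolotl's base tokenizing strategy."""

        def __init__(self, prompter, tokenizer, train_on_inputs=False, sequence_len=2048):
            self.prompter = prompter
            self.tokenizer = tokenizer
            self.train_on_inputs = train_on_inputs
            # `sequence_len` arrives here from the `sequence_len:` key in the config.
            self.sequence_len = sequence_len


    class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Before this commit, a `self.sequence_len = 4096` here silently
            # clobbered the inherited, config-driven value. Removing it lets
            # the configured length through unchanged.


    strategy = LLama2ChatTokenizingStrategy(None, None, sequence_len=1024)
    print(strategy.sequence_len)  # 1024, as configured; no longer forced to 4096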

src/axolotl/prompt_strategies/llama2_chat.py CHANGED
@@ -81,8 +81,9 @@ class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):
 
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        self.sequence_len = 4096
-        self.tokenizer.add_special_tokens({"pad_token": "<pad>"})
+        self.tokenizer.add_special_tokens(
+            {"pad_token": getattr(self.tokenizer, "pad_token", "<pad>")}
+        )
         # https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/added_tokens.json
 
     def tokenize_prompt(self, prompt):
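
A note on the `getattr` fallback in the added lines: instead of unconditionally registering `"<pad>"`, the strategy now reuses a pad token the tokenizer already defines. Below is a rough sketch of that behavior using the transformers API; the model name is illustrative, and the `or "<pad>"` guard is an addition for this sketch, since transformers tokenizers expose `pad_token` as an attribute whose value may be `None`:

    # Rough sketch of the pad-token fallback, assuming a transformers tokenizer.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

    # Reuse an existing pad token if the tokenizer defines one; otherwise
    # register "<pad>", which adds one new token to the vocabulary. The
    # `or "<pad>"` guard is this sketch's addition: because `pad_token`
    # exists as an attribute, `getattr` alone can return None.
    pad_token = getattr(tokenizer, "pad_token", None) or "<pad>"
    tokenizer.add_special_tokens({"pad_token": pad_token})
    print(tokenizer.pad_token)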
tests/fixtures/conversation.tokenized_llama2chat.json CHANGED
The diff for this file is too large to render.