Do not add the EOS token by default when tokenizing
This PR reduces confusion about loading the tokenizer.
With the current setting, the tokenizer must be loaded with `add_eos_token=False`. Otherwise an EOS token is automatically appended to the tokenized input, which leads to odd completion results.
- Before:
```py
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b", add_eos_token=False)
```
- After:
```py
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")
```
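To illustrate the mechanism, here is a minimal toy sketch of what an `add_eos_token` flag does at encoding time. The vocabulary, the token ids, `EOS_ID`, and the `encode` helper are all hypothetical and stand in for the real tokenizer's internals. With the flag on, the prompt ends in an end-of-sequence token, so the model is asked to continue a sequence that has already been marked as finished. That is one plausible reason for the odd completions described above.

```python
# Toy model of the `add_eos_token` flag. The vocabulary, ids, and this
# `encode` helper are hypothetical; real tokenizers do this internally.
EOS_ID = 2  # hypothetical EOS token id

TOY_VOCAB = {"The": 10, "weather": 11, "is": 12}


def encode(words, add_eos_token=False):
    """Map words to ids, appending EOS only when the flag is set."""
    ids = [TOY_VOCAB[w] for w in words]
    if add_eos_token:
        ids.append(EOS_ID)  # the prompt now "ends" before generation starts
    return ids


prompt = ["The", "weather", "is"]
print(encode(prompt, add_eos_token=True))   # [10, 11, 12, 2]
print(encode(prompt, add_eos_token=False))  # [10, 11, 12]
```

For a completion-style prompt, the second form is the one you want: the last token is the end of the user's text, not EOS.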
Setting `"add_eos_token": false` in `tokenizer_config.json` matches the configuration of `sbintuitions/sarashina2-70b`:
https://huggingface.co/sbintuitions/sarashina2-70b/blob/main/tokenizer_config.json#L134
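A quick way to confirm the setting is to read the flag straight from the config text. The sketch below assumes you already have the contents of a `tokenizer_config.json` as a string; the `eos_flag` helper is hypothetical, and the inline fragment only mirrors the keys shown in this PR's diff after the change.

```python
import json


def eos_flag(config_text: str):
    """Return the `add_eos_token` value from tokenizer_config.json text, or None if absent."""
    return json.loads(config_text).get("add_eos_token")


# Fragment mirroring the keys shown in this PR's diff (after the change).
after = """{
  "add_dummy_prefix_space": false,
  "legacy": false,
  "add_bos_token": false,
  "add_eos_token": false
}"""

print(eos_flag(after))  # False
```

With network access, the real files can be fetched with `huggingface_hub.hf_hub_download(repo_id=..., filename="tokenizer_config.json")`, read from the returned local path, and passed to `eos_flag` to compare the two repositories.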
- `tokenizer_config.json` +1 -1

```diff
@@ -131,5 +131,5 @@
   "add_dummy_prefix_space": false,
   "legacy": false,
   "add_bos_token": false,
-  "add_eos_token":
+  "add_eos_token": false
 }
```