p1atdev committed on
Commit
bcbb248
1 Parent(s): 57360e1

Do not add EOS token when tokenizing by default

This PR reduces confusion about tokenizer loading.

The current configuration requires loading the tokenizer with `add_eos_token=False`; otherwise the EOS token is appended automatically to the tokenized input, leading to weird completion results.

- Before:

```py
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b", add_eos_token=False)
```

- After:

```py
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")
```
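
One way to confirm the effect of this change is to check whether an encoded sequence ends with the EOS token id. The sketch below is illustrative only: `appends_eos` and `check_repo` are hypothetical helper names that are not part of this PR, and the probe text is arbitrary. It assumes a Hugging Face tokenizer object that returns `input_ids` when called and exposes `eos_token_id`.

```py
def appends_eos(tokenizer, text="Hello"):
    """Return True if encoding `text` ends with the tokenizer's EOS id."""
    input_ids = tokenizer(text)["input_ids"]
    return len(input_ids) > 0 and input_ids[-1] == tokenizer.eos_token_id


def check_repo(repo_id):
    """Load a tokenizer with default arguments and report whether EOS is appended.

    Needs `transformers` and network access, so the import is deferred.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return appends_eos(tokenizer)
```

With this PR applied, `check_repo("sbintuitions/sarashina2-8x70b")` would be expected to return `False` without passing `add_eos_token=False`.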

This makes `"add_eos_token": false` in `tokenizer_config.json` consistent with the setting in `sbintuitions/sarashina2-70b`:
https://huggingface.co/sbintuitions/sarashina2-70b/blob/main/tokenizer_config.json#L134
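
To compare the setting across repositories without opening the files by hand, the flag can be read from each `tokenizer_config.json`. This is a minimal sketch: `read_add_eos_token` and `fetch_add_eos_token` are hypothetical helpers, and the download step assumes the `huggingface_hub` package is installed.

```py
import json


def read_add_eos_token(config_text):
    """Return the `add_eos_token` value from tokenizer_config.json text, or None if absent."""
    return json.loads(config_text).get("add_eos_token")


def fetch_add_eos_token(repo_id):
    """Download a repo's tokenizer_config.json and return its `add_eos_token` value.

    Needs `huggingface_hub` and network access, so the import is deferred.
    """
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id, "tokenizer_config.json")
    with open(path, encoding="utf-8") as f:
        return read_add_eos_token(f.read())
```

Calling `fetch_add_eos_token` on both repository ids and comparing the results is one way to check that the two configurations agree.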

Files changed (1)
  1. tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -131,5 +131,5 @@
  "add_dummy_prefix_space": false,
  "legacy": false,
  "add_bos_token": false,
- "add_eos_token": true
+ "add_eos_token": false
  }