sanchit-gandhi (HF staff) committed on
Commit 0953d8b
1 Parent(s): 267437e

Allow single quotes "'" and hyphens "-"

Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
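
To illustrate why this matters, here is a minimal sketch of what suppression does to generation. It assumes suppression is applied by setting the logits of the listed ids to `-inf` before a token is picked; the helper names `suppress` and `greedy_pick`, the 16-entry toy vocabulary, and the scores are hypothetical and not taken from the Whisper code.

```python
import math


def suppress(logits, suppress_tokens):
    """Return a copy of `logits` with the listed token ids set to -inf."""
    masked = list(logits)
    for token_id in suppress_tokens:
        masked[token_id] = -math.inf
    return masked


def greedy_pick(logits):
    """Return the id with the highest score (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy scores (assumption): id 6, standing in for "'", is the model's top choice,
# and id 3 is the runner-up.
logits = [0.0] * 16
logits[6] = 5.0
logits[3] = 2.0

# With id 6 suppressed, the apostrophe can never be generated.
before = greedy_pick(suppress(logits, [1, 2, 6, 7, 8, 9, 10, 12, 14]))
# With id 6 removed from the list, the top-scoring token is picked again.
after = greedy_pick(suppress(logits, [1, 2, 7, 8, 9, 10, 14]))

print(before, after)  # 3 6
```

In this toy case the suppressed list forces the runner-up token, which is how a word such as "don't" could lose its apostrophe in a transcription.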

Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base")

print(tokenizer.decode(6))
print(tokenizer.decode(12))
```

**Print Output:**
```
'
-
```

Files changed (1)
  1. config.json +0 -2
config.json CHANGED
@@ -50,12 +50,10 @@
   "suppress_tokens": [
     1,
     2,
-    6,
     7,
     8,
     9,
     10,
-    12,
     14,
     25,
     26,
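
The same edit could be applied to a local copy of a config programmatically. The sketch below is only an illustration: `allow_tokens` is a hypothetical helper, and the sample list contains just the ids visible in this diff hunk, not the full `suppress_tokens` list from `config.json`.

```python
import json


def allow_tokens(config, token_ids):
    """Return a copy of `config` with `token_ids` removed from `suppress_tokens`."""
    allowed = set(token_ids)
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config.get("suppress_tokens", []) if t not in allowed
    ]
    return updated


# Only the portion of the list shown in the hunk above (assumption: the real
# list continues past 26).
config = json.loads('{"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}')

patched = allow_tokens(config, [6, 12])
print(patched["suppress_tokens"])  # [1, 2, 7, 8, 9, 10, 14, 25, 26]
```

The helper returns a new dict rather than mutating its input, so the original config stays available for comparison.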