sanchit-gandhi HF staff committed on
Commit
523f7db
1 Parent(s): e5ea493

Allow single quotes "'" and hyphens "-"

Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258

Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium.en")

# Decode the two token ids being removed from `suppress_tokens`
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```

**Print Output:**
```
'
-
```
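The edit itself can be sketched without loading the model. The snippet below is a minimal illustration, not the script used for this commit: `allow_tokens` is a hypothetical helper, and the list is only the excerpt of `suppress_tokens` visible in the diff, not the full list from `config.json`.

```python
import json
from typing import Iterable


def allow_tokens(config: dict, token_ids: Iterable[int]) -> dict:
    """Return a copy of `config` with `token_ids` removed from `suppress_tokens`.

    Hypothetical helper for illustration; not part of transformers.
    """
    allowed = set(token_ids)
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config.get("suppress_tokens", []) if t not in allowed
    ]
    return updated


# Excerpt of the list shown in the diff (the real list is longer).
config = {"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}

# Stop suppressing the single quote (id 6) and the hyphen (id 12).
patched = allow_tokens(config, [6, 12])
print(json.dumps(patched["suppress_tokens"]))
# [1, 2, 7, 8, 9, 10, 14, 25, 26]
```

The helper returns a new dict rather than mutating its input, so the original list stays available for comparison against the patched one.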

Files changed (1)
  1. config.json +0 -2
config.json CHANGED
@@ -42,12 +42,10 @@
   "suppress_tokens": [
   1,
   2,
-  6,
   7,
   8,
   9,
   10,
-  12,
   14,
   25,
   26,