metadata
license: mit
TokenMonster
The documentation and code are available on GitHub at alasdairforsythe/tokenmonster.
Trained vocabularies can be downloaded from the tables below:
With capcode
Name | Vocab Size | Charset | Availability |
---|---|---|---|
english-100256-capcode | 100256 | UTF-8 | download |
english-65536-capcode | 65536 | UTF-8 | download |
english-50256-capcode | 50256 | UTF-8 | download |
english-40000-capcode | 40000 | UTF-8 | in-progress |
english-32000-capcode | 32000 | UTF-8 | download |
english-24000-capcode | 24000 | UTF-8 | in-progress |
code-100256-capcode | 100256 | UTF-8 | download |
code-65536-capcode | 65536 | UTF-8 | in-progress |
code-50256-capcode | 50256 | UTF-8 | in-progress |
code-40000-capcode | 40000 | UTF-8 | in-progress |
code-32000-capcode | 32000 | UTF-8 | download |
code-24000-capcode | 24000 | UTF-8 | in-progress |
Without capcode
Name | Vocab Size | Charset | Availability |
---|---|---|---|
english-100256 | 100256 | UTF-8 | download |
english-65536 | 65536 | UTF-8 | in-progress |
english-50256 | 50256 | UTF-8 | in-progress |
english-40000 | 40000 | UTF-8 | in-progress |
english-32000 | 32000 | UTF-8 | in-progress |
english-24000 | 24000 | UTF-8 | in-progress |
code-100256 | 100256 | UTF-8 | download |
code-65536 | 65536 | UTF-8 | in-progress |
code-50256 | 50256 | UTF-8 | download |
code-40000 | 40000 | UTF-8 | in-progress |
code-32000 | 32000 | UTF-8 | download |
code-24000 | 24000 | UTF-8 | in-progress |
Vocabularies marked in-progress will be released at a rate of one per day.
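
The vocabulary names in the tables above appear to follow one pattern: a dataset (`english` or `code`), a vocabulary size, and an optional `capcode` suffix. The sketch below assumes that pattern holds for every listed name and splits a name into those parts. `VocabName` and `parse_vocab_name` are hypothetical helpers written for illustration; they are not part of the TokenMonster library.

```python
from typing import NamedTuple


class VocabName(NamedTuple):
    dataset: str     # first field of the name, e.g. "english" or "code"
    vocab_size: int  # second field, e.g. 32000
    capcode: bool    # True when the name ends in "-capcode"


def parse_vocab_name(name: str) -> VocabName:
    """Split a name such as 'english-32000-capcode' into its parts.

    Assumes the 'dataset-size[-capcode]' layout seen in the tables;
    any other layout raises ValueError.
    """
    parts = name.split("-")
    if len(parts) == 3 and parts[2] == "capcode":
        capcode = True
    elif len(parts) == 2:
        capcode = False
    else:
        raise ValueError(f"unexpected vocabulary name: {name!r}")
    return VocabName(dataset=parts[0], vocab_size=int(parts[1]), capcode=capcode)


print(parse_vocab_name("english-32000-capcode"))
print(parse_vocab_name("code-50256"))
```

A helper like this could be used to filter the list, for example to pick only the capcode vocabularies at or below a given size.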