Installation

🤗 Tokenizers is tested on Python 3.5+.

You should install 🤗 Tokenizers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide. Create a virtual environment with the version of Python you're going to use, then activate it.
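As a quick sanity check before installing anything, the following minimal sketch reports whether the current interpreter is running inside a virtual environment and whether it meets the Python 3.5+ requirement stated above. The helper names in_virtualenv and python_is_supported are hypothetical and exist only for this illustration. The check relies on sys.base_prefix, which is set by the standard venv module and by recent versions of virtualenv, so older tools may not be detected.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    # Assumption: environments created by very old virtualenv releases
    # (which set sys.real_prefix instead) are not detected here.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_is_supported(minimum=(3, 5)) -> bool:
    """Return True when the running Python is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


print("virtual environment active:", in_virtualenv())
print("Python version supported:", python_is_supported())
```

If the first line prints False, activate the environment and run the check again before continuing.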

Installation with pip

🤗 Tokenizers can be installed using pip as follows:

pip install tokenizers
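One way to confirm that the install succeeded is to import the package from the same environment and print its version. This is only a sketch: installed_version is a hypothetical helper, and it falls back to "unknown" if the module does not expose a __version__ attribute.

```python
import importlib


def installed_version(package: str = "tokenizers"):
    """Return the package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        # The package is not installed in the active environment.
        return None
    # Fall back to "unknown" when the module has no __version__ attribute.
    return getattr(module, "__version__", "unknown")


version = installed_version()
print("tokenizers version:", version if version else "not installed")
```

A result of "not installed" usually means pip ran against a different interpreter than the one executing this check.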

Installation from sources

To use this method, you need to have the Rust language installed. You can follow the official guide for more information.

If you are using a Unix-based OS, the installation should be as simple as running:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

If Rust is already installed, you can easily update it with the following command:

rustup update
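Before building from source, it can help to confirm that the Rust toolchain is visible on your PATH. The sketch below looks up rustc and cargo and prints what each reports for --version. The tool_version helper is hypothetical and not part of any library. A freshly installed toolchain may only appear on PATH after opening a new shell.

```python
import shutil
import subprocess


def tool_version(tool: str):
    """Return the first line of `<tool> --version`, or None if the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.strip().splitlines()
    # Return an empty string if the tool printed nothing.
    return lines[0] if lines else ""


for tool in ("rustc", "cargo"):
    version = tool_version(tool)
    print(tool + ":", version if version is not None else "not found on PATH")
```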

Once Rust is installed, we can start retrieving the sources for 🤗 Tokenizers:

git clone https://github.com/huggingface/tokenizers

Then we go into the python bindings folder:

cd tokenizers/bindings/python

At this point you should have your virtual environment already activated. In order to compile 🤗 Tokenizers, you need to install the Python package setuptools_rust:

pip install setuptools_rust

Then you can have 🤗 Tokenizers compiled and installed in your virtual environment with the following command:

python setup.py install