Installationο
π€ Tokenizers is tested on Python 3.5+.
You should install π€ Tokenizers in a virtual environment. If youβre unfamiliar with Python virtual environments, check out the user guide. Create a virtual environment with the version of Python youβre going to use and activate it.
Installation with pipο
π€ Tokenizers can be installed using pip as follows:
pip install tokenizers
Installation from sourcesο
To use this method, you need to have the Rust language installed. You can follow the official guide for more information.
If you are using a unix based OS, the installation should be as simple as running:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Or you can easiy update it with the following command:
rustup update
Once rust is installed, we can start retrieving the sources for π€ Tokenizers:
git clone https://github.com/huggingface/tokenizers
Then we go into the python bindings folder:
cd tokenizers/bindings/python
At this point you should have your virtual environment already activated. In order to
compile π€ Tokenizers, you need to install the Python package setuptools_rust
:
pip install setuptools_rust
Then you can have π€ Tokenizers compiled and installed in your virtual environment with the following command:
python setup.py install