Best way to parallelize the inference of ESG-BERT?

#5 opened by BMU

I am trying to speed up inference. Because of the limit on the number of input tokens, I split the input text into chunks beforehand. I then have a for loop to classify each chunk:

result_list = list()
for text in text_list:
    results = pipeline(text)
    result_list.append(results)

I tried the solution by tyrex in https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop, but I get this warning:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
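If I read the warning correctly, the workaround it suggests is to set that environment variable before the tokenizer is used for the first time, e.g. at the top of the script (a minimal sketch):

import os

# Tell the Rust tokenizers backend not to use its own thread pool,
# so forking the process later does not trigger the deadlock warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import pipeline  # import only after the variable is set

As far as I can tell, though, that only silences the warning; it does not make the loop itself any faster.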

I did some research on parallelizing inference for transformers models. There is parallelformers
(https://github.com/tunib-ai/parallelformers), but ESG-BERT is not on its current list of supported models. I also found this article on how to parallelize the inference of transformers models on CPU: https://towardsdatascience.com/parallel-inference-of-huggingface-transformers-on-cpus-4487c28abe23.
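If I understand that article correctly, the idea is process-based parallelism with one pipeline per worker process, roughly like this (the model id and the pool size below are placeholders, not something from the article):

import os
from multiprocessing import Pool

# Avoid the tokenizers fork warning in the worker processes.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

_pipe = None  # one pipeline instance per worker process

def _init_worker():
    # Load the model inside each worker so nothing is forked after use.
    global _pipe
    from transformers import pipeline
    _pipe = pipeline("text-classification", model="nbroad/ESG-BERT")  # model id assumed; use the checkpoint you already load

def _classify(text):
    return _pipe(text)

if __name__ == "__main__":
    text_list = ["first chunk ...", "second chunk ..."]
    with Pool(processes=4, initializer=_init_worker) as pool:
        result_list = pool.map(_classify, text_list)

Each worker pays the cost of loading its own copy of the model, so this presumably only helps when there are many chunks to classify.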

Any advice on the best way to speed up the inference of ESG-BERT?

I wouldn't recommend parallelizing.

Hugging Face has another library called optimum that can help you speed up your model.

See here for an article explaining how to use it: https://www.philschmid.de/optimizing-transformers-with-optimum
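Roughly, the flow from that article is to export the checkpoint to ONNX and run it with ONNX Runtime through the same pipeline API (a sketch; the model id is just a placeholder for whatever checkpoint you load today):

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "nbroad/ESG-BERT"  # placeholder; use the ESG-BERT checkpoint you already use

# Export the PyTorch weights to ONNX and run them with ONNX Runtime.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = clf(["first chunk ...", "second chunk ..."])

The article also walks through graph optimization and quantization for further speedups.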

nbroad changed discussion status to closed
