---
license: other
---

# xs_blenderbot_onnx (only 168 MB)

ONNX-quantized version of the facebook/blenderbot_small-90M model (350 MB), for faster CPU inference.

## Intro

Before usage:

• download the blender_model.py script from the files in this repo
• pip install onnxruntime

You can use the model with the Hugging Face generate function and all of its parameters.

# Usage

With the text generation pipeline:

```python
>>> from blender_model import TextGenerationPipeline

>>> max_answer_length = 100
>>> response_generator_pipe = TextGenerationPipeline(max_length=max_answer_length)
>>> utterance = "Hello, how are you?"
>>> response_generator_pipe(utterance)
i am well. how are you? what do you like to do in your free time?
```

Or you can call the model directly:

```python
>>> from blender_model import OnnxBlender
>>> from transformers import BlenderbotSmallTokenizer

>>> original_repo_id = "facebook/blenderbot_small-90M"
>>> repo_id = "remzicam/xs_blenderbot_onnx"
>>> model_file_names = [
...     "blenderbot_small-90M-encoder-quantized.onnx",
...     "blenderbot_small-90M-decoder-quantized.onnx",
...     "blenderbot_small-90M-init-decoder-quantized.onnx",
... ]
>>> model = OnnxBlender(original_repo_id, repo_id, model_file_names)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(original_repo_id)

>>> max_answer_length = 100
>>> utterance = "Hello, how are you?"
>>> inputs = tokenizer(utterance, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=max_answer_length)
>>> response = tokenizer.decode(outputs[0], skip_special_tokens=True)
>>> print(response)
i am well. how are you? what do you like to do in your free time?
```

# Credits

To create the model, I adapted code from the https://github.com/siddharth-sharma7/fast-Bart repository.
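
Because `OnnxBlender.generate` follows the standard Hugging Face generate interface (per the Intro above), the usual decoding parameters should pass through unchanged. Below is a minimal sketch, assuming the `model` and `tokenizer` objects from the Usage section; the specific parameter values are illustrative, not taken from this repo:

```python
>>> # Hypothetical decoding settings; any generate() kwargs should pass through.
>>> utterance = "Hello, how are you?"
>>> inputs = tokenizer(utterance, return_tensors="pt")
>>> outputs = model.generate(
...     **inputs,
...     max_length=100,
...     num_beams=4,              # beam search instead of greedy decoding
...     no_repeat_ngram_size=3,   # avoid repeating 3-grams in the response
...     early_stopping=True,      # stop beam search once enough beams finish
... )
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
```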