---
license: mit
---

# **ChatGLM-4-onnx-cpu-int4**

This is the INT4-quantized ONNX version of glm-4-9b, intended for CPU inference.

1. Install

```bash
pip install torch transformers onnx onnxruntime
pip install --pre onnxruntime-genai
```

2. Sample

```python
import onnxruntime_genai as og

# Raw string so the backslashes in the Windows path are not treated as escapes.
model_folder = r".\chatglm-onnx\model"

# Load the model and tokenizer, and create a stream for incremental decoding.
model = og.Model(model_folder)
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

search_options = {}
search_options['max_length'] = 2048
search_options['past_present_share_buffer'] = False

# Wrap the user text in the chat template before encoding it.
chat_template = "<|user|>{input}<|assistant|>"
text = """Tell me about South China Normal University."""
prompt = chat_template.format(input=text)
input_tokens = tokenizer.encode(prompt)

# Note: this loop follows the pre-release onnxruntime-genai API used by this
# sample (params.input_ids, compute_logits); newer releases may differ.
params = og.GeneratorParams(model)
params.set_search_options(**search_options)
params.input_ids = input_tokens

generator = og.Generator(model, params)

# Generate one token at a time and print it as soon as it is decoded.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end='', flush=True)
```
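
The sample builds its prompt by substituting the user text into a chat template. As a minimal sketch, that step could be pulled into a small reusable helper. The names `CHAT_TEMPLATE` and `build_prompt` are hypothetical and not part of onnxruntime-genai. The template string is copied from the sample and has not been checked against the model's official chat template.

```python
# Template copied from the sample above (assumption: it matches what the
# exported model expects; this is not verified here).
CHAT_TEMPLATE = "<|user|>{input}<|assistant|>"


def build_prompt(user_text: str, template: str = CHAT_TEMPLATE) -> str:
    """Wrap a single user message in the chat template (hypothetical helper)."""
    cleaned = user_text.strip()
    if not cleaned:
        # An empty prompt would leave the model with nothing to answer.
        raise ValueError("user_text must not be empty")
    return template.format(input=cleaned)


if __name__ == "__main__":
    print(build_prompt("Tell me about South China Normal University."))
```

The returned string can be passed to `tokenizer.encode(...)` in place of the inline `prompt` in the sample.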