--- license: apache-2.0 --- ## chatglm3-ggml This repo contains GGML format model files for chatglm3-6B. ### Example code #### Install packages ```bash pip install xinference[ggml]>=0.4.3 ``` If you want to run with GPU acceleration, refer to [installation](https://github.com/xorbitsai/inference#installation). #### Start a local instance of Xinference ```bash xinference -p 9997 ``` #### Launch and inference ```python from xinference.client import Client client = Client("http://localhost:9997") model_uid = client.launch_model( model_name="chatglm3", model_format="ggmlv3", model_size_in_billions=6, quantization="q4_0", ) model = client.get_model(model_uid) chat_history = [] prompt = "最大的动物是什么?" model.chat( prompt, chat_history, generate_config={"max_tokens": 1024} ) ``` ### More information [Xinference](https://github.com/xorbitsai/inference) Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you are empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. 👉 Join our Slack community!