---
base_model: GeneZC/MiniChat-3B
inference: false
model_type: llama
prompt_template: |
  [|User|]\n
  {prompt}
  [|Assistant|]\n
quantized_by: mwitiderrick
tags:
- deepsparse
---

# MiniChat-3B - DeepSparse

This repo contains model files for [MiniChat-3B](https://huggingface.co/GeneZC/MiniChat-3B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

## Inference

Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:

```bash
pip install "deepsparse-nightly[llm]"
```

Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):

```python
from deepsparse import TextGeneration

# Wrap the question in the prompt template before sending it to the model
prompt = "How to get in a good university?"
formatted_prompt = f" [|User|]\n{prompt}[|Assistant|]\n"

model = TextGeneration(model_path="hf:nm-testing/MiniChat-3B-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=500).generations[0].text)

# Example output:
"""
To get into a good university, you should focus on your academic performance and strive to achieve high grades. This can be done by setting realistic goals and targets, regularly reviewing your progress, and seeking help from teachers or tutors if needed. Additionally, participating in extracurricular activities and building a network of friends can also help in getting into a good university.
"""
```

```python
from deepsparse import TextGeneration

# Wrap the question in the prompt template before sending it to the model
prompt = "How to become a great software engineer?"
formatted_prompt = f" [|User|]\n{prompt}[|Assistant|]\n"

model = TextGeneration(model="hf:nm-testing/MiniChat-3B-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=500).generations[0].text)

# Example output:
"""
To become a great software engineer, you need to have a strong foundation in computer science and programming.
Here are some tips to help you become a great software engineer:

1. Learn a programming language: You need to learn at least one programming language that you can use to develop software applications. Some popular programming languages include Python, Java, and C++.
2. Learn about data structures and algorithms: You need to learn about data structures and algorithms that you can use to develop software applications. You can learn about data structures like arrays, linked lists, and trees, and algorithms like sorting algorithms and dynamic programming.
3. Practice your skills: You need to practice your skills in programming and data structures to become proficient in your chosen programming language. You can practice by working on open-source projects or contributing to open-source projects.
4. Keep up to date: You need to keep up to date with new technologies and programming languages to stay relevant in the field. You can keep up to date by reading blogs, attending meetups, and participating in online communities.
5. Collaborate with others: You can collaborate with others to develop software applications that can benefit society. You can collaborate with others by participating in open-source projects, contributing to open-source communities, and sharing knowledge with others.

By following these tips, you can become a great software engineer and develop software applications that can benefit society.
"""
```

## Prompt template

```
[|User|]\n
{prompt}
[|Assistant|]\n
```

## Sparsification

For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py GeneZC/MiniChat-3B open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
cp deployment/model.onnx deployment/model-orig.onnx
```

Run the following KV-cache injection to speed up inference by caching the Key and Value states:

```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Load only the graph; external weight files stay on disk
model = onnx.load(input_file, load_external_data=False)
# Inject the KV cache into the exported graph
model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```

See the [One Shot With SparseML](https://github.com/neuralmagic/sparseml/tree/main/src/sparseml/transformers/sparsification/obcq) page for a step-by-step guide to one-shot quantization of large language models.

## Slack

For further support and discussion of these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).
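
As a usage note for the inference examples in this card: the prompt string can be built by a small helper so that every request is formatted the same way. This is only a sketch. `build_prompt` is a hypothetical name, not part of DeepSparse, and it simply mirrors the f-string used in the inference examples (including its leading space) rather than any official MiniChat formatting routine.

```python
def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt the way the inference examples in this card do.

    Hypothetical helper (not a DeepSparse API): it reproduces the f-string
    from the examples, including the leading space before "[|User|]".
    """
    return f" [|User|]\n{user_message}[|Assistant|]\n"


if __name__ == "__main__":
    # Print the repr so the newline escapes in the prompt are visible
    print(repr(build_prompt("How to get in a good university?")))
```

The returned string can be passed to the pipeline in place of `formatted_prompt` in the examples.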