Starling-LM-7B-beta ONNX

Model Summary

This repository contains the ONNX-optimized version of Starling-LM-7B-beta, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.
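
Before running the model, you can confirm that DirectML acceleration is available by querying ONNX Runtime for its execution providers. This is a minimal check, assuming the onnxruntime-directml package from the setup steps below is installed:

import onnxruntime as ort

# With onnxruntime-directml installed, 'DmlExecutionProvider' should be listed;
# 'CPUExecutionProvider' is always present as the fallback.
print(ort.get_available_providers())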

Optimized Configurations

The following optimized configurations are available:

  • ONNX model for int4 DirectML: Optimized for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 via AWQ.
  • ONNX model for int4 CPU and Mobile: ONNX model for CPU and mobile using int4 quantization via RTN. Two versions are uploaded to trade off latency against accuracy: acc-level-1 targets higher accuracy, while acc-level-4 targets higher performance. For mobile devices, we recommend the acc-level-4 model (see the download example below).
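
The DirectML download command appears in the setup steps below. For the int4 CPU/mobile variants, a command along these lines should download them; note that the onnx/cpu_and_mobile path is an assumption about the repository layout, so confirm the exact folder names in the repository's file tree:

huggingface-cli download EmbeddedLLM/Starling-LM-7b-beta-onnx --include="onnx/cpu_and_mobile/*" --local-dir .\Starling-LM-7B-beta-onnx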

Usage

Installation and Setup

To use the Starling-LM-7B-beta ONNX model on Windows with DirectML, follow these steps:

  1. Create and activate a Conda environment:
conda create -n onnx python=3.10
conda activate onnx
  2. Install Git LFS:
winget install -e --id GitHub.GitLFS
  3. Install the Hugging Face CLI:
pip install huggingface-hub[cli]
  4. Download the model:
huggingface-cli download EmbeddedLLM/Starling-LM-7b-beta-onnx --include="onnx/directml/*" --local-dir .\Starling-LM-7B-beta-onnx
  5. Install the necessary Python packages:
pip install numpy==1.26.4
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
  6. Install the Visual Studio 2015 runtime:
conda install conda-forge::vs2015_runtime
  7. Download the example script:
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
  8. Run the example script:
python phi3-qa.py -m .\Starling-LM-7B-beta-onnx
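
If you prefer to call the runtime directly instead of using phi3-qa.py, the sketch below shows the core generation loop that the example script wraps. This is a minimal sketch based on the onnxruntime-genai Python examples; the package is still pre-release, so method names may differ between versions:

import onnxruntime_genai as og

# Load the DirectML-optimized model downloaded in step 4.
model = og.Model(r".\Starling-LM-7B-beta-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Starling follows the Openchat-3.5-0106 chat template (see Model Description below).
prompt = "GPT4 Correct User: What is ONNX Runtime?<|end_of_turn|>GPT4 Correct Assistant:"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream-decode and print each new token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)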

Hardware Requirements

Minimum Configuration:

  • Windows: DirectX 12-capable GPU (AMD/NVIDIA)
  • CPU: x86_64 / ARM64

Tested Configurations:

  • GPU: AMD Ryzen 8000 Series iGPU (DirectML)
  • CPU: AMD Ryzen CPU

Model Description

  • Developed by: The Nexusflow Team (Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao)
  • Model type: Language Model fine-tuned with RLHF / RLAIF
  • License: Apache-2.0, under the condition that the model is not used to compete with OpenAI
  • Finetuned from model: Openchat-3.5-0106 (based on Mistral-7B-v0.1)

We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and the policy optimization method of Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the power of the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as a judge.
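
Starling-LM-7B-beta uses the same chat template as Openchat-3.5-0106, which it was fine-tuned from, so prompts passed to the ONNX model should follow this format (verify against the tokenizer configuration shipped with the model):

Single turn:
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:

Multi-turn:
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant: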

License

The dataset, model, and online demo are subject to the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violations.

Citation

@misc{starling2023,
  title  = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
  url    = {},
  author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  month  = {November},
  year   = {2023}
}