Starling-LM-7B-beta ONNX

Model Summary

This repository contains the ONNX-optimized version of Starling-LM-7B-beta, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.
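
Before running the model, you can confirm that DirectML acceleration is available by querying ONNX Runtime for its execution providers. This is a minimal check, assuming the onnxruntime-directml package from the setup steps below is installed:

import onnxruntime as ort

# With onnxruntime-directml installed, 'DmlExecutionProvider' should be listed;
# 'CPUExecutionProvider' is always present as the fallback.
print(ort.get_available_providers())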

Optimized Configurations

The following optimized configurations are available:

  • ONNX model for int4 DirectML: Optimized for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 via AWQ.
  • ONNX model for int4 CPU and Mobile: ONNX model for CPU and mobile using int4 quantization via RTN. Two versions are uploaded to trade off latency against accuracy: acc-level-1 targets higher accuracy, while acc-level-4 targets higher performance. For mobile devices, we recommend the acc-level-4 model (see the download example below).
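
The DirectML download command appears in the setup steps below. For the int4 CPU/mobile variants, a command along these lines should download them; note that the onnx/cpu_and_mobile path is an assumption about the repository layout, so confirm the exact folder names in the repository's file tree:

huggingface-cli download EmbeddedLLM/Starling-LM-7b-beta-onnx --include="onnx/cpu_and_mobile/*" --local-dir .\Starling-LM-7B-beta-onnx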

Usage

Installation and Setup

To use the Starling-LM-7B-beta ONNX model on Windows with DirectML, follow these steps:

  1. Create and activate a Conda environment:
conda create -n onnx python=3.10
conda activate onnx
  2. Install Git LFS:
winget install -e --id GitHub.GitLFS
  3. Install the Hugging Face CLI:
pip install huggingface-hub[cli]
  4. Download the model:
huggingface-cli download EmbeddedLLM/Starling-LM-7b-beta-onnx --include="onnx/directml/*" --local-dir .\Starling-LM-7B-beta-onnx
  5. Install the necessary Python packages:
pip install numpy==1.26.4
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
  6. Install the Visual Studio 2015 runtime:
conda install conda-forge::vs2015_runtime
  7. Download the example script:
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
  8. Run the example script:
python phi3-qa.py -m .\Starling-LM-7B-beta-onnx
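
If you prefer to call the runtime directly instead of using phi3-qa.py, the sketch below shows the core generation loop that the example script wraps. This is a minimal sketch based on the onnxruntime-genai Python examples; the package is still pre-release, so method names may differ between versions:

import onnxruntime_genai as og

# Load the DirectML-optimized model downloaded in step 4.
model = og.Model(r".\Starling-LM-7B-beta-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Starling follows the Openchat-3.5-0106 chat template (see Model Description below).
prompt = "GPT4 Correct User: What is ONNX Runtime?<|end_of_turn|>GPT4 Correct Assistant:"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream-decode and print each new token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)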

Hardware Requirements

Minimum Configuration:

  • Windows: DirectX 12-capable GPU (AMD/NVIDIA)
  • CPU: x86_64 / ARM64

Tested Configurations:

  • GPU: AMD Ryzen 8000 Series iGPU (DirectML)
  • CPU: AMD Ryzen CPU

Model Description

  • Developed by: The Nexusflow Team (Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao)
  • Model type: Language Model fine-tuned with RLHF / RLAIF
  • License: Apache-2.0, under the condition that the model is not used to compete with OpenAI
  • Finetuned from model: Openchat-3.5-0106 (based on Mistral-7B-v0.1)

We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and the policy optimization method of Fine-Tuning Language Models from Human Preferences (PPO). Harnessing the power of the ranking dataset berkeley-nest/Nectar, the upgraded reward model Starling-RM-34B, and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as a judge.
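
Starling-LM-7B-beta uses the same chat template as Openchat-3.5-0106, which it was fine-tuned from, so prompts passed to the ONNX model should follow this format (verify against the tokenizer configuration shipped with the model):

Single turn:
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:

Multi-turn:
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant: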

License

The dataset, model, and online demo are subject to the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violations.

Citation

@misc{starling2023,
  title  = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
  url    = {},
  author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  month  = {November},
  year   = {2023}
}