Jais-13B OpenVINO INT4

This repository contains the inceptionai/jais-13b model optimized for inference with Intel's OpenVINO runtime. The weights have been quantized to INT4 using AWQ (activation-aware weight quantization), which reduces the model's memory footprint and speeds up inference while limiting accuracy loss.

Model Details

  • Original Model: inceptionai/jais-13b
  • Model Type: Bilingual (Arabic-English) Large Language Model
  • Parameters: 13B
  • OpenVINO Version: 2024.0+
  • Quantization: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
  • Group Size: -1 (per-channel quantization)

Jais-13B is a bilingual model that supports both Arabic and English text generation. The model can:

  • Generate fluent text in both Arabic and English
  • Respond to prompts in either language
  • Handle code-switching between the two languages

Optimization Details

This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:

optimum-cli export openvino \
  -m inceptionai/jais-13b \
  --weight-format int4 \
  --sym \
  --dataset auto \
  --awq \
  --group-size -1 \
  --trust-remote-code \
  jais-13b-int4-sym-ov

Optimization Parameters:

  • INT4 Quantization: Weights compressed to 4-bit integers
  • Symmetric Quantization: weights are mapped symmetrically around zero (no zero-point), keeping the quantized kernels simple and fast
  • AWQ: Activation-aware Weight Quantization to preserve model quality
  • Auto Dataset: calibration samples for the activation-aware statistics are selected automatically (--dataset auto)
  • Group Size: -1 (quantize each output channel independently)
  • Trust Remote Code: Enabled to support custom model code
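For reference, the same compression can also be expressed through the Optimum Intel Python API. The sketch below is an approximate equivalent of the CLI command above, assuming a recent optimum-intel release where OVWeightQuantizationConfig exposes these options; option names and accepted dataset values can differ between versions:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# 4-bit symmetric weights, per-channel quantization (group_size=-1),
# AWQ with automatic calibration data -- mirrors the optimum-cli flags above.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,
    group_size=-1,
    quant_method="awq",
    dataset="auto",  # if your version rejects "auto", use a named set such as "wikitext2"
)

model = OVModelForCausalLM.from_pretrained(
    "inceptionai/jais-13b",
    export=True,
    quantization_config=quantization_config,
    trust_remote_code=True,
)
model.save_pretrained("jais-13b-int4-sym-ov")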

Usage

Prerequisites

  • OpenVINO 2024.0 or newer
  • optimum-intel
  • transformers
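These can be installed together; per the Optimum Intel documentation, a single command pulls in openvino, optimum-intel, and transformers:

pip install optimum[openvino]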

Sample Inference Code with Optimum Intel

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/jais-13b-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text (enable sampling so temperature/top_p take effect)
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
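Optimum Intel can also target other OpenVINO devices. For example, assuming a machine with a supported Intel GPU and driver, the loaded model can be recompiled for the GPU before generation:

model.to("gpu")  # recompile the OpenVINO model for the Intel GPU device
output = model.generate(**inputs, max_new_tokens=512, do_sample=True)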

Alternative: Using OpenVINO GenAI

  1. Install the packages required for OpenVINO GenAI:
pip install openvino-genai huggingface_hub
  2. Download the model and run inference:
import huggingface_hub as hf_hub

model_id = "rpanchum/jais-13b-int4-sym-ov"
model_path = "jais-13b-int4-sym-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)

import openvino_genai as ov_genai

device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("ما هو الذكاء الاصطناعي؟", max_length=200))  # Arabic: "What is artificial intelligence?"
print(pipe.generate("What is artificial intelligence?", max_length=200))
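OpenVINO GenAI can also stream tokens as they are produced. A minimal sketch, assuming the streamer callback interface of recent openvino-genai releases (the callback receives each decoded chunk and returns False to continue generating):

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("jais-13b-int4-sym-ov", "CPU")

def streamer(subword):
    # Print each decoded chunk as soon as it arrives
    print(subword, end="", flush=True)
    return False  # False means "keep generating"

pipe.generate("What is artificial intelligence?", max_new_tokens=200, streamer=streamer)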

License

This model inherits the license of the original inceptionai/jais-13b model.
