Jais-13B OpenVINO INT4

This repository contains the inceptionai/jais-13b model optimized for inference with Intel's OpenVINO runtime. The weights have been quantized to INT4 using AWQ (activation-aware weight quantization), which reduces the model's memory footprint and speeds up inference while limiting accuracy loss.

Model Details

  • Original Model: inceptionai/jais-13b
  • Model Type: Bilingual (Arabic-English) Large Language Model
  • Parameters: 13B
  • OpenVINO Version: 2024.0+
  • Quantization: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
  • Group Size: -1 (per-channel quantization)

Jais-13B is a bilingual model that supports both Arabic and English text generation. The model can:

  • Generate fluent text in both Arabic and English
  • Respond to prompts in either language
  • Handle code-switching between the two languages

Optimization Details

This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:

optimum-cli export openvino \
  -m inceptionai/jais-13b \
  --weight-format int4 \
  --sym \
  --dataset auto \
  --awq \
  --group-size -1 \
  --trust-remote-code \
  jais-13b-int4-sym-ov

Optimization Parameters:

  • INT4 Quantization: Weights compressed to 4-bit integers
  • Symmetric Quantization: weights are mapped symmetrically around zero (no zero-point), keeping the quantized kernels simple and fast
  • AWQ: Activation-aware Weight Quantization to preserve model quality
  • Auto Dataset: calibration samples for the activation-aware statistics are selected automatically (--dataset auto)
  • Group Size: -1 (quantize each output channel independently)
  • Trust Remote Code: Enabled to support custom model code
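For reference, the same compression can also be expressed through the Optimum Intel Python API. The sketch below is an approximate equivalent of the CLI command above, assuming a recent optimum-intel release where OVWeightQuantizationConfig exposes these options; option names and accepted dataset values can differ between versions:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# 4-bit symmetric weights, per-channel quantization (group_size=-1),
# AWQ with automatic calibration data -- mirrors the optimum-cli flags above.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,
    group_size=-1,
    quant_method="awq",
    dataset="auto",  # if your version rejects "auto", use a named set such as "wikitext2"
)

model = OVModelForCausalLM.from_pretrained(
    "inceptionai/jais-13b",
    export=True,
    quantization_config=quantization_config,
    trust_remote_code=True,
)
model.save_pretrained("jais-13b-int4-sym-ov")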

Usage

Prerequisites

  • OpenVINO 2024.0 or newer
  • optimum-intel
  • transformers
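These can be installed together; per the Optimum Intel documentation, a single command pulls in openvino, optimum-intel, and transformers:

pip install optimum[openvino]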

Sample Inference Code with Optimum Intel

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/jais-13b-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text (enable sampling so temperature/top_p take effect)
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
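Optimum Intel can also target other OpenVINO devices. For example, assuming a machine with a supported Intel GPU and driver, the loaded model can be recompiled for the GPU before generation:

model.to("gpu")  # recompile the OpenVINO model for the Intel GPU device
output = model.generate(**inputs, max_new_tokens=512, do_sample=True)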

Alternative: Using OpenVINO GenAI

  1. Install the packages required for OpenVINO GenAI:
pip install openvino-genai huggingface_hub
  2. Download the model and run inference:
import huggingface_hub as hf_hub

model_id = "rpanchum/jais-13b-int4-sym-ov"
model_path = "jais-13b-int4-sym-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)

import openvino_genai as ov_genai

device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("ما هو الذكاء الاصطناعي؟", max_length=200))  # Arabic: "What is artificial intelligence?"
print(pipe.generate("What is artificial intelligence?", max_length=200))
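OpenVINO GenAI can also stream tokens as they are produced. A minimal sketch, assuming the streamer callback interface of recent openvino-genai releases (the callback receives each decoded chunk and returns False to continue generating):

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("jais-13b-int4-sym-ov", "CPU")

def streamer(subword):
    # Print each decoded chunk as soon as it arrives
    print(subword, end="", flush=True)
    return False  # False means "keep generating"

pipe.generate("What is artificial intelligence?", max_new_tokens=200, streamer=streamer)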

License

This model inherits the license of the original inceptionai/jais-13b model.
