Jais-13B OpenVINO INT4
This repository contains the inceptionai/jais-13b model converted for inference with Intel's OpenVINO runtime. The weights are quantized to INT4 using the AWQ quantization scheme, which reduces memory footprint and speeds up inference while aiming to preserve output quality.
Model Details
- Original Model: inceptionai/jais-13b
- Model Type: Bilingual (Arabic-English) Large Language Model
- Parameters: 13B
- OpenVINO Version: 2024.0+
- Quantization: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
- Group Size: -1 (per-channel quantization)
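For a rough sense of what INT4 weights mean at this parameter count, the sketch below estimates weight storage at a few precisions. It is back-of-the-envelope arithmetic only: `weight_footprint_gib` is a hypothetical helper, and 13e9 is the nominal "13B" figure rather than an exact parameter count. The actual files are typically somewhat larger, since quantization scales are stored alongside the weights and some layers may be kept at higher precision.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_weight / 8 / 2**30


# Nominal "13B" from the model details, not an exact parameter count.
NUM_PARAMS = 13e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_footprint_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions, the INT4 weights take about a quarter of the space of FP16 weights, before runtime overhead such as the KV cache is counted.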
Jais-13B is a bilingual model that supports both Arabic and English text generation. The model can:
- Generate fluent text in both Arabic and English
- Respond to prompts in either language
- Handle code-switching between the two languages
Optimization Details
This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:
optimum-cli export openvino \
-m inceptionai/jais-13b \
--weight-format int4 \
--sym \
--dataset auto \
--awq \
--group-size -1 \
--trust-remote-code \
jais-13b-int4-sym-ov
Optimization Parameters:
- INT4 Quantization: Weights compressed to 4-bit integers
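- Symmetric Quantization: Weights are mapped to a signed range centered on zero with no zero point, which favors inference speed; asymmetric quantization usually preserves accuracy slightly better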
- AWQ: Activation-aware Weight Quantization to preserve model quality
- Auto Dataset: With --dataset auto, the calibration data is collected from the model's own generations instead of a named corpus
- Group Size: -1 (quantize each output channel independently)
- Trust Remote Code: Enabled to support custom model code
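The symmetric and per-channel settings listed above can be illustrated numerically. The following is a minimal NumPy sketch, not the implementation used by NNCF or Optimum Intel: it assumes a weight matrix whose rows are output channels, it uses a simple max-magnitude scale, and it leaves out the AWQ step entirely. The function names are hypothetical.

```python
import numpy as np


def quantize_int4_sym_per_channel(weights: np.ndarray):
    """Symmetric 4-bit quantization with one scale per output channel (row)."""
    # One scale per row: map the largest magnitude in each row onto 7.
    max_abs = np.max(np.abs(weights), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    # Symmetric: no zero point. Round, then clip to the signed 4-bit range [-8, 7].
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from 4-bit codes and per-row scales."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)  # 4 output channels, 16 inputs

q, scale = quantize_int4_sym_per_channel(w)
w_hat = dequantize(q, scale)

print("scale shape:", scale.shape)  # one scale per output channel
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```

With a group size of -1 there is a single scale per row, as here. A positive group size would instead split each row into blocks of that many weights and give each block its own scale.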
Usage
Prerequisites
- OpenVINO 2024.0 or newer
- optimum-intel
- transformers
Sample Inference code with Optimum Intel
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer
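# Load tokenizer and model. Jais ships custom model code, so trust_remote_code
# may be required here, as it was during export.
model_id = "rpanchum/jais-13b-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
# Tokenize the prompt (returns input_ids and attention_mask)
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate text; do_sample=True is needed for temperature and top_p to take effect
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode the generated token IDs back to text
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)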
Alternative: Using OpenVINO GenAI
- Install packages required for using OpenVINO GenAI.
pip install openvino-genai huggingface_hub
- Download model and run inference.
import huggingface_hub as hf_hub
import openvino_genai as ov_genai
# Download the model files from the Hugging Face Hub into a local directory
model_id = "rpanchum/jais-13b-int4-sym-ov"
model_path = "jais-13b-int4-sym-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)
# Build the generation pipeline on the target device
device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
# max_length caps the total length (prompt plus generated tokens)
print(pipe.generate("ما هو الذكاء الاصطناعي؟", max_length=200))  # "What is AI?" in Arabic
print(pipe.generate("What is artificial intelligence?", max_length=200))
License
This model inherits the license of the original inceptionai/jais-13b model.