Qwen3-1.7B Egyptian Arabic Customer Service (LoRA)
A LoRA adapter fine-tuned on top of Qwen/Qwen3-1.7B for Egyptian Arabic customer service conversations. The model handles real-world customer inquiries in colloquial Egyptian Arabic — order tracking, delivery issues, product questions, and complaint resolution.
Model Details
Model Description
This adapter was trained on a custom dataset of 257 multi-turn Egyptian Arabic customer service conversations using Low-Rank Adaptation (LoRA). Only the lightweight adapter weights are stored here; the base Qwen3-1.7B weights remain unchanged and are loaded separately at inference time.
- Developed by: Youssef Eldenary
- Model type: Causal Language Model — LoRA adapter (PEFT) over Qwen3-1.7B
- Language(s) (NLP): Arabic — Egyptian dialect (عامية مصرية)
- License: MIT
- Finetuned from model: Qwen/Qwen3-1.7B
Model Sources
Direct Use
This model is intended for Egyptian Arabic customer service chatbots. Load the adapter on top of the Qwen3-1.7B base model and query it directly with customer messages in Egyptian Arabic dialect. Example scenarios:
- Order tracking: "الأوردر لسه موصلش" ("the order still hasn't arrived") → the model asks for the order number and reassures the customer
- Delivery issues: "الشحنة اتأخرت" ("the shipment is delayed") → the model acknowledges the delay and explains next steps
- Returns & refunds: "عايز أرجع المنتج" ("I want to return the product") → the model walks through the return process
- General complaints: "السلام عليكم، عندي مشكلة في الأوردر" ("Peace be upon you, I have a problem with my order") → the model opens a support dialogue
Downstream Use
The adapter can be merged into the base model and integrated into a larger customer support pipeline, chatbot backend (e.g., FastAPI + React), or voice assistant targeting Egyptian Arabic-speaking users. It is well-suited as the language generation component in a retrieval-augmented or tool-calling customer service system.
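Merging folds the LoRA update into the base weights, so the result can be served as a plain Transformers checkpoint without PEFT at inference time. The sketch below uses PEFT's `merge_and_unload()` and copies the base model's tokenizer into the output directory. The helper name `merge_adapter` is hypothetical. Its imports sit inside the function so that defining it does not load PyTorch.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Merge a LoRA adapter into its base model and save a standalone checkpoint."""
    # Heavy imports are deferred until the helper is actually called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base weights, then attach the adapter on top of them.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the low-rank update into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)

    # Save a tokenizer alongside so out_dir loads with from_pretrained alone.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

For example, `merge_adapter("Qwen/Qwen3-1.7B", "Eldenary/qwen-Customer-Service-lora", "merged-model")` should produce a directory that `AutoModelForCausalLM.from_pretrained("merged-model")` can load directly. Here, `"merged-model"` is a placeholder path.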
Out-of-Scope Use
- Other Arabic dialects: The model was trained exclusively on Egyptian Arabic and will likely underperform on Levantine, Gulf, Moroccan, or MSA (Modern Standard Arabic) inputs.
- General-purpose assistant: This is not a general assistant. Performance on topics outside customer service (e.g., coding, science, creative writing) will be limited.
- High-stakes decisions: Should not be used for automated decisions involving refunds, account actions, or policy enforcement without human review.
- Medical, legal, or financial advice: Not appropriate for any of these domains.
Bias, Risks, and Limitations
- Dialect bias: Egyptian Arabic only. Other dialects are out of distribution.
- Domain bias: Trained on e-commerce/order-management scenarios. Responses to unrelated queries may be irrelevant or generic.
- Small dataset: 257 conversations is a compact training set. The model may overfit to certain phrasing patterns or fail on edge cases not seen during training.
- Hallucination: Like all LLMs, the model can produce fluent but incorrect responses. It has no access to live order data and should always be paired with a backend data source for factual order information.
- No content filtering: The adapter does not include a safety classifier. Downstream deployments should add moderation where appropriate.
Recommendations
- Always connect the model to a live order management system rather than relying on it for factual order status.
- Implement a fallback to a human agent for complaints the model expresses uncertainty about.
- Monitor outputs regularly for quality and tone drift, especially after large volumes of user interactions.
- Inform users they are interacting with an AI assistant.
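The first two recommendations can be combined in the application layer. First, fetch the real order status from the backend and inject it into the prompt. Then route the conversation to a human agent when the reply signals uncertainty. Everything in the sketch below is hypothetical. `ORDERS` stands in for a live order-management lookup, and the uncertainty phrases are illustrative examples, not a list the model was trained to produce.

```python
from typing import Optional

# Hypothetical stand-in for a live order-management system.
ORDERS = {"12345": "out for delivery, expected tomorrow"}

# Illustrative uncertainty markers: "I'm not sure", "I don't know", and an English fallback.
UNCERTAIN_MARKERS = ("مش متأكد", "مش عارف", "not sure")


def lookup_order(order_id: str) -> Optional[str]:
    """Return the backend status for an order, or None if it is unknown."""
    return ORDERS.get(order_id)


def build_messages(user_text: str, order_id: Optional[str]) -> list[dict]:
    """Prepend backend facts as a system message so the model does not invent them."""
    status = lookup_order(order_id) if order_id else None
    if status is None:
        facts = "No order data is available; ask the customer for the order number."
    else:
        facts = f"Order {order_id} status from the backend: {status}."
    return [
        {"role": "system", "content": facts},
        {"role": "user", "content": user_text},
    ]


def needs_human(reply: str) -> bool:
    """Escalate to a human agent when the generated reply signals uncertainty."""
    return any(marker in reply for marker in UNCERTAIN_MARKERS)
```

The training example shown under Training Data contains no system turn. Whether the adapter reliably follows injected backend context is therefore an assumption to verify before deployment.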
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model = "Qwen/Qwen3-1.7B"
adapter = "Eldenary/qwen-Customer-Service-lora"
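# Load the frozen base model, then attach the LoRA adapter weights on top
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(adapter)
# Run a customer query in Egyptian Arabic ("The order still hasn't arrived.")
prompt = "الأوردر لسه موصلش."
messages = [{"role": "user", "content": prompt}]
# enable_thinking=False is a Qwen3 chat-template option that appends an empty <think></think> block,
# so the model answers directly instead of first generating a reasoning trace
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))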
Training Details
Training Data
A custom dataset of 257 multi-turn customer service conversations in colloquial Egyptian Arabic. Each conversation is stored as a list of role/content messages and rendered with the Qwen3 chat template during preprocessing:
{
"conversations": [
{ "role": "user", "content": "السلام عليكم، عندي مشكلة في الأوردر." },
{ "role": "assistant", "content": "وعليكم السلام، تحت أمرك. ممكن تقولي رقم الأوردر؟" }
]
}
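In English, the sample exchange reads: User: "Peace be upon you, I have a problem with the order." Assistant: "And peace be upon you, at your service. Could you tell me the order number?" The dataset covers order tracking, delivery delays, product returns, refund requests, and general customer complaints, all in Egyptian Arabic dialect. No external dataset was used; the data was collected and curated manually.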
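Before preprocessing, it can help to check that each record matches the format above. The sketch below assumes one JSON object per line (JSONL), which the card does not specify. It checks that turns alternate user/assistant, starting with the user and ending on an assistant reply. The function name `validate_record` is hypothetical.

```python
import json


def validate_record(line: str) -> list[dict]:
    """Parse one JSON record and check that its turns alternate user/assistant."""
    record = json.loads(line)
    turns = record["conversations"]
    if not turns:
        raise ValueError("empty conversation")
    for i, turn in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if turn.get("role") != expected:
            raise ValueError(f"turn {i}: expected role {expected!r}, got {turn.get('role')!r}")
        if not str(turn.get("content", "")).strip():
            raise ValueError(f"turn {i}: empty content")
    if turns[-1]["role"] != "assistant":
        raise ValueError("conversation should end with an assistant reply")
    return turns
```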
Training Procedure
Preprocessing
Each conversation was passed through tokenizer.apply_chat_template() to produce the Qwen3 chat format string, then tokenized with truncation and padding to a maximum sequence length of 512 tokens. Labels were set equal to input_ids for standard causal language modeling (next-token prediction over the full sequence).
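The padding and labeling step can be sketched in plain Python with the tokenizer stubbed out. Token IDs are truncated or right-padded to 512, and labels copy the input IDs. The sketch also includes a `mask_padding` option that replaces padded label positions with -100 so the loss ignores them. This is a common convention added for illustration only: the card says labels equal `input_ids` and does not say whether padding was masked.

```python
MAX_LEN = 512
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def pad_and_label(
    input_ids: list[int], pad_id: int, max_len: int = MAX_LEN, mask_padding: bool = False
) -> dict[str, list[int]]:
    """Truncate or right-pad token IDs and build causal-LM labels."""
    ids = input_ids[:max_len]
    n_real = len(ids)
    ids = ids + [pad_id] * (max_len - n_real)
    attention_mask = [1] * n_real + [0] * (max_len - n_real)
    # Labels mirror input_ids; Transformers causal-LM models shift them by one internally.
    labels = list(ids)
    if mask_padding:
        labels = [tok if keep else IGNORE_INDEX for tok, keep in zip(ids, attention_mask)]
    return {"input_ids": ids, "attention_mask": attention_mask, "labels": labels}
```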
Training Hyperparameters
- Training regime: fp16 mixed precision
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 2 (effective batch size = 4) |
| Learning rate | 2e-4 |
| Max sequence length | 512 tokens |
| Optimizer | AdamW (Trainer default) |
| Checkpointing | Every 50 steps, last 2 kept |
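The table implies a fairly short run. The arithmetic below assumes a single GPU, all 257 conversations in the training split, and partial batches and accumulation windows rounded up. Exact counts depend on the Trainer version and any evaluation split, neither of which the card states. In `transformers.TrainingArguments`, these settings would correspond to `per_device_train_batch_size`, `gradient_accumulation_steps`, `save_steps`, and `save_total_limit`.

```python
import math


def training_steps(n_examples: int, batch_size: int, grad_accum: int, epochs: int) -> tuple[int, int]:
    """Return (optimizer steps per epoch, total optimizer steps), rounding partial batches up."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch, steps_per_epoch * epochs


per_epoch, total = training_steps(n_examples=257, batch_size=2, grad_accum=2, epochs=5)
checkpoints = total // 50  # step-based checkpoints written every 50 steps (only the last 2 are kept)
print(per_epoch, total, checkpoints)  # → 65 325 6
```

Under these assumptions, that is about 65 optimizer steps per epoch, 325 in total, and six step-based checkpoints.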
LoRA configuration:
| Parameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0.05 |
| Bias | none |
| Task type | CAUSAL_LM |
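For reference, standard LoRA (as implemented in PEFT's default LoRA layers) keeps the pre-trained weight $W_0$ frozen and learns a low-rank update scaled by $\alpha/r$. The dropout from the table is applied to the input of the low-rank path. With the values above, the scaling factor is 2:

```latex
h = W_0 x + \frac{\alpha}{r}\, B A \,\mathrm{Dropout}_{p}(x),
\qquad W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}},\;
B \in \mathbb{R}^{d_{\text{out}} \times r},\;
A \in \mathbb{R}^{r \times d_{\text{in}}}

r = 8,\quad \alpha = 16,\quad p = 0.05,\quad \frac{\alpha}{r} = \frac{16}{8} = 2
```

By default, PEFT initializes $B$ to zero, so the adapted model starts out identical to the base model.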
Evaluation
Factors
Evaluation focused on:
- In-domain queries: Order tracking, delivery issues, returns — topics covered in training data
- Edge cases: Ambiguous or multi-step complaints requiring context from earlier turns
Metrics
Formal automated metrics (BLEU, ROUGE, perplexity) were not computed. Evaluation was qualitative, assessing:
- Dialect naturalness — does the response sound like authentic Egyptian Arabic?
- Relevance — does the response address the customer's actual issue?
- Tone — is the response polite, professional, and helpful?
- Context retention — does the model correctly refer to information from earlier turns?
Results
The model produces fluent, natural Egyptian Arabic responses to customer service queries within the training domain. It handles multi-turn context and resolves references to previously mentioned orders or issues. Performance drops noticeably on queries outside the customer service domain or in non-Egyptian Arabic dialects.
Summary
A lightweight LoRA adapter that adds Egyptian Arabic customer service capability to Qwen3-1.7B with minimal compute. Suitable for deployment in chatbot pipelines targeting Egyptian Arabic-speaking customers, with the caveat that it should always be paired with live data sources for order information.
Model Architecture and Objective
- Architecture: Decoder-only Transformer (Qwen3-1.7B)
- Objective: Causal language modeling — next-token prediction over Qwen3 chat-formatted sequences
- Adapter method: LoRA via PEFT — injects trainable low-rank matrices into the attention layers; base model weights are frozen during training
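To make "lightweight" concrete: each adapted linear layer mapping $d_{in} \to d_{out}$ adds $r(d_{in} + d_{out})$ trainable parameters. The card does not list the target modules, so the sketch below assumes the commonly used `q_proj` and `v_proj` pair. It also assumes Qwen3-1.7B has a hidden size of 2048, 16 query heads and 8 key/value heads of size 128, and 28 layers. Check the adapter's `adapter_config.json` and the base model's `config.json` for the real values.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Assumed Qwen3-1.7B attention shapes; verify against the model's config.json.
HIDDEN = 2048
HEAD_DIM = 128
N_HEADS, N_KV_HEADS = 16, 8
N_LAYERS = 28
R = 8

# Assumed target modules: query and value projections only.
per_layer = (
    lora_params(HIDDEN, N_HEADS * HEAD_DIM, R)       # q_proj: 2048 -> 2048
    + lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, R)  # v_proj: 2048 -> 1024
)
total = per_layer * N_LAYERS
print(f"{total:,} trainable LoRA parameters")  # → 1,605,632 trainable LoRA parameters
```

Under these assumptions, the adapter trains about 1.6 million parameters, roughly 0.1% of the base model's approximately 1.7 billion.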
Software
- Python 3.10+
- PyTorch 2.x
- HuggingFace Transformers
- PEFT 0.19.1
- HuggingFace Datasets
- Accelerate
Citation
BibTeX:
@misc{eldenary2025qwencustomerservice,
author = {Eldenary, Youssef},
title = {Qwen3-1.7B Egyptian Arabic Customer Service LoRA},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/Eldenary/qwen-Customer-Service-lora}}
}
APA:
Eldenary, Y. (2025). Qwen3-1.7B Egyptian Arabic Customer Service LoRA [Model]. HuggingFace. https://huggingface.co/Eldenary/qwen-Customer-Service-lora
Glossary
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning method that injects small trainable matrices into a frozen pre-trained model, dramatically reducing the number of trainable parameters.
- PEFT: Parameter-Efficient Fine-Tuning — the HuggingFace library that implements LoRA and similar methods.
- Egyptian Arabic (عامية مصرية): The colloquial spoken dialect of Arabic used in Egypt, distinct from Modern Standard Arabic (MSA) and other regional dialects.
- Causal LM: A language model trained to predict the next token given all previous tokens — the standard objective for GPT-style models.
- Chat template: A structured format that wraps conversation turns (user/assistant roles) into a single string the model can process.
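Qwen models use a ChatML-style template in which each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers. The function below is a simplified approximation for illustration only. The real Qwen3 template shipped with the tokenizer also handles system prompts, tool calls, and `<think>` reasoning blocks, so use `tokenizer.apply_chat_template()` in practice.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Approximate ChatML rendering: one <|im_start|>role ... <|im_end|> block per turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "الأوردر لسه موصلش."}], add_generation_prompt=True))
```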
More Information
- Base model: Qwen/Qwen3-1.7B
- Training code and dataset: GitHub repository
- PEFT documentation: https://huggingface.co/docs/peft
Model Card Contact
Open an issue on the GitHub repository for questions, feedback, or collaboration.
Framework versions
- PEFT 0.19.1