Instructions to use quantumsquatan/egypt-llm-finetune with libraries, inference providers, notebooks, and local apps.

Libraries

PEFT

How to use quantumsquatan/egypt-llm-finetune with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base_model, "quantumsquatan/egypt-llm-finetune")

Transformers

How to use quantumsquatan/egypt-llm-finetune with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="quantumsquatan/egypt-llm-finetune")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

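# Load model directly (with the peft package installed, Transformers loads the base model and attaches this adapter)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("quantumsquatan/egypt-llm-finetune", dtype="auto")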

Local Apps

vLLM

How to use quantumsquatan/egypt-llm-finetune with vLLM:

Install from pip and serve model

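# Install vLLM from pip:
pip install vllm
# Start the vLLM server. This repository holds only a LoRA adapter, so serve the base model
# and attach the adapter (assumes your vLLM version accepts a Hub repo ID as the adapter path;
# otherwise download the adapter and pass a local directory):
vllm serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
	--enable-lora \
	--lora-modules egypt-llm-finetune=quantumsquatan/egypt-llm-finetune
# Call the server using curl (OpenAI-compatible API); "model" is the adapter name registered above:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "egypt-llm-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'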

Use Docker

docker model run hf.co/quantumsquatan/egypt-llm-finetune

SGLang

How to use quantumsquatan/egypt-llm-finetune with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "quantumsquatan/egypt-llm-finetune" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quantumsquatan/egypt-llm-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
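
For callers who prefer Python over curl, the same request can be sketched with the standard library only. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response shape (`choices[0].message.content`) is assumed to follow the OpenAI-compatible chat completions format that the server advertises.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice (assumed response shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("http://localhost:30000", "quantumsquatan/egypt-llm-finetune", "What is the capital of France?"))` would mirror the curl call; the `model` argument must match whatever name the server exposes.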

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "quantumsquatan/egypt-llm-finetune" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quantumsquatan/egypt-llm-finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use quantumsquatan/egypt-llm-finetune with Docker Model Runner:
```
docker model run hf.co/quantumsquatan/egypt-llm-finetune
```

🇪🇬 Egypt LLM Fine-Tune

Model ID: quantumsquatan/egypt-llm-finetune

This is an educational LoRA fine-tune of TinyLlama, created in Egypt while working through the Hugging Face LLM Course. It is designed to give simple explanations of Computer Science, AI, programming, and student-life topics in English with Egyptian Arabic context.

Model Details

Model Description

Developed by: quantumsquatan
Shared by: quantumsquatan
Model type: PEFT LoRA adapter for causal language modeling
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Language(s): English and Arabic/Egyptian Arabic context
License: Apache 2.0
Fine-tuning method: Supervised fine-tuning (SFT) with LoRA
PEFT type: LoRA
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, v_proj
Task type: CAUSAL_LM
Tokenizer max length: 2048
Adapter size: about 4.5 MB
Funding: No external funding reported

Model Sources

Hugging Face model: quantumsquatan/egypt-llm-finetune
GitHub repository: omar-rr/egypt-llm-finetune
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Paper: No paper published for this educational fine-tune
Demo: No dedicated demo published yet

Uses

Direct Use

This model can be used as a lightweight educational chat model for:

Explaining Computer Science concepts in simple language
Explaining basic AI and machine learning ideas
Helping students understand programming and technical study topics
Casual English/Arabic conversation with Egyptian student context
Demonstrating how LoRA fine-tuning works on a small model

Downstream Use

This adapter can be loaded on top of TinyLlama/TinyLlama-1.1B-Chat-v1.0 in educational demos, notebooks, small experiments, and student projects. It can also be used as a starting point for further fine-tuning on larger, higher-quality Arabic or CS-focused instruction datasets.

Out-of-Scope Use

This model should not be used for:

Legal, medical, financial, safety-critical, or high-stakes decisions
Production systems that require factual guarantees
Replacing teachers, domain experts, lawyers, doctors, or professional advisors
Generating harmful, deceptive, or abusive content
Claims of broad Arabic mastery or expert-level Egyptian dialect understanding

Bias, Risks, and Limitations

This is a first educational fine-tune trained on a small custom dataset of 41 examples. Because the dataset is small, the model can overfit, hallucinate, repeat patterns, or answer outside its training scope. The base model is also small, so reasoning quality, factual accuracy, Arabic fluency, and instruction following may be limited.

Known limitations:

Small training dataset
No published benchmark evaluation
Not a legal, medical, or professional advice model
May produce incorrect or outdated information
May mix English and Arabic unexpectedly
May reflect biases from the base model and the custom examples

Recommendations

Use this model for learning, demos, and experimentation. Verify important information against reliable sources. Before any serious downstream use, review outputs manually, retrain on a larger dataset, and test with held-out prompts.
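
One way to get held-out prompts from a dataset this small is to set a fixed slice aside before training. The sketch below is illustrative only: the records are placeholders for the 41 unpublished examples, and the 20% fraction, the seed, and the `split_examples` helper are assumptions rather than anything used in the original run.

```python
import random


def split_examples(examples, held_out_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and return (train, held_out)."""
    if not 0.0 < held_out_fraction < 1.0:
        raise ValueError("held_out_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_held_out = max(1, round(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Placeholder records standing in for the 41 unpublished training examples.
examples = [{"id": i} for i in range(41)]
train, held_out = split_examples(examples)
print(len(train), len(held_out))  # 33 8
```

With only 41 examples, eight held-out prompts give a rough signal at best, so manual review of the outputs still matters.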

How to Get Started with the Model

This repository contains a PEFT LoRA adapter. Load it with the base model and tokenizer:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "quantumsquatan/egypt-llm-finetune"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

messages = [
    {"role": "user", "content": "Explain neural networks simply."}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print(pipe(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)[0]["generated_text"])

Training Details

Training Data

The model was trained on 41 custom examples focused on:

Computer Science explanations
AI and machine learning explanations
Programming learning support
Student life in Egypt
General educational conversation
English with Egyptian Arabic context

The dataset has not been published, either in this repository or as a separate Hugging Face dataset.

Training Procedure

The model was fine-tuned using supervised fine-tuning with PEFT LoRA adapters on top of TinyLlama. The adapter targets attention projection modules q_proj and v_proj.
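
To make the adapter mechanics concrete, here is a minimal pure-Python sketch of what a LoRA-wrapped projection computes: the frozen weight's output plus a low-rank update scaled by alpha / r. The toy matrices are made up for illustration. The toy uses rank 1 with alpha 2 so that the scale equals the 32 / 16 = 2 implied by this card's settings; real q_proj and v_proj matrices are far larger.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x), the output of a LoRA-wrapped linear layer."""
    scale = alpha / r
    base = matvec(W, x)               # frozen base projection
    update = matvec(B, matvec(A, x))  # low-rank path: project down with A, up with B
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy 2x2)
A = [[1.0, 1.0]]              # r x d_in, here 1 x 2
B = [[0.5], [0.0]]            # d_out x r, here 2 x 1
x = [1.0, 2.0]

adapted = lora_forward(W, A, B, x, alpha=2, r=1)
untrained = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, r=1)
print(adapted)    # [4.0, 2.0]
print(untrained)  # [1.0, 2.0]
```

The second call shows why a freshly initialized adapter leaves the base model unchanged: with B set to zeros, which is the usual LoRA starting point, the update term vanishes.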

Preprocessing

The examples were formatted as chat/instruction-style text for causal language modeling. Tokenization uses the TinyLlama/Llama tokenizer with a maximum model length of 2048 tokens.
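
The exact template used during training is not published. As an illustration, the sketch below renders messages in a Zephyr-style layout (`<|user|>`, `<|assistant|>`, `</s>`), which is assumed here to approximate the base model's chat template. The `format_chat` helper is hypothetical; in practice `tokenizer.apply_chat_template` is the authoritative formatter.

```python
def format_chat(messages, add_generation_prompt=True, eos_token="</s>"):
    """Render chat messages as one string in an assumed Zephyr-style layout."""
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_chat([{"role": "user", "content": "Explain neural networks simply."}])
print(prompt)
```

For supervised fine-tuning, the assistant's answer would be included as a final message and `add_generation_prompt` set to `False`, so each training string ends with the answer and the end-of-sequence token.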

Training Hyperparameters

Published adapter configuration:

Training regime: LoRA supervised fine-tuning for causal language modeling
PEFT version: 0.19.1
LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
Bias: none
Target modules: q_proj, v_proj
Task type: CAUSAL_LM
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Other run-specific trainer settings, such as the exact epoch count, batch size, optimizer, hardware, and wall-clock training time, were not published in the model repository.

Speeds, Sizes, Times

Adapter checkpoint size: about 4.5 MB
Base model size: inherited from TinyLlama 1.1B
Training time: not published
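
The adapter size can be sanity-checked with a little arithmetic. The sketch below counts LoRA parameters for rank 16 on q_proj and v_proj, assuming the base model's dimensions are hidden size 2048, 22 layers, 32 attention heads, and 4 key/value heads (these come from the base model's configuration as best understood, not from this card). File-format overhead is ignored, and the storage precision of the adapter is an assumption.

```python
def lora_param_count(in_features, out_features, rank):
    """A LoRA pair holds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# Assumed base-model dimensions (not stated in this model card).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_HEADS = 32
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64
RANK = 16                            # from the published adapter configuration

q_params = lora_param_count(HIDDEN_SIZE, NUM_HEADS * HEAD_DIM, RANK)     # q_proj: 2048 -> 2048
v_params = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)  # v_proj: 2048 -> 256
total_params = NUM_LAYERS * (q_params + v_params)

for label, bytes_per_param in (("16-bit", 2), ("32-bit", 4)):
    size_mb = total_params * bytes_per_param / 1e6
    print(f"{label}: {total_params} params, about {size_mb:.1f} MB")
# 16-bit: 2252800 params, about 4.5 MB
# 32-bit: 2252800 params, about 9.0 MB
```

Under these assumptions, roughly 2.25 million adapter parameters stored at 2 bytes each would be consistent with the reported size of about 4.5 MB.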

Evaluation

Testing Data, Factors, and Metrics

Testing Data

No formal held-out test set was published. The model should be evaluated manually with prompts covering CS, AI, programming, English, Arabic, and mixed English-Arabic use.

Factors

Recommended evaluation factors:

English CS explanation quality
Arabic/Egyptian Arabic clarity
Hallucination rate
Helpfulness for beginner students
Instruction following
Safety and refusal behavior on out-of-scope prompts

Metrics

No benchmark metrics are published. Suggested future metrics include human preference ratings, exact-match accuracy on small CS QA tests, hallucination checks, and qualitative comparison against the base TinyLlama model.
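
A small CS QA check of the kind suggested above could be scored along these lines. This is a sketch, not a published evaluation: the normalization rules and the sample answers are illustrative assumptions, and only ASCII punctuation is stripped, so Arabic answers would need their own normalization.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up model answers and gold answers, for illustration only.
predictions = ["O(log n)", "A stack.", "queue"]
references = ["o(log n)", "a stack", "a heap"]
accuracy = exact_match_accuracy(predictions, references)
print(f"{accuracy:.2f}")  # 0.67
```

Exact match is strict and suits short factual answers; longer explanations are better judged with human ratings, as the list of suggested metrics notes.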

Results

No formal benchmark results are published yet.

Summary

This model is best understood as a learning milestone and a compact PEFT adapter experiment, not as a production-grade assistant.

Model Examination

No interpretability or internal model examination has been published.

Environmental Impact

Formal carbon accounting was not recorded. Because this is a small LoRA adapter fine-tune on TinyLlama, the compute footprint is expected to be much lower than full-model fine-tuning, but exact emissions cannot be claimed without hardware and runtime logs.

Hardware type: not published
Hours used: not published
Cloud provider: not published
Compute region: not published
Carbon emitted: not calculated
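
If hardware and runtime logs become available, a first-order estimate follows the usual energy-times-intensity calculation. The sketch below only encodes that formula. The function name, its parameters, and the numbers in the usage line are hypothetical placeholders, not measurements from this training run.

```python
def estimate_emissions_kg(avg_power_watts, hours, grid_kg_co2_per_kwh, pue=1.0):
    """Estimate emissions as energy (kWh) x grid carbon intensity (kg CO2e per kWh).

    pue is the data-center power usage effectiveness multiplier (1.0 means no overhead).
    """
    if min(avg_power_watts, hours, grid_kg_co2_per_kwh) < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and pue must be at least 1.0")
    energy_kwh = avg_power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Synthetic round numbers chosen only to show the units: 1 kW for 1 hour at 1 kg CO2e/kWh.
print(estimate_emissions_kg(1000, 1, 1.0))  # 1.0
```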

Technical Specifications

Model Architecture and Objective

The base architecture is TinyLlama, a Llama-style causal decoder-only language model. This repository provides a LoRA adapter trained for next-token prediction on instruction/chat-style educational examples.

Compute Infrastructure

The exact training infrastructure is not published in the repository.

Hardware

Not published.

Software

Known software stack from repository metadata:

Transformers
PEFT 0.19.1
TRL
Safetensors
TinyLlama tokenizer/model family

Citation

No paper has been published for this fine-tune. If you use it, cite the repository:

@misc{quantumsquatan_egypt_llm_finetune_2026,
  title = {Egypt LLM Fine-Tune},
  author = {quantumsquatan},
  year = {2026},
  url = {https://huggingface.co/quantumsquatan/egypt-llm-finetune}
}

Glossary

LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning method.
PEFT: Parameter-Efficient Fine-Tuning.
SFT: Supervised Fine-Tuning.
Adapter: A small set of trainable weights loaded on top of a base model.

More Information

This model is part of a learning project around Hugging Face, LLM fine-tuning, and AI education in Egypt.

Model Card Authors

quantumsquatan, with README/model-card structuring assistance.

Model Card Contact

Use the Hugging Face model page or GitHub repository issues for contact.

Framework Versions

PEFT 0.19.1
