JARVIS-Mistral-Phase1b: Macedonian Instruction Following

Model ID: Miki-T/JARVIS-Mistral-Phase1b

A QLoRA fine-tune of Mistral 7B, trained on about 134k Macedonian instruction-following examples so that it understands and responds to instructions in Macedonian. This is Phase 1b of the training pipeline for JARVIS, a locally hosted AI assistant inspired by Iron Man's JARVIS.


Model Details

Model Description

  • Developed by: Miki Trajkovski
  • Model type: Causal Language Model (fine-tuned via QLoRA)
  • Base model: mistralai/Mistral-7B-v0.1
  • Language(s): Macedonian (mk), with English support
  • License: MIT
  • Finetuned from model: Miki-T/JARVIS-Mistral-Phase1a (Phase 1a merged adapter)
  • Adapter type: LoRA (Low-Rank Adaptation)

Model Architecture

  • Base: Mistral 7B (7 billion parameters)
  • Fine-tuning method: QLoRA (4-bit quantization + LoRA adapters)
  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Target modules: q_proj, v_proj
  • Max sequence length: 2048 tokens
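
As a rough check on what these settings imply, the sketch below counts the trainable LoRA parameters for rank 8 on q_proj and v_proj. The layer shapes (hidden size 4096, 8 key/value heads of dimension 128, 32 decoder layers) come from the published Mistral 7B v0.1 configuration and are not stated in this card, so treat them as assumptions.

```python
# Assumed Mistral 7B v0.1 shapes (not stated in this card).
HIDDEN_SIZE = 4096
NUM_KV_HEADS = 8
HEAD_DIM = 128
NUM_LAYERS = 32

# Values from the list above.
LORA_RANK = 8
LORA_ALPHA = 16


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to a frozen weight."""
    return rank * in_features + out_features * rank


q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, LORA_RANK)
trainable = NUM_LAYERS * (q_proj + v_proj)
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the two target modules contribute about 3.4 million trainable parameters, a small fraction of the 7 billion frozen base weights.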

Model Sources


Uses

Direct Use

This model is designed for:

  • Macedonian instruction following — understands and responds to instructions in Macedonian
  • Multi-turn conversation — maintains context across dialogue turns
  • Foundation for downstream fine-tuning — serves as Phase 1b of the JARVIS training pipeline

Example usage:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model and apply the LoRA adapter from the Hub.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1b",
    device_map="auto",
    torch_dtype="auto",
)

# Fold the adapter weights into the base model for plain inference.
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1b")

prompt = "[INST] Објасни што е вештачка интелигенција на едноставен начин. [/INST]"
# Move the tokenized inputs to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
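
The card only shows single-turn prompts. For multi-turn use, a prompt could be assembled along the following lines. `build_prompt` is a hypothetical helper, and the `</s>` end-of-turn marker after each response is an assumption borrowed from the usual Mistral instruct layout; the exact template used in training is not documented here.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], next_instruction: str) -> str:
    """Format earlier (instruction, response) pairs plus a new instruction.

    Hypothetical helper: the "</s>" marker after each response is an
    assumption, not something this card specifies.
    """
    parts = []
    for instruction, response in history:
        parts.append(f"[INST] {instruction} [/INST] {response}</s>")
    # The final instruction is left open for the model to answer.
    parts.append(f"[INST] {next_instruction} [/INST]")
    return "".join(parts)


history = [("Кој е главниот град на Македонија?", "Главниот град е Скопје.")]
print(build_prompt(history, "Колку жители има?"))
```

With an empty history the helper reduces to the single-turn format used in the example above.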

Out-of-Scope Use

  • Not for production: This is a research/learning model
  • Not domain-specific: Not trained on legal, medical, or technical Macedonian
  • Not a chat assistant: No personality layer applied at this phase

Limitations and Bias

Known Limitations

  • Phase 1b teaches instruction following — complex multi-step reasoning may still be limited
  • Training data reflects biases present in source instruction datasets
  • Context window: 2048 tokens max
  • Multi-turn training covers up to 14 turns — longer dialogues may degrade
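
One way to stay inside these limits is to trim the conversation history before building a prompt. The sketch below keeps only the most recent turns that fit both the 14-turn and 2048-token limits. `trim_history`, the `reserve` default, and the choice to treat each list item as one turn are illustrative assumptions; in practice `count_tokens` would wrap the model's tokenizer.

```python
from typing import Callable, List

MAX_TURNS = 14     # longest dialogue covered in training, per this card
MAX_TOKENS = 2048  # context window, per this card


def trim_history(
    turns: List[str],
    count_tokens: Callable[[str], int],
    reserve: int = 200,
) -> List[str]:
    """Keep the most recent turns that fit the turn and token limits.

    reserve is the number of tokens held back for the model's reply
    (a hypothetical default, not a value from this card).
    """
    budget = MAX_TOKENS - reserve
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest turn and stop once the budget is spent.
    for turn in reversed(turns[-MAX_TURNS:]):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))


# Toy token counter based on whitespace; a real one would use the tokenizer.
recent = trim_history([f"turn {i}" for i in range(20)], lambda t: len(t.split()))
print(len(recent), recent[0], recent[-1])
```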

Recommendations

  • Always validate outputs for factual accuracy
  • Not suitable for specialized domain tasks without further fine-tuning

Training Details

Training Data

| Dataset | Rows | Source | Purpose |
|---|---|---|---|
| LVSTCK/sft-mk | 113,000 | HuggingFace | Macedonian instruction-response pairs |
| LVSTCK/ultrachat-sft-mk | 16,200 | HuggingFace | Multi-turn Macedonian conversations |
| LVSTCK/Open-Platypus-MK | 5,040 | HuggingFace | High-quality reasoning and instruction pairs |
| Total | 134,240 | | |

  • Data format: Instruction-response pairs (JSONL), multi-turn conversations
  • Language: Macedonian (Cyrillic script), with English mixed in Open-Platypus
  • Multi-turn average: 6.21 turns per conversation (max 14 turns)
  • Training mode: completion_only_loss=True — model trained on response tokens only
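
Training on response tokens only is usually done by masking the prompt positions in the label sequence so they do not contribute to the loss. The sketch below illustrates the idea with toy token IDs. It is a conceptual illustration rather than TRL's internal implementation, and `mask_prompt_labels` is a hypothetical helper.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt_labels(input_ids: List[int], prompt_length: int) -> List[int]:
    """Copy input_ids into labels, hiding the first prompt_length positions."""
    if not 0 <= prompt_length <= len(input_ids):
        raise ValueError("prompt_length must lie within the sequence")
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])


# Toy token IDs: four prompt tokens followed by three response tokens.
labels = mask_prompt_labels([1, 733, 16289, 28793, 9000, 9001, 2], prompt_length=4)
print(labels)  # [-100, -100, -100, -100, 9000, 9001, 2]
```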

Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 6e-5 |
| Warmup ratio | 5% |
| Learning rate scheduler | Cosine decay |
| Batch size | 2 |
| Gradient accumulation | 4 |
| Epochs | 2 |
| Optimizer | AdamW (8-bit) |
| Gradient checkpointing | Disabled |
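
To make the schedule concrete, the sketch below computes the effective batch size and the learning rate at a given step for linear warmup followed by cosine decay. The total step count is taken from the "Speeds, Sizes, Times" table. The formula mirrors the common warmup-plus-cosine schedule and is an assumption about how the scheduler was configured, not a copy of the training script.

```python
import math

PEAK_LR = 6e-5
WARMUP_RATIO = 0.05
TOTAL_STEPS = 15_498     # from the "Speeds, Sizes, Times" table
EFFECTIVE_BATCH = 2 * 4  # batch size x gradient accumulation steps


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {EFFECTIVE_BATCH}")
for step in (0, 400, 775, 8_000, 15_498):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```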

Training Regime

  • Hardware: NVIDIA RTX 5070 (12GB VRAM)
  • Framework: PyTorch + Hugging Face Transformers
  • Fine-tuning framework: TRL 1.7.0 SFTTrainer + PEFT LoRA
  • Precision: 4-bit quantization (NF4) + bfloat16 math

Speeds, Sizes, Times

| Metric | Value |
|---|---|
| Training duration | 4 days, 20 hours, 2 minutes |
| Total steps | 15,498 (2 epochs) |
| Throughput | ~91 tokens/second |
| Adapter size | ~84 MB |
| Total VRAM used | ~4.8 GB / 12 GB |
| Total tokens processed | 38.2M tokens |
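
The throughput figure follows from the token count and the wall-clock duration, as this short calculation shows. It uses only the numbers in the table above.

```python
# 4 days, 20 hours, 2 minutes expressed in seconds.
duration_s = ((4 * 24 + 20) * 60 + 2) * 60
total_tokens = 38.2e6

throughput = total_tokens / duration_s
print(f"duration: {duration_s:,} s")
print(f"throughput: {throughput:.1f} tokens/s")
```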

Evaluation

Metrics

| Metric | Epoch 1 | Epoch 2 | Final |
|---|---|---|---|
| Loss | 0.7157 | 0.6227 | 0.4844 |
| Perplexity | 2.05 | 1.86 | 1.62 |

  • Gradient norm (avg): 1.246
  • Gradient norm (max): 3.141
  • Starting loss: 0.9148
  • Final loss: 0.4844
  • Best loss: 0.4400
  • Total improvement: 47%
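
The reported perplexities are consistent with taking the exponential of the cross-entropy loss, and the improvement figure with the relative drop from the starting loss. The sketch below reproduces both from the numbers listed in this section.

```python
import math

losses = {"Epoch 1": 0.7157, "Epoch 2": 0.6227, "Final": 0.4844}

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
perplexities = {name: math.exp(loss) for name, loss in losses.items()}

start_loss, final_loss = 0.9148, 0.4844
improvement = (start_loss - final_loss) / start_loss

for name, ppl in perplexities.items():
    print(f"{name}: perplexity = {ppl:.2f}")
print(f"relative loss reduction: {improvement:.0%}")
```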

Sample Output

Prompt: [INST] Напиши кратко резиме за Скопје. [/INST]

Output: Скопје е главниот и најголемиот град на Северна Македонија. Се наоѓа во центарот на земјата, на реката Вардар...

Interpretation: Follows instruction format, responds in Macedonian, maintains topic correctly.


Model Card Details

Environmental Impact

| Factor | Value |
|---|---|
| Hardware | NVIDIA RTX 5070 (12GB VRAM) |
| Training duration | 4 days, 20 hours |
| Power consumption (estimated) | ~150 W × 116 hours = 17.4 kWh |
| Carbon emitted (estimated) | ~8-12 kg CO2e |
| Cloud provider | None (local desktop GPU) |
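
The energy estimate is a simple power-times-time product. The card does not state which grid carbon intensity was used, so the sketch below only derives the intensity range implied by the reported 8-12 kg CO2e; that range is an inference, not a figure from the author.

```python
power_w = 150  # estimated average draw, from the table above
hours = 116    # 4 days, 20 hours

energy_kwh = power_w * hours / 1000

# Carbon range reported in the table, in kg CO2e.
low_kg, high_kg = 8, 12
implied_intensity = (low_kg / energy_kwh, high_kg / energy_kwh)

print(f"energy: {energy_kwh:.1f} kWh")
print(f"implied grid intensity: {implied_intensity[0]:.2f}-{implied_intensity[1]:.2f} kg CO2e/kWh")
```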

Compute Infrastructure

  • CPU: AMD Ryzen 7 7800X3D (8-core)
  • GPU: NVIDIA RTX 5070 (12GB GDDR7 VRAM)
  • RAM: 32GB DDR5
  • OS: Windows 11
  • CUDA: 12.x

Software

  • PyTorch: 2.7.0+cu128
  • Transformers: 4.40.0
  • PEFT: 0.10.0
  • TRL: 1.7.0
  • Accelerate: 0.29.0
  • Bitsandbytes: 0.43.0

How to Use

Load the Model

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1b",
    device_map="auto",
    torch_dtype=torch.float16,
)

model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1b")

Generate Text

prompt = "[INST] Објасни ми ги предностите на соларната енергија. [/INST]"
# Tokenize, then move the input IDs and attention mask to the model's device.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)

# Inference only, so gradient tracking is switched off.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        # The tokenizer may not define a pad token, so reuse EOS for padding.
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Citation

@misc{trajkovski2026jarvis_phase1b,
  author = {Trajkovski, Miki},
  title = {JARVIS: Macedonian Instruction Following (Phase 1b)},
  year = {2026},
  publisher = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1b}},
}

Acknowledgments

  • Base model: Mistral AI (Mistral 7B v0.1)
  • Fine-tuning: Hugging Face TRL + PEFT libraries
  • Data: LVSTCK (Nikola Dobrota) — Macedonian NLP datasets
  • Inspiration: Tony Stark's JARVIS from Marvel

License

This model is provided under the MIT License, same as the JARVIS project.


Model Card Contact


Framework Versions

  • PEFT: 0.10.0
  • Transformers: 4.40.0
  • PyTorch: 2.7.0+cu128
  • TRL: 1.7.0
  • CUDA: 12.x