JARVIS-Mistral-Phase1b: Macedonian Instruction Following

Model ID: Miki-T/JARVIS-Mistral-Phase1b

A QLoRA fine-tune of Mistral 7B, trained on about 134k Macedonian instruction-following examples so that it understands and responds to instructions in Macedonian. This is Phase 1b of the training pipeline for JARVIS, a locally hosted AI assistant inspired by Iron Man's JARVIS.

Model Details

Model Description

Developed by: Miki Trajkovski
Model type: Causal Language Model (fine-tuned via QLoRA)
Base model: mistralai/Mistral-7B-v0.1
Language(s): Macedonian (mk), with English support
License: MIT
Finetuned from model: Miki-T/JARVIS-Mistral-Phase1a (Phase 1a merged adapter)
Adapter type: LoRA (Low-Rank Adaptation)

Model Architecture

Base: Mistral 7B (7 billion parameters)
Fine-tuning method: QLoRA (4-bit quantization + LoRA adapters)
LoRA rank: 8
LoRA alpha: 16
LoRA dropout: 0.05
Target modules: q_proj, v_proj
Max sequence length: 2048 tokens
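
As a rough illustration of what these settings imply, the sketch below counts the trainable LoRA parameters for rank 8 on q_proj and v_proj. The layer dimensions (hidden size 4096, 32 layers, 8 key/value heads of 128 dimensions) are assumptions taken from the public Mistral-7B-v0.1 configuration, not from this card, and `lora_params` is a hypothetical helper.

```python
# Rough LoRA size estimate for the settings listed above.
# Layer dimensions are ASSUMPTIONS based on the public Mistral-7B-v0.1 config.
HIDDEN_SIZE = 4096   # model width
NUM_LAYERS = 32      # transformer blocks
KV_DIM = 8 * 128     # 8 key/value heads x 128 dims (grouped-query attention)

LORA_RANK = 8
LORA_ALPHA = 16

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the count is about 3.4M parameters. The size of the saved adapter file depends on the storage dtype and on which tensors are saved, so this estimate should not be expected to map directly onto the adapter size reported later in this card.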

Model Sources

Repository: https://github.com/MikiTrajkovski/JARVIS
HuggingFace Model Card: https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1b
Previous phase: https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1a

Uses

Direct Use

This model is designed for:

Macedonian instruction following — understands and responds to instructions in Macedonian
Multi-turn conversation — maintains context across dialogue turns
Foundation for downstream fine-tuning — serves as Phase 1b of the JARVIS training pipeline

Example usage:

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model and attach the LoRA adapter in one call
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1b",
    device_map="auto",
    torch_dtype="auto",
)

# Fold the adapter weights into the base model
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1b")

# "Explain what artificial intelligence is in a simple way."
prompt = "[INST] Објасни што е вештачка интелигенција на едноставен начин. [/INST]"
# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
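
The example above uses a single [INST] ... [/INST] pair. For multi-turn use, one possible way to assemble a prompt is sketched below. It assumes earlier turns are simply concatenated in the same [INST] format; this card does not state the exact multi-turn template used in training, so `build_prompt` is a hypothetical helper and the layout (including the absence of end-of-sequence markers between turns) is an assumption.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Join earlier (user, assistant) turns with the new user message.

    ASSUMPTION: turns are concatenated in the [INST] ... [/INST] format
    shown in this card; the real training template may differ.
    """
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"[INST] {user_turn.strip()} [/INST] {assistant_turn.strip()}")
    parts.append(f"[INST] {user_message.strip()} [/INST]")
    return " ".join(parts)


history = [("Здраво!", "Здраво! Како можам да помогнам?")]
print(build_prompt(history, "Што е вештачка интелигенција?"))
```

The resulting string can be passed to the tokenizer in place of the single-turn prompt.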

Out-of-Scope Use

Not for production: This is a research/learning model
Not domain-specific: Not trained on legal, medical, or technical Macedonian
Not a chat assistant: No personality layer applied at this phase

Limitations and Bias

Known Limitations

Phase 1b targets instruction following; complex multi-step reasoning may still be limited
Training data reflects biases present in the source instruction datasets
Maximum sequence length: 2048 tokens
Multi-turn training data covers at most 14 turns, so longer dialogues may degrade
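
One way to stay inside the last two limits is to trim the conversation history before building a prompt. The sketch below is a minimal illustration, not part of the JARVIS code: `trim_history` and `count_tokens` are hypothetical names, each message is counted as one turn (an assumption), and the whitespace-based counter only stands in for a real tokenizer.

```python
from typing import Callable, List

MAX_TURNS = 14     # longest conversation in the training data, per this card
MAX_TOKENS = 2048  # maximum sequence length used in training, per this card


def trim_history(
    messages: List[str],
    count_tokens: Callable[[str], int],
    max_turns: int = MAX_TURNS,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Keep the most recent messages that fit both the turn and token limits."""
    if max_turns <= 0:
        return []
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first overflow
    for message in reversed(messages[-max_turns:]):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


# Stand-in token counter (ASSUMPTION): whitespace-separated words
recent = trim_history([f"message {i}" for i in range(20)], lambda text: len(text.split()))
print(len(recent))
```

In practice `count_tokens` would wrap the model's tokenizer so that the budget is measured in real tokens.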

Recommendations

Always validate outputs for factual accuracy
Not suitable for specialized domain tasks without further fine-tuning

Training Details

Training Data

Dataset	Rows	Source	Purpose
LVSTCK/sft-mk	113,000	HuggingFace	Macedonian instruction-response pairs
LVSTCK/ultrachat-sft-mk	16,200	HuggingFace	Multi-turn Macedonian conversations
LVSTCK/Open-Platypus-MK	5,040	HuggingFace	High-quality reasoning and instruction pairs
Total	134,240

Data format: Instruction-response pairs (JSONL), multi-turn conversations
Language: Macedonian (Cyrillic script), with English mixed in Open-Platypus
Multi-turn average: 6.21 turns per conversation (max 14 turns)
Training mode: completion_only_loss=True — model trained on response tokens only
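
Training on response tokens only is commonly implemented by masking the prompt positions in the label sequence so that they do not contribute to the loss. The sketch below illustrates that idea with plain lists; it is not TRL's internal implementation, `mask_prompt_labels` is a hypothetical helper, and the token ids are arbitrary placeholders.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_prompt_labels(input_ids: List[int], prompt_length: int) -> List[int]:
    """Copy input_ids into labels and hide the first prompt_length positions."""
    if not 0 <= prompt_length <= len(input_ids):
        raise ValueError("prompt_length must lie within the sequence")
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])


# Toy example with placeholder ids: 4 prompt tokens followed by 3 response tokens
token_ids = [101, 102, 103, 104, 201, 202, 203]
labels = mask_prompt_labels(token_ids, prompt_length=4)
print(labels)  # [-100, -100, -100, -100, 201, 202, 203]
```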

Hyperparameters

Parameter	Value
Learning rate	6e-5
Warmup ratio	5%
Learning rate scheduler	Cosine decay
Batch size	2
Gradient accumulation	4
Epochs	2
Optimizer	AdamW (8-bit)
Gradient checkpointing	Disabled
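
To make the schedule concrete, the sketch below evaluates a linear warmup followed by cosine decay using the values in this table. It assumes the total step count reported later in this card (15,498) counts optimizer steps and that the cosine decays to zero; `lr_at` is a hypothetical helper, not the trainer's own scheduler.

```python
import math

PEAK_LR = 6e-5
TOTAL_STEPS = 15_498                          # total steps reported in this card
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)  # 5% warmup
EFFECTIVE_BATCH = 2 * 4                       # batch size x gradient accumulation


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (ASSUMED end value)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {EFFECTIVE_BATCH}")
print(f"warmup steps: {WARMUP_STEPS}")
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```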

Training Regime

Hardware: NVIDIA RTX 5070 (12GB VRAM)
Framework: PyTorch + Hugging Face Transformers
Fine-tuning framework: TRL 1.7.0 SFTTrainer + PEFT LoRA
Precision: 4-bit quantization (NF4) with bfloat16 compute

Speeds, Sizes, Times

Metric	Value
Training duration	4 days, 20 hours, 2 minutes
Total steps	15,498 (2 epochs)
Throughput	~91 tokens/second
Adapter size	~84 MB
Total VRAM used	~4.8 GB / 12 GB
Total tokens processed	38.2M tokens
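
The throughput figure can be cross-checked from the other rows of this table. The short calculation below uses only the reported duration, token count, and step count.

```python
# Values taken from the table above
duration_s = ((4 * 24 + 20) * 60 + 2) * 60  # 4 days, 20 hours, 2 minutes in seconds
total_tokens = 38.2e6
total_steps = 15_498

tokens_per_second = total_tokens / duration_s
seconds_per_step = duration_s / total_steps

print(f"duration: {duration_s:,} s")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"time per step: {seconds_per_step:.2f} s")
```

The result rounds to the ~91 tokens/second listed in the table.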

Evaluation

Metrics

Metric	Epoch 1	Epoch 2	Final
Loss	0.7157	0.6227	0.4844
Perplexity	2.05	1.86	1.62
Gradient norm (avg)	—	—	1.246
Gradient norm (max)	—	—	3.141

Starting loss: 0.9148
Final loss: 0.4844
Best loss: 0.4400
Total improvement: 47% (relative reduction from starting loss to final loss)
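
The perplexity and improvement figures follow directly from the loss values. The sketch below reproduces them, assuming the reported loss is the mean cross-entropy in nats, so that perplexity is exp(loss).

```python
import math

# Loss values from the table above
losses = {"epoch 1": 0.7157, "epoch 2": 0.6227, "final": 0.4844}
for name, loss in losses.items():
    print(f"{name}: perplexity = {math.exp(loss):.2f}")

start_loss, final_loss = 0.9148, 0.4844
improvement = (start_loss - final_loss) / start_loss
print(f"relative loss reduction: {improvement:.0%}")
```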

Sample Output

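Prompt: [INST] Напиши кратко резиме за Скопје. [/INST] (English: "Write a short summary about Skopje.")

Output: Скопје е главниот и најголемиот град на Северна Македонија. Се наоѓа во центарот на земјата, на реката Вардар... (English: "Skopje is the capital and largest city of North Macedonia. It is located in the center of the country, on the Vardar river...")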

Interpretation: Follows instruction format, responds in Macedonian, maintains topic correctly.

Model Card Details

Environmental Impact

Factor	Value
Hardware	NVIDIA RTX 5070 (12GB VRAM)
Training duration	4 days, 20 hours
Power consumption (estimated)	~150 W × 116 hours ≈ 17.4 kWh
Carbon emitted (estimated)	~8-12 kg CO2e
Cloud provider	None (local desktop GPU)
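
The energy estimate is a simple product of average power and run time. The sketch below repeats that arithmetic and back-computes the grid carbon intensity implied by the 8-12 kg CO2e range; the intensity values are derived from the table, not measured.

```python
# Values taken from the table above
power_w = 150    # estimated average draw in watts
hours = 116      # 4 days, 20 hours
energy_kwh = power_w * hours / 1000

low_kg, high_kg = 8, 12  # reported CO2e range in kg
implied_low = low_kg / energy_kwh
implied_high = high_kg / energy_kwh

print(f"energy: {energy_kwh:.1f} kWh")
print(f"implied carbon intensity: {implied_low:.2f} to {implied_high:.2f} kg CO2e/kWh")
```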

Compute Infrastructure

CPU: AMD Ryzen 7 7800X3D (8-core)
GPU: NVIDIA RTX 5070 (12GB GDDR7 VRAM)
RAM: 32GB DDR5
OS: Windows 11
CUDA: 12.x

Software

PyTorch: 2.7.0+cu128
Transformers: 4.40.0
PEFT: 0.10.0
TRL: 1.7.0
Accelerate: 0.29.0
Bitsandbytes: 0.43.0

How to Use

Load the Model

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Load the base model and attach the LoRA adapter in half precision
model = AutoPeftModelForCausalLM.from_pretrained(
    "Miki-T/JARVIS-Mistral-Phase1b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Fold the adapter weights into the base model
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("Miki-T/JARVIS-Mistral-Phase1b")

Generate Text

# "Explain to me the advantages of solar energy."
prompt = "[INST] Објасни ми ги предностите на соларната енергија. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
# Move input_ids and attention_mask to the model's device
inputs = inputs.to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
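
The decoded string from the snippet above typically contains the prompt followed by the generated answer. A minimal way to keep only the answer is sketched below; `extract_response` is a hypothetical helper, and it assumes the [/INST] marker survives decoding as ordinary text.

```python
def extract_response(decoded: str, marker: str = "[/INST]") -> str:
    """Return the text after the last marker, or the whole text if it is absent."""
    return decoded.rpartition(marker)[2].strip()


# Illustrative string only, not real model output
sample = "[INST] Напиши кратко резиме за Скопје. [/INST] Скопје е главниот град."
print(extract_response(sample))
```

An alternative with the same effect is to slice the generated ids at the prompt length before decoding.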

Citation

@misc{trajkovski2026jarvis_phase1b,
  author = {Trajkovski, Miki},
  title = {JARVIS: Macedonian Instruction Following (Phase 1b)},
  year = {2026},
  publisher = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/Miki-T/JARVIS-Mistral-Phase1b}},
}

Acknowledgments

Base model: Mistral AI (Mistral 7B v0.1)
Fine-tuning: Hugging Face TRL + PEFT libraries
Data: LVSTCK (Nikola Dobrota) — Macedonian NLP datasets
Inspiration: Tony Stark's JARVIS from Marvel

License

This model is released under the MIT License, the same license as the JARVIS project.

Model Card Contact

Author: Miki Trajkovski
GitHub: https://github.com/MikiTrajkovski/JARVIS
HuggingFace: https://huggingface.co/Miki-T
