Instructions for using Harsh-k-007/fitcoach-3b with libraries, inference servers, and local apps.

Libraries

How to use Harsh-k-007/fitcoach-3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Harsh-k-007/fitcoach-3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (Llama 3.2 3B is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Harsh-k-007/fitcoach-3b")
model = AutoModelForCausalLM.from_pretrained("Harsh-k-007/fitcoach-3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Harsh-k-007/fitcoach-3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Harsh-k-007/fitcoach-3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harsh-k-007/fitcoach-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
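
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="Harsh-k-007/fitcoach-3b",
                       base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: the server answers in the OpenAI chat completion format.
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, `chat("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` targets the SGLang server described further down, which listens on that port.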

Use Docker

docker model run hf.co/Harsh-k-007/fitcoach-3b

SGLang

How to use Harsh-k-007/fitcoach-3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Harsh-k-007/fitcoach-3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harsh-k-007/fitcoach-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Harsh-k-007/fitcoach-3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harsh-k-007/fitcoach-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Harsh-k-007/fitcoach-3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Harsh-k-007/fitcoach-3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Harsh-k-007/fitcoach-3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Harsh-k-007/fitcoach-3b to start chatting

Load model with FastModel

# Install Unsloth from pip (shell):
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Harsh-k-007/fitcoach-3b",
    max_seq_length=2048,
)

Docker Model Runner
How to use Harsh-k-007/fitcoach-3b with Docker Model Runner:
```
docker model run hf.co/Harsh-k-007/fitcoach-3b
```

FitCoach 3B (Merged, fp16)

A fully merged, 16-bit (fp16) fine-tune of Llama 3.2 3B Instruct that acts as FitCoach, a conversational fitness and nutrition intake coach. It is the lightweight option in the FitCoach model family, alongside the 8B LoRA adapter.

Try it live: FitCoach Space

Model Details

Base model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit (loaded in 4-bit for training via Unsloth, then merged to 16-bit for deployment)
Format: merged weights, fp16 — no adapter required, load directly
Adapter (during training): LoRA, rank 16, alpha 16, dropout 0, targeting all attention and MLP projection layers (q/k/v/o_proj, gate/up/down_proj)
Training framework: Unsloth FastLanguageModel + TRL SFTTrainer
Training data: Harsh-k-007/fitcoach-conversations — 1,407 synthetic coaching conversations (95/5 train/eval split for this run)
Sequence length: 2048 tokens, with sequence packing (bfd strategy)
Precision: bf16 training on a single T4 GPU (Google Colab free tier)
Epochs: 2, effective batch size 8 (2 × 4 grad accumulation), cosine LR schedule, peak LR 2e-4
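
Sequence packing places several short conversations into one 2048-token training row so that less of each row is padding. The "bfd" strategy name refers to best-fit decreasing bin packing. The sketch below is a generic illustration of that idea over sequence lengths only; it is not TRL's implementation, `pack_bfd` is a hypothetical helper, and rejecting over-long sequences is an assumption of this sketch rather than the trainer's behavior.

```python
def pack_bfd(lengths, capacity=2048):
    """Group sequence lengths into bins of at most `capacity` tokens
    using best-fit decreasing: longest first, tightest fitting bin wins."""
    bins = []  # token lengths placed in each bin
    free = []  # remaining capacity of each bin
    for n in sorted(lengths, reverse=True):
        if n > capacity:
            # Assumption: this sketch rejects sequences that cannot fit at all.
            raise ValueError(f"sequence of {n} tokens exceeds capacity {capacity}")
        best = None
        for i, room in enumerate(free):
            # Pick the bin that fits and leaves the least space afterwards.
            if room >= n and (best is None or room < free[best]):
                best = i
        if best is None:
            bins.append([n])
            free.append(capacity - n)
        else:
            bins[best].append(n)
            free[best] -= n
    return bins
```

For example, `pack_bfd([1500, 900, 600, 500, 400])` yields two rows, `[[1500, 500], [900, 600, 400]]`, instead of five padded ones.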

Intended Use

FitCoach is a conversational intake coach for fitness and nutrition. Given a user's goal, it asks one question at a time to gather the relevant context, then generates a structured plan.

Scope is intentionally narrow:

Meal plans (~60% of training data): collects goal, age/height/weight, dietary restrictions, activity level
Workout plans (~40% of training data): collects goal, experience level, days per week, equipment access

The 3B model is intended as a lighter, faster alternative to the 8B adapter — useful where latency or memory matters more than maximum response quality.
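
The one-question-at-a-time intake can be driven by keeping the whole conversation in a message list and appending each reply before the next user turn. This is a minimal sketch, assuming a caller-supplied `generate_reply` function that wraps whichever backend is in use; both `generate_reply` and `run_intake` are hypothetical names, and the system prompt is the one from this card's usage example.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

SYSTEM_PROMPT = "You are FitCoach, a friendly fitness and nutrition coach."


def run_intake(generate_reply: Callable[[List[Message]], str],
               user_turns: List[str]) -> List[Message]:
    """Feed user turns one at a time, carrying the full history each time."""
    messages: List[Message] = [{"role": "system", "content": SYSTEM_PROMPT}]
    for text in user_turns:
        messages.append({"role": "user", "content": text})
        # The model sees every earlier question and answer on each call.
        reply = generate_reply(messages)
        messages.append({"role": "assistant", "content": reply})
    return messages
```

In practice `generate_reply` would apply the chat template to `messages` and call `model.generate`, or post the list to a served endpoint.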

Out of scope

Injuries, medical conditions, or any medical advice
Macro/calorie arithmetic — the model can describe macro targets conceptually but is not reliable at computing them; treat any numeric macro breakdown as approximate, not verified
Unprompted macro generation — the model does not currently generate macros unless explicitly asked (known dataset gap, planned for v2)

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Harsh-k-007/fitcoach-3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = "<|finetune_right_pad_id|>"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are FitCoach, a friendly fitness and nutrition coach."},
    {"role": "user", "content": "Create a simple fat-loss meal plan with Indian food options."},
]

# Render the chat template and tokenize; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,  # the pad token set above (id 128004)
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Training Procedure

Method: supervised fine-tuning (SFT) with TRL SFTTrainer, using Unsloth's FastLanguageModel for memory-efficient LoRA training (gradient checkpointing via use_gradient_checkpointing="unsloth"). The adapter was then merged into 16-bit weights via save_pretrained_merged(..., save_method="merged_16bit")
Loss: full-conversation loss. Assistant-only masking (train_on_responses_only) was not applied in this run; it is a planned optimization once it is reliably supported for the Llama 3 chat template
Chat template: Llama 3.2 (unsloth.chat_templates.get_chat_template)
Optimizer: adamw_8bit, weight decay 0.01, cosine schedule, 17 warmup steps
Hardware: Google Colab T4 (free tier), with Drive checkpointing for resumability across the 90-minute idle / 12-hour session limits
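
The learning-rate shape implied by the optimizer settings above (linear warmup for 17 steps to a 2e-4 peak, then cosine decay) can be sketched as follows. This is a generic formula, not code taken from the training run; `lr_at` is a hypothetical helper, and `total_steps` is left as a parameter because the card does not state the total number of optimizer steps.

```python
import math


def lr_at(step, total_steps, peak_lr=2e-4, warmup_steps=17):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate is zero at step 0, reaches the peak at step 17, and falls back to zero at `total_steps`.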

Known Limitations

Macro arithmetic is unreliable. The model may hallucinate calorie and macro numbers. A calculator/tool layer is planned for v2 to handle this.
Macros aren't generated unprompted. The dataset under-represents this, so the model needs to be asked explicitly. A fix is planned for v2 via dataset augmentation.
No assistant-only loss was applied in this training run (see Training Procedure).
As the smaller model in the family, expect slightly less consistent intake behavior and plan structure compared to the 8B adapter.
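
Because numeric macro breakdowns from the model are approximate, they can be checked outside the model. The sketch below uses the common 4/4/9 kcal-per-gram approximation for protein, carbohydrate, and fat. It is an illustration only, not the calculator layer planned for v2; the function names and the 5% tolerance are assumptions.

```python
KCAL_PER_GRAM = {"protein": 4, "carbs": 4, "fat": 9}


def kcal_from_macros(protein_g, carbs_g, fat_g):
    """Calories implied by a macro breakdown under the 4/4/9 approximation."""
    return (protein_g * KCAL_PER_GRAM["protein"]
            + carbs_g * KCAL_PER_GRAM["carbs"]
            + fat_g * KCAL_PER_GRAM["fat"])


def macros_consistent(claimed_kcal, protein_g, carbs_g, fat_g, tolerance=0.05):
    """True if the claimed calorie total is within `tolerance` of the implied total."""
    implied = kcal_from_macros(protein_g, carbs_g, fat_g)
    if implied == 0:
        return claimed_kcal == 0
    return abs(claimed_kcal - implied) / implied <= tolerance
```

For instance, a plan stating 150 g protein, 200 g carbohydrate, and 60 g fat implies 1940 kcal, so a stated total of 2500 kcal would be flagged as inconsistent.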

Citation

If you use this model, please link back to this repo and the training dataset.

Downloads last month: 96

Safetensors

Model size: 3B params
Tensor type: BF16

Model tree for Harsh-k-007/fitcoach-3b

Base model: meta-llama/Llama-3.2-3B-Instruct
Quantized: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
Finetuned: this model (Harsh-k-007/fitcoach-3b)