SpaceLLM v1 — LoRA Adapter for Space Domain QA

SpaceLLM v1 is a parameter-efficient LoRA adapter fine-tuned on top of openai/gpt-oss-20b for space-domain question answering. Only the lm_head is trained; the full transformer backbone remains frozen, keeping the adapter extremely lightweight while steering the model's output distribution toward space mission knowledge.

Model Details

Model Description

Developed by: AdityaPS
Model type: LoRA adapter (PEFT) over a causal language model
Base model: openai/gpt-oss-20b (21B params, BF16/MXFP4)
Language(s): English
License: Apache 2.0 (inherited from base model)
Fine-tuned from: openai/gpt-oss-20b
PEFT version: 0.19.1
Fine-tuning strategy: LoRA on lm_head only — backbone fully frozen (BF16, NOT QLoRA)

Model Sources

Repository: AdityaPS/SpaceLLM_v1

Uses

Direct Use

Load alongside openai/gpt-oss-20b for space-domain conversational question answering. The model expects inputs formatted using the harmony response format (gpt-oss-20b's required chat template) — passing raw text without the template will degrade output quality.

Downstream Use

Can be plugged into RAG pipelines, mission-planning assistants, or educational tools focused on space science, satellite operations, and related domains.
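As a rough illustration of the RAG use case, retrieved passages can be folded into the system turn before the chat template is applied. This is a minimal sketch and not part of the released code: the helper name `build_rag_messages`, the prompt wording, and the example passages are all hypothetical.

```python
def build_rag_messages(question, passages, max_passages=3):
    """Assemble chat messages that carry retrieved context (hypothetical helper)."""
    # Keep only the top-ranked passages; assumes `passages` is already sorted by relevance.
    selected = passages[:max_passages]
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    system = (
        "You are a space domain expert assistant. "
        "Answer using the numbered context passages when they are relevant.\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    "What is the purpose of a Sun-synchronous orbit?",
    [
        # Placeholder text, not real retrieval output.
        "Illustrative passage about orbit design.",
        "Illustrative passage about imaging satellites.",
    ],
)
```

The resulting `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template()` in the getting-started code, so it can be used there unchanged.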

Out-of-Scope Use

General-purpose chat without space-domain context
Tasks requiring multi-modal input (images, structured data)
Deployment without the base model (openai/gpt-oss-20b must be loaded alongside the adapter)

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load base model (requires ~44 GB VRAM in BF16, or use MXFP4 for lower memory)
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # dequantizes to BF16
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter on top
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference — must use harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user",   "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Note: openai/gpt-oss-20b uses the harmony response format. Always use tokenizer.apply_chat_template() — do not pass raw text directly.

Training Details

Training Data

Fine-tuned on an internal space-domain QA dataset (DatasetA_core_QA_v2) consisting of multi-turn conversational records with system, user, and assistant turns. Records are tagged with metadata fields including organization, difficulty, aspect, and chain_id for multi-hop reasoning chains.

Split	Records
Train	~4,800
Validation	—
Test	5,291
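
The dataset schema is not published, so the following is only a sketch of what a record and a multi-hop chain lookup might look like. The metadata field names (organization, difficulty, aspect, chain_id) come from the description above; the `messages`/`metadata` key layout, the helper `group_by_chain`, and every value are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: only the metadata field names are taken from the model card.
records = [
    {
        "messages": [
            {"role": "system", "content": "You are a space domain expert assistant."},
            {"role": "user", "content": "Placeholder question 1"},
            {"role": "assistant", "content": "Placeholder answer 1"},
        ],
        "metadata": {"organization": "OrgA", "difficulty": "easy", "aspect": "orbit", "chain_id": "c1"},
    },
    {
        "messages": [
            {"role": "user", "content": "Placeholder follow-up question"},
            {"role": "assistant", "content": "Placeholder follow-up answer"},
        ],
        "metadata": {"organization": "OrgA", "difficulty": "hard", "aspect": "orbit", "chain_id": "c1"},
    },
    {
        "messages": [
            {"role": "user", "content": "Placeholder question 2"},
            {"role": "assistant", "content": "Placeholder answer 2"},
        ],
        "metadata": {"organization": "OrgB", "difficulty": "medium", "aspect": "payload", "chain_id": "c2"},
    },
]


def group_by_chain(records):
    """Collect records that share a chain_id so a multi-hop chain can be read in order."""
    chains = defaultdict(list)
    for record in records:
        chains[record["metadata"]["chain_id"]].append(record)
    return dict(chains)


chains = group_by_chain(records)
```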

Training Procedure

Key Design Choices

LoRA applied to lm_head only — the full MoE transformer backbone is frozen.
Critical fix: lm_head.weight is physically untied from embed_tokens.weight via detach().clone() before get_peft_model() is called. Without this, autograd sees lm_head and embed_tokens as the same tensor, cutting gradients to lora_A.
Device-aware CE loss injected to handle MoE multi-GPU sharding where lm_head may land on a different device from the labels.
Model loaded in MXFP4 and dequantized to BF16 before LoRA application.
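
The training script itself is not included in this card, so the two fixes above are sketched here on a toy module rather than on gpt-oss-20b. `TinyTiedLM`, `untie_lm_head`, and `device_aware_ce` are hypothetical names; the sketch only assumes PyTorch and shows the general technique (cloning a shared weight, and moving labels to the logits' device before the loss).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTiedLM(nn.Module):
    """Toy stand-in for a causal LM whose output head shares weights with the embedding."""

    def __init__(self, vocab_size=16, hidden_size=8):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.lm_head.weight = self.embed_tokens.weight  # tie: one Parameter, two modules

    def forward(self, input_ids):
        return self.lm_head(self.embed_tokens(input_ids))


def untie_lm_head(model):
    """Give lm_head its own copy of the weights so it no longer aliases embed_tokens."""
    model.lm_head.weight = nn.Parameter(model.lm_head.weight.detach().clone())


def device_aware_ce(logits, labels, ignore_index=-100):
    """Next-token cross-entropy that first moves labels to the logits' device."""
    labels = labels.to(logits.device)
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=ignore_index)


torch.manual_seed(0)
model = TinyTiedLM()
tied_before = model.lm_head.weight is model.embed_tokens.weight  # same Parameter object
untie_lm_head(model)
tied_after = model.lm_head.weight is model.embed_tokens.weight   # now separate objects
same_values = torch.equal(model.lm_head.weight, model.embed_tokens.weight)

input_ids = torch.tensor([[1, 2, 3, 4]])
labels = torch.tensor([[-100, -100, 3, 4]])  # -100 masks positions that should not be scored
loss = device_aware_ce(model(input_ids), labels)
```

In a real run the untying step would come before `get_peft_model()`, as the list above states, so that the LoRA wrapper is built around the independent copy.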

Training Hyperparameters

Hyperparameter	Value
Training regime	BF16 mixed precision
LoRA rank (r)	32
LoRA alpha	128
LoRA dropout	0.1
Target modules	`lm_head`
Learning rate	2e-4
LR scheduler	cosine with restarts
Optimizer	adamw_torch_fused
Batch size	1
Gradient accumulation	32 (effective batch = 32)
Max grad norm	0.3
Weight decay	0.01
Warmup steps	200
Max sequence length	2,048
Epochs	5
Early stopping patience	8 eval steps
Vocab size (padded)	200,064
Hardware	Multi-GPU (cuda:1, cuda:2)
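
A few quantities follow directly from the table and can be checked with simple arithmetic. The sketch below assumes standard LoRA scaling (alpha / r, not rsLoRA), one training sequence per record, a single data-parallel replica, and no early stopping; the dict layout is illustrative, with the `lora` keys chosen to mirror PEFT's `LoraConfig` argument names.

```python
import math

# Values copied from the hyperparameter table; the dict layout is illustrative.
lora = {"r": 32, "lora_alpha": 128, "lora_dropout": 0.1, "target_modules": ["lm_head"]}
train = {"per_device_batch_size": 1, "gradient_accumulation_steps": 32, "epochs": 5, "warmup_steps": 200}
approx_train_records = 4_800  # "~4,800" in the split table, so downstream numbers are approximate

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scaling = lora["lora_alpha"] / lora["r"]

# Sequences consumed per optimizer step.
effective_batch = train["per_device_batch_size"] * train["gradient_accumulation_steps"]

# Approximate optimizer steps, assuming one sequence per record and no early stopping.
steps_per_epoch = math.ceil(approx_train_records / effective_batch)
total_steps = steps_per_epoch * train["epochs"]

print(lora_scaling, effective_batch, steps_per_epoch, total_steps)
```

Under those assumptions this prints `4.0 32 150 750`, which would put the 200 warmup steps at roughly a quarter of the schedule.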

Evaluation

Testing Data

Evaluation was run on the held-out test split of DatasetA_core_QA_v2 (5,291 records, covering diverse space organizations and difficulty levels).

Metrics

Loss — mean cross-entropy loss on the assistant response tokens
Exact Match (EM) — generated answer matches reference exactly (case-insensitive)
Token F1 — word-overlap F1 between generated and reference answers
BERTScore — semantic similarity using roberta-large
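
The card does not spell out how answers are normalized before scoring, so the following is a minimal sketch of Exact Match and Token F1 under simple assumptions: lowercasing, trimming, and whitespace tokenization, in the style of common QA evaluation scripts. The function names are hypothetical and the actual evaluation code may differ.

```python
from collections import Counter


def exact_match(prediction, reference):
    """Case-insensitive exact match after trimming surrounding whitespace."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction, reference):
    """Word-overlap F1 over lowercased, whitespace-split tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty side scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared word at most as often as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Low Earth Orbit", "low earth orbit")
f1 = token_f1("a low earth orbit", "low earth orbit")
```

In the example, `em` is `True`, and `f1` is 6/7 (precision 3/4, recall 3/3).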

Results

BERTScore (`roberta-large`)

Metric	Score
Precision	0.8736
Recall	0.8857
F1	0.8795

The BERTScore F1 of 0.8795, computed over the full test set, suggests that the generated answers are semantically close to the reference answers.

Environmental Impact

Carbon emissions estimated using the Machine Learning Impact calculator (Lacoste et al., 2019).

Hardware type: NVIDIA multi-GPU (cuda:1, cuda:2)
Hours used: ~6.6 hours (396.58 min inference; training time not reported)
Cloud provider: Not applicable (on-premise)
Compute region: Not reported
Carbon emitted: Not measured

Technical Specifications

Model Architecture and Objective

Architecture: Mixture-of-Experts (MoE) causal language model (gpt-oss-20b) with a LoRA adapter injected at the lm_head projection layer
Active parameters during inference: 3.6B (out of 21B total)
LoRA parameters: r × (hidden_size + vocab_size) with r = 32, from two low-rank matrices applied to a single linear layer: A (r × hidden_size) and B (vocab_size × r)
Objective: Next-token prediction with cross-entropy loss, masked so that only assistant response tokens contribute to the loss
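
For a LoRA adapter on a single linear layer, the trainable parameter count is rank × (in_features + out_features). The worked example below applies that to the lm_head using the rank and padded vocabulary size from the hyperparameter table. The hidden size of 2,880 is an assumption about gpt-oss-20b that is not stated in this card, so check it against the base model's config.json before relying on the total.

```python
def lora_param_count(in_features, out_features, rank):
    """Trainable parameters LoRA adds to one linear layer: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


rank = 32             # from the hyperparameter table
vocab_size = 200_064  # padded vocab size from the hyperparameter table
hidden_size = 2_880   # ASSUMPTION: hidden size of gpt-oss-20b, not stated in this card

total = lora_param_count(hidden_size, vocab_size, rank)
print(f"{total:,}")
```

With those inputs the script prints `6,494,208`, almost all of which comes from the vocab-sized matrix B (32 × 200,064 = 6,402,048).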

Compute Infrastructure

Training hardware: 2× NVIDIA GPUs (indices 1 and 2), dispatched via accelerate.dispatch_model
Framework: PyTorch + HuggingFace Transformers + PEFT 0.19.1 + Accelerate

Model Card Authors

AdityaPS

Model Card Contact

Open an issue or discussion on the Hugging Face repository.

Framework versions

PEFT 0.19.1
