SpaceLLM v1 — LoRA Adapter for Space Domain QA
SpaceLLM v1 is a parameter-efficient LoRA adapter fine-tuned on top of
openai/gpt-oss-20b for space-domain
question answering. Only the lm_head is trained; the full transformer backbone
remains frozen, keeping the adapter extremely lightweight while steering the model's
output distribution toward space mission knowledge.
Model Details
Model Description
- Developed by: AdityaPS
- Model type: LoRA adapter (PEFT) over a causal language model
- Base model: openai/gpt-oss-20b (21B total params, 3.6B active; BF16/MXFP4)
- Language(s): English
- License: Apache 2.0 (inherited from base model)
- Fine-tuned from: openai/gpt-oss-20b
- PEFT version: 0.19.1
- Fine-tuning strategy: LoRA on lm_head only; backbone fully frozen (BF16, NOT QLoRA)
Model Sources
- Repository: AdityaPS/SpaceLLM_v1
Uses
Direct Use
Load alongside openai/gpt-oss-20b for space-domain conversational question answering.
The model expects inputs formatted using the harmony response format (gpt-oss-20b's
required chat template) — passing raw text without the template will degrade output quality.
Downstream Use
Can be plugged into RAG pipelines, mission-planning assistants, or educational tools focused on space science, satellite operations, and related domains.
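As a rough illustration of the RAG use case, retrieved passages can be folded into the system turn before the chat template is applied. The helper name `build_rag_messages`, the prompt wording, and the placeholder passages are all hypothetical; this card does not prescribe a retrieval setup.

```python
from typing import Dict, List


def build_rag_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Build a chat message list that carries retrieved context (hypothetical helper)."""
    # Number the passages so the answer can refer back to them.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    system = (
        "You are a space domain expert assistant. "
        "Answer using the context below when it is relevant.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Placeholder passages standing in for the output of a retriever.
messages = build_rag_messages(
    "What is the purpose of a Sun-synchronous orbit?",
    ["Passage text returned by the retriever.", "Another retrieved passage."],
)
```

The resulting list has the same shape as any other chat message list and can be passed to tokenizer.apply_chat_template() unchanged.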
Out-of-Scope Use
- General-purpose chat without space-domain context
- Tasks requiring multi-modal input (images, structured data)
- Deployment without the base model (openai/gpt-oss-20b must be loaded alongside the adapter)
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load base model (requires ~44 GB VRAM in BF16, or use MXFP4 for lower memory)
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # dequantizes to BF16
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter on top
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference: must use the harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user", "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Note: openai/gpt-oss-20b uses the harmony response format. Always use tokenizer.apply_chat_template(); do not pass raw text directly.
Training Details
Training Data
Fine-tuned on an internal space-domain QA dataset (DatasetA_core_QA_v2) consisting
of multi-turn conversational records with system, user, and assistant turns.
Records are tagged with metadata fields including organization, difficulty,
aspect, and chain_id for multi-hop reasoning chains.
| Split | Records |
|---|---|
| Train | ~4,800 |
| Validation | — |
| Test | 5,291 |
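The dataset is internal, so its exact schema is not public. A record carrying the fields named in the description might look like the sketch below; the key names `messages` and `metadata`, the nesting, and every value are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical records: field names follow the dataset description,
# but the layout and all values are invented for illustration.
records = [
    {
        "messages": [
            {"role": "system", "content": "You are a space domain expert assistant."},
            {"role": "user", "content": "Example question 1"},
            {"role": "assistant", "content": "Example answer 1"},
        ],
        "metadata": {"organization": "ORG_A", "difficulty": "easy",
                     "aspect": "orbit", "chain_id": "chain-001"},
    },
    {
        "messages": [
            {"role": "user", "content": "Example follow-up question"},
            {"role": "assistant", "content": "Example follow-up answer"},
        ],
        "metadata": {"organization": "ORG_A", "difficulty": "hard",
                     "aspect": "orbit", "chain_id": "chain-001"},
    },
    {
        "messages": [
            {"role": "user", "content": "Example question 2"},
            {"role": "assistant", "content": "Example answer 2"},
        ],
        "metadata": {"organization": "ORG_B", "difficulty": "medium",
                     "aspect": "payload", "chain_id": "chain-002"},
    },
]


def group_by_chain(recs):
    """Group records that belong to the same multi-hop reasoning chain."""
    chains = defaultdict(list)
    for rec in recs:
        chains[rec["metadata"]["chain_id"]].append(rec)
    return dict(chains)


chains = group_by_chain(records)
```

Grouping by chain_id in this way keeps the hops of one reasoning chain together, which matters if a chain must not be split across train and test.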
Training Procedure
Key Design Choices
- LoRA applied to lm_head only; the full MoE transformer backbone is frozen.
- Critical fix: lm_head.weight is physically untied from embed_tokens.weight via detach().clone() before get_peft_model() is called. Without this, autograd sees lm_head and embed_tokens as the same tensor, cutting gradients to lora_A.
- Device-aware CE loss injected to handle MoE multi-GPU sharding where lm_head may land on a different device from the labels.
- Model loaded in MXFP4 and dequantized to BF16 before LoRA application.
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Training regime | BF16 mixed precision |
| LoRA rank (r) | 32 |
| LoRA alpha | 128 |
| LoRA dropout | 0.1 |
| Target modules | lm_head |
| Learning rate | 2e-4 |
| LR scheduler | cosine with restarts |
| Optimizer | adamw_torch_fused |
| Batch size | 1 |
| Gradient accumulation | 32 (effective batch = 32) |
| Max grad norm | 0.3 |
| Weight decay | 0.01 |
| Warmup steps | 200 |
| Max sequence length | 2,048 |
| Epochs | 5 |
| Early stopping patience | 8 eval steps |
| Vocab size (padded) | 200,064 |
| Hardware | Multi-GPU (cuda:1, cuda:2) |
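The table implies a few derived quantities that are worth checking. The sketch below computes them from the listed values. Three things are assumptions rather than statements from this card: the hidden size of 2,880 (taken from the public gpt-oss-20b configuration), the use of the approximate train size of 4,800 records, and bias-free LoRA matrices.

```python
import math

# Values from the hyperparameter table.
r, alpha = 32, 128
per_device_batch, grad_accum = 1, 32
epochs, warmup_steps = 5, 200
vocab_size = 200_064      # padded vocab size from the table
train_records = 4_800     # approximate train split size

# Assumption: hidden size of gpt-oss-20b, not stated in this card.
hidden_size = 2_880

# LoRA adds A (r x hidden_size) and B (vocab_size x r) to lm_head.
lora_params = r * (hidden_size + vocab_size)
lora_scaling = alpha / r  # factor applied to the low-rank update

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_records / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"LoRA parameters:  {lora_params:,}")
print(f"LoRA scaling:     {lora_scaling}")
print(f"Optimizer steps:  {total_steps} ({steps_per_epoch} per epoch)")
print(f"Warmup fraction:  {warmup_steps / total_steps:.2f}")
```

Under these assumptions the adapter holds about 6.5M trainable parameters, and warmup covers roughly a quarter of the 750 or so optimizer steps (fewer if early stopping triggers).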
Evaluation
Testing Data
Evaluation was run on the held-out test split of DatasetA_core_QA_v2
(5,291 records, covering diverse space organizations and difficulty levels).
Metrics
- Loss: mean cross-entropy loss on the assistant response tokens
- Exact Match (EM): generated answer matches reference exactly (case-insensitive)
- Token F1: word-overlap F1 between generated and reference answers
- BERTScore: semantic similarity using roberta-large
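The evaluation script is not part of this card. One common way to compute the two string metrics, in the style of SQuAD evaluation, is sketched below; the normalization (lowercasing and whitespace splitting only) is an assumption, and the actual script may normalize differently.

```python
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase and split on whitespace (assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Word-overlap F1 between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up example pair, not taken from the dataset.
pred = "The orbit keeps constant local solar time"
ref = "It keeps a constant local solar time"
em = exact_match(pred, ref)
f1 = token_f1(pred, ref)
```

In this example five of the seven words overlap on each side, so EM is 0 while token F1 is 5/7, which shows why the two metrics are reported together.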
Results
BERTScore (roberta-large)
| Metric | Score |
|---|---|
| Precision | 0.8736 |
| Recall | 0.8857 |
| F1 | 0.8795 |
The BERTScore F1 of 0.8795 indicates strong semantic alignment between the model's generated answers and the reference answers across the full test set.
Environmental Impact
Carbon emissions estimated using the Machine Learning Impact calculator (Lacoste et al., 2019).
- Hardware type: NVIDIA multi-GPU (cuda:1, cuda:2)
- Hours used: ~6.6 hours (396.58 min inference; training time not reported)
- Cloud provider: Not applicable (on-premise)
- Compute region: Not reported
- Carbon emitted: Not measured
Technical Specifications
Model Architecture and Objective
- Architecture: Mixture-of-Experts (MoE) causal language model (gpt-oss-20b) with a LoRA adapter injected at the lm_head projection layer
- Active parameters during inference: 3.6B (out of 21B total)
- LoRA parameters: r × (hidden_size + vocab_size) with r = 32 (two low-rank matrices applied to a single linear layer); the vocab_size term dominates, so this is roughly 32 × vocab_size
- Objective: Next-token prediction with cross-entropy loss, masked so that only assistant response tokens contribute to the loss
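The masking in the objective can be illustrated without a model. In this sketch, per-token negative log-likelihoods are averaged only over positions whose label is not the ignore index. The value -100 follows the PyTorch cross-entropy default; the helper names and the token losses are made up, and the training code itself may build its mask differently.

```python
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy


def build_labels(token_ids, is_assistant):
    """Copy token ids into labels, masking every non-assistant position."""
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(token_ids, is_assistant)]


def masked_mean_loss(token_losses, labels):
    """Average per-token losses over unmasked (assistant) positions only."""
    kept = [loss for loss, lab in zip(token_losses, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


# Toy sequence: 3 prompt tokens followed by 2 assistant tokens.
token_ids = [11, 12, 13, 21, 22]
is_assistant = [False, False, False, True, True]
token_losses = [2.0, 1.5, 3.0, 0.5, 1.5]  # made-up per-token NLL values

labels = build_labels(token_ids, is_assistant)
loss = masked_mean_loss(token_losses, labels)
```

Here the three prompt positions are ignored, so the loss is the mean of the last two values only.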
Compute Infrastructure
- Training hardware: 2× NVIDIA GPUs (indices 1 and 2), dispatched via accelerate.dispatch_model
- Framework: PyTorch + HuggingFace Transformers + PEFT 0.19.1 + Accelerate
Model Card Authors
AdityaPS
Model Card Contact
Open an issue or discussion on the Hugging Face repository.
Framework versions
- PEFT 0.19.1