---
license: apache-2.0
base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- lora
- salesforce
- instruction-tuning
- qwen3
---

# sf-diogenes-v0.1
## TL;DR
- 80B-parameter Qwen3 Next Instruct model merged with a domain LoRA adapter for Salesforce analytics, Data Cloud, Commerce, and Order Management workflows.
- Trained on 102,827 curated instruction/response pairs stored in `finetune_dataset.jsonl` (≈151 MiB).
- Distributed as merged full weights (bf16), so it can be loaded like any other Transformers causal LM without applying the adapter at runtime.
## Model Details
| Field | Value |
|---|---|
| Base | `Qwen/Qwen3-Next-80B-A3B-Instruct` |
| Architecture | `Qwen3NextForCausalLM` (Mixture-of-Experts with alternating linear/full attention) |
| Parameters | ~80B (same as base; LoRA deltas merged) |
| Context Length | 262,144 tokens (see `config.json`) |
| Precision | bfloat16 |
| Tokenizer | Qwen 151,936 vocab with chat template in `sf-diogenes-v0.1/chat_template.jinja` |
| Finetuning | Supervised fine-tuning with LoRA → merged back into the base weights |
| Libraries | `transformers==4.57.1`, `peft`, `accelerate` |
## Prompt Template
Generation follows the Qwen3 chat template with `<|im_start|>` role markers. Each training record consisted of the system message shown below, a single user question, and an assistant reply. When crafting new prompts, either rely on `tokenizer.apply_chat_template` or format manually:
```
<|im_start|>system
You are a helpful Salesforce analytics assistant.<|im_end|>
<|im_start|>user
<user question><|im_end|>
<|im_start|>assistant
```
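Equivalently, a minimal sketch that produces the same string programmatically (model id taken from the Usage section below):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("urm3l/sf-diogenes-v0.1", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
    {"role": "user", "content": "<user question>"},
]
# add_generation_prompt=True appends the trailing <|im_start|>assistant marker
# so the model continues with its reply.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```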
## Dataset: `finetune_dataset.jsonl`
| Metric | Value |
|---|---|
| Records | 102,827 JSONL rows |
| Size | 158,187,480 bytes (≈151 MiB) |
| Fields | `text`, `prompt`, `completion` |
| Domain | Salesforce Forecasting, Data Cloud, Commerce, Order Management, CLM, Payments, Einstein recommendations |
| Structure | System primer → user instruction → multi-step assistant answer |
Sample row:
```json
{
  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
}
```
Observations:
- Every `text` field already contains the fully formatted conversation, so the `prompt`/`completion` split was used to mask losses on user content.
- Content is English-only and keeps product names and feature terminology verbatim from Salesforce documentation and release notes. No customer-specific or PII entries were detected during sampling, but please review before sharing externally.
- No augmentation, paraphrasing, or rejection sampling logs are stored in the repo; the dataset is used as-is in chronological order.
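Before training or redistribution, a quick stdlib-only sanity check along these lines can confirm the record count and field names (nothing here is specific to this repo):

```python
import json
from collections import Counter

# Stream the JSONL file; count rows and tally which fields appear.
field_counts, records = Counter(), 0
with open("finetune_dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        field_counts.update(row.keys())
        records += 1

print(records)       # expect 102827
print(field_counts)  # expect text / prompt / completion on every row
```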
## Training Procedure
- Pre-processing – Each JSONL row is converted to tokens through the bundled Qwen3 chat template. Only the assistant completion is used as the supervised target; prompt tokens are masked.
- LoRA fine-tuning – LoRA adapters were attached to the base attention and feed-forward projections (standard PEFT setup). Training used bf16 activations with gradient checkpointing to keep memory pressure manageable on 80B weights. (Adapter hyperparameters live in the training run artifacts; update this section if you change them.)
- Merge – After convergence the adapter was merged into the base weights (`peft.LoraModel.merge_and_unload`) to produce the standalone folder `sf-diogenes-v0.1/`. This keeps inference simple at the cost of a larger download.
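For reference, a minimal merge sketch, assuming the trained adapter lives in a hypothetical `diogenes-lora/` checkpoint (the adapter itself is not stored in this repo):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, attach the trained LoRA adapter, fold the deltas into
# the dense weights, and save a standalone checkpoint (no PEFT needed later).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Instruct", torch_dtype="bfloat16", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "diogenes-lora")  # hypothetical adapter path
merged = model.merge_and_unload()
merged.save_pretrained("sf-diogenes-v0.1")
```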
## Reproducing the Adapter
```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct", trust_remote_code=True)
ds = load_dataset("json", data_files="finetune_dataset.jsonl", split="train")

def format_example(example):
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["completion"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    tokenized = tokenizer(text, truncation=True, max_length=8192)
    # Supervise only the assistant completion: re-render the prompt portion to
    # measure its token length, then mask those positions with -100.
    prompt_text = tokenizer.apply_chat_template(messages[:-1], tokenize=False, add_generation_prompt=True)
    prompt_len = min(len(tokenizer(prompt_text)["input_ids"]), len(tokenized["input_ids"]))
    labels = tokenized["input_ids"][:]
    labels[:prompt_len] = [-100] * prompt_len
    tokenized["labels"] = labels
    return tokenized

tokenized_ds = ds.map(format_example, remove_columns=ds.column_names)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
# r/lora_alpha are placeholders; the original adapter config is not stored in
# the repo (see the note below). Add target_modules for the attention/MLP
# projections to match the setup described under Training Procedure.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    train_dataset=tokenized_ds,
    # Pads inputs and pads labels with -100 so masked positions stay ignored.
    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True, label_pad_token_id=-100),
    args=TrainingArguments(
        output_dir="diogenes-lora",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        bf16=True,
        gradient_checkpointing=True,
        logging_steps=10,
        save_steps=200,
    ),
)
trainer.train()
```
Replace the placeholder LoRA rank/alpha with the values you used; they are omitted because the original adapter config is not stored in the repo.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "urm3l/sf-diogenes-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
    {"role": "user", "content": "How do I keep Forecast Amount aligned with Amount Without Manager Adjustments?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Inference Tips
- Responses are optimized for structured, enumerated instructions. Use lower temperatures (0.1–0.3) to maintain factual tone.
- Context windows above ~8k tokens work, but throughput falls sharply; prune long histories when possible.
- Quantization (e.g., GPTQ or AWQ) is not provided here; quantize downstream if needed (see the sketch below).
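For example, a hedged sketch of on-the-fly 4-bit loading with bitsandbytes (untested against this model; GPTQ/AWQ would need a separate calibration pass):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization applied at load time; no pre-quantized artifact needed.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
model = AutoModelForCausalLM.from_pretrained(
    "urm3l/sf-diogenes-v0.1",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```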
## Evaluation
Formal benchmarks (MT-Bench, AlpacaEval, etc.) have not yet been run since the focus was on Salesforce internal workflows. Please evaluate against:
- Your organization’s Salesforce help-desk transcripts.
- Synthetic scenario suites covering Forecasting, Commerce, Data Cloud, and CLM features.
- Hallucination and safety probes for financial/PII content.
Please share scores or qualitative findings via issues so the card can be updated.
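A minimal scenario-sweep sketch, reusing the `tokenizer` and `model` from the Usage section (the `scenarios.jsonl` file and its fields are hypothetical):

```python
import json

# `scenarios.jsonl` is a hypothetical file of {"question": ..., "reference": ...}
# rows covering Forecasting, Commerce, Data Cloud, and CLM features.
with open("scenarios.jsonl", encoding="utf-8") as f:
    scenarios = [json.loads(line) for line in f]

results = []
for s in scenarios:
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
        {"role": "user", "content": s["question"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Keep the reference next to the answer for human (or LLM-judge) review.
    results.append({"question": s["question"], "reference": s.get("reference"), "answer": answer})
```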
## Limitations & Risks
- Domain narrowness – The model was never exposed to open-ended general knowledge tasks; off-domain prompts can hallucinate confidently.
- Freshness – Data reflects the state of Salesforce documentation as of Oct 2025. Features introduced later may be missing or inaccurate.
- Compliance – The dataset embeds Salesforce language; ensure redistribution complies with your documentation licenses.
- Resource requirements – Running the full bf16 weights requires multi-GPU setups (≥160 GB VRAM) unless you quantize.
## Responsible Use
- Keep humans in the loop for contractual, legal, or financial decisions.
- Filter prompts and outputs for sensitive data before logging or sharing.
- When surfacing responses to end users, provide citations back to the authoritative Salesforce documentation or run retrieval-augmented generation to ground answers.
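A rough sketch of that grounding pattern, again reusing the loaded `tokenizer` and `model`; the `retriever` callable and its corpus are placeholders, not part of this repo:

```python
def answer_with_citations(question, retriever):
    # `retriever` is a placeholder: any callable returning (passage, url) pairs
    # drawn from the authoritative Salesforce documentation.
    passages = retriever(question, k=3)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, (p, _url) in enumerate(passages))
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant. "
                                      "Answer only from the numbered passages and cite them."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```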
## Maintenance & Support
- Owner: `urm3l`
- Artifacts: `sf-diogenes-v0.1/` (merged weights), `finetune_dataset.jsonl` (training data)
- Next steps: add quantitative evals, document adapter hyperparameters, and consider releasing a quantized variant.
Last updated: 2025-11-02