---
license: apache-2.0
base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- lora
- salesforce
- instruction-tuning
- qwen3
---

# sf-diogenes-v0.1
## TL;DR
- 80B-parameter Qwen3 Next Instruct model merged with a domain LoRA adapter for Salesforce analytics, Data Cloud, Commerce, and Order Management workflows.
- Trained on 102,827 curated instruction/response pairs stored in `finetune_dataset.jsonl` (≈151 MiB).
- Distributed as merged full weights (bf16), so it can be loaded like any other Transformers causal LM without applying the adapter at runtime.
## Model Details
| Field | Value |
|---|---|
| Base | `Qwen/Qwen3-Next-80B-A3B-Instruct` |
| Architecture | `Qwen3NextForCausalLM` (Mixture-of-Experts with alternating linear/full attention) |
| Parameters | ~80B (same as base; LoRA deltas merged) |
| Context Length | 262,144 tokens (see `config.json`) |
| Precision | bfloat16 |
| Tokenizer | Qwen 151,936 vocab with chat template in `sf-diogenes-v0.1/chat_template.jinja` |
| Finetuning | Supervised fine-tuning with LoRA → merged back into the base weights |
| Libraries | `transformers==4.57.1`, `peft`, `accelerate` |
## Prompt Template
Generation follows the Qwen3 chat template with `<|im_start|>` role markers. Each training record consisted of the system message shown below, a single user question, and an assistant reply. When crafting new prompts, either rely on `tokenizer.apply_chat_template` or format manually:
```
<|im_start|>system
You are a helpful Salesforce analytics assistant.<|im_end|>
<|im_start|>user
<user question><|im_end|>
<|im_start|>assistant
```
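Equivalently, a minimal sketch that produces the same string programmatically (model id taken from the Usage section below):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("urm3l/sf-diogenes-v0.1", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
    {"role": "user", "content": "<user question>"},
]
# add_generation_prompt=True appends the trailing <|im_start|>assistant marker
# so the model continues with its reply.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```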
## Dataset: `finetune_dataset.jsonl`
| Metric | Value |
|---|---|
| Records | 102,827 JSONL rows |
| Size | 158,187,480 bytes (≈151 MiB) |
| Fields | `text`, `prompt`, `completion` |
| Domain | Salesforce Forecasting, Data Cloud, Commerce, Order Management, CLM, Payments, Einstein recommendations |
| Structure | System primer → user instruction → multi-step assistant answer |
Sample row:
```json
{
  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
}
```
Observations:
- Every `text` field already contains the fully formatted conversation, so the `prompt`/`completion` split was used to mask losses on user content.
- Content is English-only and keeps product names and feature terminology verbatim from Salesforce documentation and release notes. No customer-specific or PII entries were detected during sampling, but please review before sharing externally.
- No augmentation, paraphrasing, or rejection sampling logs are stored in the repo; the dataset is used as-is in chronological order.
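Before training or redistribution, a quick stdlib-only sanity check along these lines can confirm the record count and field names (nothing here is specific to this repo):

```python
import json
from collections import Counter

# Stream the JSONL file; count rows and tally which fields appear.
field_counts, records = Counter(), 0
with open("finetune_dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        field_counts.update(row.keys())
        records += 1

print(records)       # expect 102827
print(field_counts)  # expect text / prompt / completion on every row
```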
## Training Procedure
- Pre-processing – Each JSONL row is converted to tokens through the bundled Qwen3 chat template. Only the assistant completion is used as the supervised target; prompt tokens are masked.
- LoRA fine-tuning – LoRA adapters were attached to the base attention and feed-forward projections (standard PEFT setup). Training used bf16 activations with gradient checkpointing to keep memory pressure manageable on 80B weights. (Adapter hyperparameters live in the training run artifacts; update this section if you change them.)
- Merge – After convergence the adapter was merged into the base weights (`peft.LoraModel.merge_and_unload`) to produce the standalone folder `sf-diogenes-v0.1/`. This keeps inference simple at the cost of a larger download.
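For reference, a minimal merge sketch, assuming the trained adapter lives in a hypothetical `diogenes-lora/` checkpoint (the adapter itself is not stored in this repo):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, attach the trained LoRA adapter, fold the deltas into
# the dense weights, and save a standalone checkpoint (no PEFT needed later).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Instruct", torch_dtype="bfloat16", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "diogenes-lora")  # hypothetical adapter path
merged = model.merge_and_unload()
merged.save_pretrained("sf-diogenes-v0.1")
```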
## Reproducing the Adapter
```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct", trust_remote_code=True)
ds = load_dataset("json", data_files="finetune_dataset.jsonl", split="train")

def format_example(example):
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["completion"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    tokenized = tokenizer(text, truncation=True, max_length=8192)
    # Supervise only the assistant completion: re-render the prompt portion to
    # measure its token length, then mask those positions with -100.
    prompt_text = tokenizer.apply_chat_template(messages[:-1], tokenize=False, add_generation_prompt=True)
    prompt_len = min(len(tokenizer(prompt_text)["input_ids"]), len(tokenized["input_ids"]))
    labels = tokenized["input_ids"][:]
    labels[:prompt_len] = [-100] * prompt_len
    tokenized["labels"] = labels
    return tokenized

tokenized_ds = ds.map(format_example, remove_columns=ds.column_names)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)
# r/lora_alpha are placeholders; the original adapter config is not stored in
# the repo (see the note below). Add target_modules for the attention/MLP
# projections to match the setup described under Training Procedure.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    train_dataset=tokenized_ds,
    # Pads inputs and pads labels with -100 so masked positions stay ignored.
    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True, label_pad_token_id=-100),
    args=TrainingArguments(
        output_dir="diogenes-lora",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        bf16=True,
        gradient_checkpointing=True,
        logging_steps=10,
        save_steps=200,
    ),
)
trainer.train()
```
Replace the placeholder LoRA rank/alpha with the values you used; they are omitted because the original adapter config is not stored in the repo.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "urm3l/sf-diogenes-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
    {"role": "user", "content": "How do I keep Forecast Amount aligned with Amount Without Manager Adjustments?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Inference Tips
- Responses are optimized for structured, enumerated instructions. Use lower temperatures (0.1–0.3) to maintain factual tone.
- Context windows above ~8k tokens work, but throughput falls sharply; prune long histories when possible.
- Quantization (e.g., GPTQ or AWQ) is not provided here; quantize downstream if needed (see the sketch below).
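For example, a hedged sketch of on-the-fly 4-bit loading with bitsandbytes (untested against this model; GPTQ/AWQ would need a separate calibration pass):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization applied at load time; no pre-quantized artifact needed.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
model = AutoModelForCausalLM.from_pretrained(
    "urm3l/sf-diogenes-v0.1",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```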
## Evaluation
Formal benchmarks (MT-Bench, AlpacaEval, etc.) have not yet been run since the focus was on Salesforce internal workflows. Please evaluate against:
- Your organization’s Salesforce help-desk transcripts.
- Synthetic scenario suites covering Forecasting, Commerce, Data Cloud, and CLM features.
- Hallucination and safety probes for financial/PII content.
Please share scores or qualitative findings via issues so the card can be updated.
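A minimal scenario-sweep sketch, reusing the `tokenizer` and `model` from the Usage section (the `scenarios.jsonl` file and its fields are hypothetical):

```python
import json

# `scenarios.jsonl` is a hypothetical file of {"question": ..., "reference": ...}
# rows covering Forecasting, Commerce, Data Cloud, and CLM features.
with open("scenarios.jsonl", encoding="utf-8") as f:
    scenarios = [json.loads(line) for line in f]

results = []
for s in scenarios:
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
        {"role": "user", "content": s["question"]},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Keep the reference next to the answer for human (or LLM-judge) review.
    results.append({"question": s["question"], "reference": s.get("reference"), "answer": answer})
```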
## Limitations & Risks
- Domain narrowness – The model was never exposed to open-ended general knowledge tasks; off-domain prompts can hallucinate confidently.
- Freshness – Data reflects the state of Salesforce documentation as of Oct 2025. Features introduced later may be missing or inaccurate.
- Compliance – The dataset embeds Salesforce language; ensure redistribution complies with your documentation licenses.
- Resource requirements – Running the full bf16 weights requires multi-GPU setups (≥160 GB VRAM) unless you quantize.
## Responsible Use
- Keep humans in the loop for contractual, legal, or financial decisions.
- Filter prompts and outputs for sensitive data before logging or sharing.
- When surfacing responses to end users, provide citations back to the authoritative Salesforce documentation or run retrieval-augmented generation to ground answers.
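A rough sketch of that grounding pattern, again reusing the loaded `tokenizer` and `model`; the `retriever` callable and its corpus are placeholders, not part of this repo:

```python
def answer_with_citations(question, retriever):
    # `retriever` is a placeholder: any callable returning (passage, url) pairs
    # drawn from the authoritative Salesforce documentation.
    passages = retriever(question, k=3)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, (p, _url) in enumerate(passages))
    messages = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant. "
                                      "Answer only from the numbered passages and cite them."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```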
## Maintenance & Support
- Owner: `urm3l`
- Artifacts: `sf-diogenes-v0.1/` (merged weights), `finetune_dataset.jsonl` (training data)
- Next steps: add quantitative evals, document adapter hyperparameters, and consider releasing a quantized variant.
Last updated: 2025-11-02