tilman-d/sf-diogenes-v0.1

@@ -1,73 +1,185 @@
 ---
 license: apache-2.0
 base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
 tags:
-- merge
 - lora
 - qwen3
-- instruct
-library_name: transformers
 ---
 # sf-diogenes-v0.1
-This model is a merge of the [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) base model with the [urm3l/diogenes-v.01-lora-adapter](https://huggingface.co/urm3l/diogenes-v.01-lora-adapter) LoRA adapter.
 ## Model Details
-- **Base Model**: Qwen/Qwen3-Next-80B-A3B-Instruct
-- **LoRA Adapter**: urm3l/diogenes-v.01-lora-adapter
-- **Merge Date**: 2025-11-01
-- **Model Size**: ~80B parameters
-- **Precision**: BF16
-## Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
 model = AutoModelForCausalLM.from_pretrained(
-    "urm3l/sf-diogenes-v0.1",
-    torch_dtype="auto",
     device_map="auto",
-    trust_remote_code=True
 )
-tokenizer = AutoTokenizer.from_pretrained(
-    "urm3l/sf-diogenes-v0.1",
-    trust_remote_code=True
 )
-messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": "Hello!"}
-]
-text = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True
-)
 inputs = tokenizer(text, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=512)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-## Training Details
-This model was fine-tuned using LoRA (Low-Rank Adaptation). For training details, see the [adapter repository](https://huggingface.co/urm3l/diogenes-v.01-lora-adapter).
-## Limitations
-- This is a large language model and may produce incorrect or biased outputs
-- Should not be used for high-stakes decision making without human oversight
-- May require significant computational resources for inference
-## License
-This model inherits the license from the base model: Apache 2.0
-## Citation
-If you use this model, please cite the original Qwen3 paper and acknowledge the LoRA fine-tuning.

 ---
 license: apache-2.0
 base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
+language:
+- en
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - lora
+- salesforce
+- instruction-tuning
 - qwen3
 ---
 # sf-diogenes-v0.1
+## TL;DR
+- 80B-parameter Qwen3 Next Instruct model merged with a domain LoRA adapter for Salesforce analytics, Data Cloud, Commerce, and Order Management workflows.
+- Trained on 102,827 curated instruction/response pairs stored in `finetune_dataset.jsonl` (≈151 MiB) with a fixed system primer _“You are a helpful Salesforce analytics assistant.”_
+- Distributed as merged full weights (bf16) so it can be loaded like any other Transformers causal LM without applying the adapter at runtime.
 ## Model Details
+| Field | Value |
+| --- | --- |
+| **Base** | [`Qwen/Qwen3-Next-80B-A3B-Instruct`](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) |
+| **Architecture** | `Qwen3NextForCausalLM` (sparse Mixture-of-Experts; hybrid layout mixing Gated DeltaNet linear-attention layers with gated full-attention layers) |
+| **Parameters** | ~80B total, ~3B activated per token (same as base; LoRA deltas merged) |
+| **Context Length** | 262,144 tokens (see `config.json`) |
+| **Precision** | bfloat16 |
+| **Tokenizer** | Qwen 151,936 vocab with chat template in `sf-diogenes-v0.1/chat_template.jinja` |
+| **Finetuning** | Supervised fine-tuning with LoRA → merged back into the base weights |
+| **Libraries** | `transformers==4.57.1`, `peft`, `accelerate` |
+### Prompt Template
+Generation follows the Qwen3 template with `<|im_start|>role` markers. Each training record began with the system message above, a single user question, and an assistant reply. When crafting new prompts, either rely on `tokenizer.apply_chat_template` or format manually:
+```
+<|im_start|>system
+You are a helpful Salesforce analytics assistant.<|im_end|>
+<|im_start|>user
+<user question><|im_end|>
+<|im_start|>assistant
+```
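You may need to build the prompt string yourself, for example in a client that sends raw text to a completion endpoint without loading the tokenizer. In that case a small helper can reproduce the manual layout shown above. This sketch assumes exactly one system turn and one user turn. `build_prompt` is a hypothetical name, and `tokenizer.apply_chat_template` with the bundled `chat_template.jinja` remains the reference renderer.

```python
# Hypothetical helper: renders the ChatML-style prompt layout shown above by hand.
# Assumes one system turn and one user turn; prefer tokenizer.apply_chat_template
# when the tokenizer is available, since it reads the bundled chat_template.jinja.
SYSTEM_PRIMER = "You are a helpful Salesforce analytics assistant."


def build_prompt(user_question: str, system: str = SYSTEM_PRIMER) -> str:
    """Return a prompt ending with an open assistant turn, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_prompt("Which permission set grants Payments Administrator access?"))
```

If the bundled template renders these two turns as the block above shows, this string should match what `apply_chat_template(..., add_generation_prompt=True)` returns. Comparing the two once on a sample question is a cheap sanity check.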
+## Dataset: `finetune_dataset.jsonl`
+| Metric | Value |
+| --- | --- |
+| Records | 102,827 JSONL rows |
+| Size | 158,187,480 bytes (≈151 MiB) |
+| Fields | `text`, `prompt`, `completion` |
+| Domain | Salesforce Forecasting, Data Cloud, Commerce, Order Management, CLM, Payments, Einstein recommendations |
+| Structure | System primer → user instruction → multi-step assistant answer |
+Sample row (the `text` field is omitted for brevity):
+```json
+{
+  "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
+  "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
+}
+```
+Observations:
+- Every `text` field already contains the fully formatted conversation; the separate `prompt`/`completion` fields were used to locate the assistant reply so that loss on the system and user tokens could be masked.
+- Content is English-only and keeps product names and feature terminology verbatim from Salesforce documentation and release notes. No customer-specific or PII entries were detected during sampling, but please review before sharing externally.
+- No augmentation, paraphrasing, or rejection sampling logs are stored in the repo; the dataset is used as-is in chronological order.
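You can recheck the table's row count and size locally with a short script. The same script can test the observation that `text` already holds the full conversation. This sketch assumes one JSON object per line, with string `text`, `prompt`, and `completion` fields. `summarize_jsonl` is a hypothetical helper. Its substring test is only a heuristic: it confirms that the prompt and completion appear somewhere inside `text`, not that the formatting is correct.

```python
# Sketch: recompute the dataset table's summary numbers and check the field layout
# of a JSONL file shaped like finetune_dataset.jsonl (one JSON object per line).
# Assumption: each row carries the string fields "text", "prompt", "completion".
import json
import os
from collections import Counter

REQUIRED_FIELDS = ("text", "prompt", "completion")


def summarize_jsonl(path: str) -> dict:
    """Count rows, bytes, missing fields, and rows whose `text` lacks the prompt/completion."""
    rows = 0
    missing = Counter()
    text_mismatch = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            rows += 1
            record = json.loads(line)
            for name in REQUIRED_FIELDS:
                if name not in record:
                    missing[name] += 1
            text = record.get("text", "")
            # `text` should already contain the formatted conversation, so both
            # the prompt and the completion are expected to appear inside it.
            if record.get("prompt", "") not in text or record.get("completion", "") not in text:
                text_mismatch += 1
    size = os.path.getsize(path)
    return {
        "rows": rows,
        "bytes": size,
        "mib": round(size / 2**20, 1),
        "missing_fields": dict(missing),
        "text_mismatch": text_mismatch,
    }
```

If a local copy matches the table above, `summarize_jsonl("finetune_dataset.jsonl")` should report 102,827 rows and 158,187,480 bytes.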
+## Training Procedure
+1. **Pre-processing** – Each JSONL row is converted to tokens through the bundled Qwen3 chat template. Only the assistant completion is used as the supervised target; prompt tokens are masked.
+2. **LoRA fine-tuning** – LoRA adapters were attached to the base attention and feed-forward projections (standard PEFT setup). Training used bf16 activations with gradient checkpointing to keep memory pressure manageable on 80B weights. (Adapter hyperparameters live in the training run artifacts; update this section if you change them.)
+3. **Merge** – After convergence the adapter was merged into the base weights (`peft.LoraModel.merge_and_unload`) to produce the standalone folder `sf-diogenes-v0.1/`. This keeps inference simple at the cost of a larger download.
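Merging is possible because a LoRA adapter only adds a low-rank update to each targeted weight matrix. The merged weight is W' = W + (α/r)·B·A, where:

- A is r×k (PEFT's `lora_A`).
- B is d×r (`lora_B`).
- r is the rank.
- α is `lora_alpha`.

The sketch below applies that update to a toy 2×2 matrix in plain Python to illustrate the per-module arithmetic behind `merge_and_unload`. It assumes PEFT's default α/r scaling (not rsLoRA or DoRA). The values are hypothetical toy numbers, not the adapter's real weights or rank.

```python
# Toy illustration of the LoRA merge arithmetic W' = W + (alpha / r) * (B @ A).
# Assumes PEFT's default scaling (alpha / r); rsLoRA and DoRA scale differently.
# Matrices are nested lists so the example runs without numpy or torch.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (m x n) matrix and an (n x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into base weight w (shape d x k).

    a has shape r x k (lora_A) and b has shape d x r (lora_B).
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # d x k, same shape as w
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Hypothetical toy values: a 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # r x k = 1 x 2
B = [[0.5], [-1.0]]     # d x r = 2 x 1
print(merge_lora(W, A, B, alpha=2.0))  # -> [[2.0, 2.0], [-2.0, -3.0]]
```

Because each targeted matrix already contains its update after merging, the merged folder can be loaded without PEFT at inference time.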
+### Reproducing the Adapter
 ```python
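+import os
+from datasets import load_dataset
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForSeq2Seq, Trainer, TrainingArguments
+BASE_ID = "Qwen/Qwen3-Next-80B-A3B-Instruct"
+SYSTEM_PRIMER = "You are a helpful Salesforce analytics assistant."
+# The original LoRA rank/alpha are not stored in the repo: export LORA_R and LORA_ALPHA before running.
+LORA_R = int(os.environ["LORA_R"])
+LORA_ALPHA = int(os.environ["LORA_ALPHA"])
+tokenizer = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True)
+ds = load_dataset("json", data_files="finetune_dataset.jsonl", split="train")
+def format_example(example):
+    prompt_messages = [
+        {"role": "system", "content": SYSTEM_PRIMER},
+        {"role": "user", "content": example["prompt"]},
+    ]
+    full_messages = prompt_messages + [{"role": "assistant", "content": example["completion"]}]
+    # Render the prompt with an open assistant turn, then the full conversation.
+    # Assumes the first rendering is a prefix of the second; spot-check one row.
+    prompt_text = tokenizer.apply_chat_template(prompt_messages, tokenize=False, add_generation_prompt=True)
+    full_text = tokenizer.apply_chat_template(full_messages, tokenize=False, add_generation_prompt=False)
+    prompt_len = len(tokenizer(prompt_text)["input_ids"])
+    tokenized = tokenizer(full_text, truncation=True, max_length=8192)
+    # Label system/user tokens with -100 so the loss covers only the assistant reply.
+    labels = list(tokenized["input_ids"])
+    n_masked = min(prompt_len, len(labels))
+    labels[:n_masked] = [-100] * n_masked
+    tokenized["labels"] = labels
+    return tokenized
+tokenized_ds = ds.map(format_example, remove_columns=ds.column_names)
 model = AutoModelForCausalLM.from_pretrained(
+    BASE_ID,
+    torch_dtype="bfloat16",
     device_map="auto",
+    trust_remote_code=True,
 )
+lora_config = LoraConfig(
+    r=LORA_R,
+    lora_alpha=LORA_ALPHA,
+    lora_dropout=0.05,
+    # Assumption: Qwen-style attention and MLP projection names; align these with
+    # the original adapter_config.json if you have it.
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+trainer = Trainer(
+    model=model,
+    train_dataset=tokenized_ds,
+    # Pads input_ids with the pad token and labels with -100 if the batch size is raised.
+    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True),
+    args=TrainingArguments(
+        output_dir="diogenes-lora",
+        num_train_epochs=1,
+        per_device_train_batch_size=1,
+        gradient_accumulation_steps=16,
+        learning_rate=1e-4,
+        bf16=True,
+        gradient_checkpointing=True,  # as described in step 2 above
+        gradient_checkpointing_kwargs={"use_reentrant": False},
+        logging_steps=10,
+        save_steps=200,
+    ),
 )
+trainer.train()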
+```
+Fill in the LoRA rank/alpha you used; they are omitted here because the original adapter config is not stored in the repo.
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "urm3l/sf-diogenes-v0.1"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True)
+messages = [
+    {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
+    {"role": "user", "content": "How do I keep Forecast Amount aligned with Amount Without Manager Adjustments?"},
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer(text, return_tensors="pt").to(model.device)
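+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
+# Decode only the newly generated tokens so the prompt is not echoed back.
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))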
 ```
+### Inference Tips
+- Responses are optimized for structured, enumerated instructions. Use lower temperatures (0.1–0.3) to maintain factual tone.
+- Context windows above ~8k tokens work, but throughput falls sharply; prune long histories when possible.
+- Quantization (e.g., GPTQ or AWQ) is not provided here—quantize downstream if needed.
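One way to prune long histories is to keep the system message and drop the oldest turns until the conversation fits a token budget. The sketch below is hypothetical: `prune_history` and its 8,192-token default budget are illustrative choices. `count_tokens` is any callable you supply. One option is to count `tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)`, which with the pinned Transformers version should return a list of token ids.

```python
# Hypothetical helper: trims the oldest conversation turns so a chat history fits
# a token budget, always keeping system messages and the latest message.
from typing import Callable, Dict, List

Message = Dict[str, str]


def prune_history(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    budget: int = 8192,
) -> List[Message]:
    """Drop the oldest non-system messages until count_tokens(result) <= budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Never drop the final message (normally the user question being answered).
    while len(turns) > 1 and count_tokens(system + turns) > budget:
        turns.pop(0)
    return system + turns


if __name__ == "__main__":
    # Toy counter: one "token" per whitespace-separated word.
    def word_count(msgs: List[Message]) -> int:
        return sum(len(m["content"].split()) for m in msgs)

    history = [
        {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
        {"role": "user", "content": "old question " * 10},
        {"role": "assistant", "content": "old answer " * 10},
        {"role": "user", "content": "How do I enable Forecasting?"},
    ]
    print([m["role"] for m in prune_history(history, word_count, budget=20)])  # -> ['system', 'user']
```

Dropping messages one at a time can leave a history that starts with an assistant turn. If that matters for your prompts, drop user/assistant pairs together instead.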
+## Evaluation
+Formal benchmarks (MT-Bench, AlpacaEval, etc.) have not yet been run since the focus was on Salesforce internal workflows. Please evaluate against:
+- Your organization’s Salesforce help-desk transcripts.
+- Synthetic scenario suites covering Forecasting, Commerce, Data Cloud, and CLM features.
+- Hallucination and safety probes for financial/PII content.
+Kindly share scores or qualitative findings back via issues so the card can be updated.
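A synthetic scenario suite can start as a list of questions, each paired with terms a correct answer should mention. The harness below is a hypothetical sketch. `Scenario`, `run_suite`, and the keyword-coverage metric are illustrative choices, not an established benchmark. `generate` stands in for whatever function wraps the Usage snippet above.

```python
# Hypothetical keyword-coverage harness for a synthetic Salesforce scenario suite.
# `generate` is any callable mapping a question to the model's answer text.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Scenario:
    area: str                      # e.g. "Forecasting", "Commerce", "Data Cloud", "CLM"
    question: str
    expected_terms: List[str] = field(default_factory=list)


def keyword_coverage(answer: str, terms: List[str]) -> float:
    """Fraction of expected terms found in the answer (case-insensitive)."""
    if not terms:
        return 1.0
    lowered = answer.lower()
    return sum(term.lower() in lowered for term in terms) / len(terms)


def run_suite(scenarios: List[Scenario], generate: Callable[[str], str]) -> Dict[str, float]:
    """Average keyword coverage per product area."""
    per_area: Dict[str, List[float]] = {}
    for sc in scenarios:
        score = keyword_coverage(generate(sc.question), sc.expected_terms)
        per_area.setdefault(sc.area, []).append(score)
    return {area: sum(scores) / len(scores) for area, scores in per_area.items()}
```

Keyword coverage catches missing facts, not invented ones. Pair it with the hallucination probes listed above and with human review before reporting scores.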
+## Limitations & Risks
+- **Domain narrowness** – The model was never exposed to open-ended general knowledge tasks; off-domain prompts can hallucinate confidently.
+- **Freshness** – Data reflects the state of Salesforce documentation as of Oct 2025. Features introduced later may be missing or inaccurate.
+- **Compliance** – The dataset embeds Salesforce product language; ensure redistribution complies with your documentation licenses.
+- **Resource requirements** – The bf16 weights alone occupy roughly 160 GB, so running the full model requires a multi-GPU setup with extra headroom for the KV cache unless you quantize.
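The ~160 GB figure is parameter count × bytes per parameter. bf16 stores 2 bytes per parameter, so ~80B parameters need ~160 GB (about 149 GiB). That total excludes the KV cache, activations, and framework overhead. The helper below reproduces the arithmetic. Its 8-bit and 4-bit rows are illustrative weight-only estimates that ignore quantization metadata such as scales and zero-points.

```python
# Weight-only memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and quantization metadata (scales, zero-points).
from typing import Tuple

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory(num_params: float, fmt: str = "bf16") -> Tuple[float, float]:
    """Return (decimal GB, binary GiB) needed to hold the weights in `fmt`."""
    total_bytes = num_params * BYTES_PER_PARAM[fmt]
    return total_bytes / 1e9, total_bytes / 2**30


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        gb, gib = weight_memory(80e9, fmt)
        print(f"{fmt:>5}: {gb:6.1f} GB ({gib:6.1f} GiB)")
```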
+## Responsible Use
+- Keep humans in the loop for contractual, legal, or financial decisions.
+- Filter prompts and outputs for sensitive data before logging or sharing.
+- When surfacing responses to end users, provide citations back to the authoritative Salesforce documentation or run retrieval-augmented generation to ground answers.
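A minimal starting point for filtering is to redact obvious identifiers with regular expressions before prompts and outputs are logged. The patterns below are hypothetical and deliberately narrow: they cover email addresses and 13 to 16 digit card-number-like sequences, the latter relevant to Payments content. They will miss many kinds of sensitive data and are not a substitute for a dedicated PII-detection service.

```python
# Hypothetical pre-logging redaction: masks email addresses and 13-16 digit
# card-number-like sequences. Deliberately narrow; not a complete PII filter.
import re

_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
]


def redact(text: str) -> str:
    """Replace each match with a [TYPE] placeholder before the text is logged."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Refund 4111 1111 1111 1111 for jane.doe@example.com"))  # -> Refund [CARD] for [EMAIL]
```

Apply it to both the prompt and the generated text before either reaches a log sink.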
+## Maintenance & Support
+- **Owner**: `urm3l`
+- **Artifacts**: `sf-diogenes-v0.1/` (merged weights), `finetune_dataset.jsonl` (training data)
+- **Next steps**: add quantitative evals, document adapter hyperparameters, and consider releasing a quantized variant.
+_Last updated: 2025-11-02_