tilman-d committed on
Commit
e8c0880
·
verified ·
1 Parent(s): b10e37f

Update README.md

  1. README.md +150 -38
README.md CHANGED
@@ -1,73 +1,185 @@
  ---
  license: apache-2.0
  base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
  tags:
- - merge
  - lora
  - qwen3
- - instruct
- library_name: transformers
  ---

  # sf-diogenes-v0.1

- This model is a merge of the [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) base model with the [urm3l/diogenes-v.01-lora-adapter](https://huggingface.co/urm3l/diogenes-v.01-lora-adapter) LoRA adapter.

  ## Model Details

- - **Base Model**: Qwen/Qwen3-Next-80B-A3B-Instruct
- - **LoRA Adapter**: urm3l/diogenes-v.01-lora-adapter
- - **Merge Date**: 2025-11-01
- - **Model Size**: ~80B parameters
- - **Precision**: BF16
 
- ## Usage

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained(
-     "urm3l/sf-diogenes-v0.1",
-     torch_dtype="auto",
      device_map="auto",
-     trust_remote_code=True
  )
-
- tokenizer = AutoTokenizer.from_pretrained(
-     "urm3l/sf-diogenes-v0.1",
-     trust_remote_code=True
  )

- messages = [
-     {"role": "system", "content": "You are a helpful assistant."},
-     {"role": "user", "content": "Hello!"}
- ]

- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )

  inputs = tokenizer(text, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=512)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
- ## Training Details

- This model was fine-tuned using LoRA (Low-Rank Adaptation). For training details, see the [adapter repository](https://huggingface.co/urm3l/diogenes-v.01-lora-adapter).

- ## Limitations

- - This is a large language model and may produce incorrect or biased outputs
- - Should not be used for high-stakes decision making without human oversight
- - May require significant computational resources for inference

- ## License

- This model inherits the license from the base model: Apache 2.0

- ## Citation

- If you use this model, please cite the original Qwen3 paper and acknowledge the LoRA fine-tuning.
 
  ---
  license: apache-2.0
  base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
  tags:
  - lora
+ - salesforce
+ - instruction-tuning
  - qwen3
  ---

  # sf-diogenes-v0.1

+ ## TL;DR
+
+ - 80B-parameter Qwen3-Next Instruct model merged with a domain LoRA adapter for Salesforce analytics, Data Cloud, Commerce, and Order Management workflows.
+ - Trained on 102,827 curated instruction/response pairs stored in `finetune_dataset.jsonl` (≈151 MiB) with a fixed system primer _“You are a helpful Salesforce analytics assistant.”_
+ - Distributed as merged full weights (bf16) so it can be loaded like any other Transformers causal LM without applying the adapter at runtime.

  ## Model Details

+ | Field | Value |
+ | --- | --- |
+ | **Base** | [`Qwen/Qwen3-Next-80B-A3B-Instruct`](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) |
+ | **Architecture** | `Qwen3NextForCausalLM` (Mixture-of-Experts with alternating linear/full attention) |
+ | **Parameters** | ~80B (same as base; LoRA deltas merged) |
+ | **Context Length** | 262,144 tokens (see `config.json`) |
+ | **Precision** | bfloat16 |
+ | **Tokenizer** | Qwen 151,936 vocab with chat template in `sf-diogenes-v0.1/chat_template.jinja` |
+ | **Finetuning** | Supervised fine-tuning with LoRA → merged back into the base weights |
+ | **Libraries** | `transformers==4.57.1`, `peft`, `accelerate` |
 
+ ### Prompt Template
+
+ Prompts follow the Qwen3 chat template, which opens each turn with an `<|im_start|>role` marker and closes it with `<|im_end|>`. Each training record consists of the fixed system message quoted in the TL;DR, a single user question, and an assistant reply. When crafting new prompts, either rely on `tokenizer.apply_chat_template` or format manually:
+
+ ```
+ <|im_start|>system
+ You are a helpful Salesforce analytics assistant.<|im_end|>
+ <|im_start|>user
+ <user question><|im_end|>
+ <|im_start|>assistant
+ ```
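As a minimal sketch of the manual format above, the helper below assembles the same string in plain Python. The name `build_prompt` is illustrative and not part of the repository; `tokenizer.apply_chat_template` remains the safer route because it follows the bundled template exactly.

```python
SYSTEM_PRIMER = "You are a helpful Salesforce analytics assistant."


def build_prompt(user_question: str, system: str = SYSTEM_PRIMER) -> str:
    """Format one system turn and one user turn, then open the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # Hypothetical question, used only to show the resulting layout.
    print(build_prompt("How do I enable Collaborative Forecasts?"))
```

The returned string ends right after the assistant marker, so the model's continuation is the reply itself.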
+
+ ## Dataset: `finetune_dataset.jsonl`
+
+ | Metric | Value |
+ | --- | --- |
+ | Records | 102,827 JSONL rows |
+ | Size | 158,187,480 bytes (≈151 MiB) |
+ | Fields | `text`, `prompt`, `completion` |
+ | Domain | Salesforce Forecasting, Data Cloud, Commerce, Order Management, CLM, Payments, Einstein recommendations |
+ | Structure | System primer → user instruction → multi-step assistant answer |
+
+ Sample row:
+
+ ```json
+ {
+   "prompt": "How can I view the Payments Administrator permissions and choose a setup approach for Salesforce Payments?",
+   "completion": "1) From Setup, Quick Find, enter Permission Sets, and click the Payments Administrator permission set to view the System Permissions. 2) Review the System Permissions in that permission set. 3) Use the Commerce Setup Assistant (recommended) for an easy, guided setup, or use the Setup menu for manual configuration. 4) To use Pay Now on its own, a standalone Payments license is required."
+ }
+ ```
+
+ Observations:
+
+ - Every `text` field already contains the fully formatted conversation, so the `prompt`/`completion` split was used to mask losses on user content.
+ - Content is English-only and keeps product names and feature terminology verbatim from Salesforce documentation and release notes. No customer-specific or PII entries were detected during sampling, but please review before sharing externally.
+ - No augmentation, paraphrasing, or rejection sampling logs are stored in the repo; the dataset is used as-is in chronological order.
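Before training on a local copy, it may help to confirm that the file matches the table above. The sketch below uses only the standard library; the function name `summarize_jsonl` is an assumption for illustration, and the check simply treats a missing or empty `text`, `prompt`, or `completion` field as a bad row.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("text", "prompt", "completion")


def summarize_jsonl(path):
    """Count rows and list the line numbers of rows lacking a required field."""
    rows = 0
    bad_lines = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            rows += 1
            # A field counts as present only if it is a non-empty string.
            if any(not isinstance(record.get(k), str) or not record[k]
                   for k in REQUIRED_FIELDS):
                bad_lines.append(lineno)
    size_mib = Path(path).stat().st_size / 2**20
    return {"rows": rows, "bad_lines": bad_lines, "size_mib": size_mib}
```

For the file described here, the summary should report 102,827 rows and a size of about 151 MiB if the copy is intact.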
+
+ ## Training Procedure
+
+ 1. **Pre-processing** – Each JSONL row is converted to tokens through the bundled Qwen3 chat template. Only the assistant completion is used as the supervised target; prompt tokens are masked.
+ 2. **LoRA fine-tuning** – LoRA adapters were attached to the base attention and feed-forward projections (standard PEFT setup). Training used bf16 activations with gradient checkpointing to keep memory pressure manageable on 80B weights. (Adapter hyperparameters live in the training run artifacts; update this section if you change them.)
+ 3. **Merge** – After convergence the adapter was merged into the base weights (`peft.LoraModel.merge_and_unload`) to produce the standalone folder `sf-diogenes-v0.1/`. This keeps inference simple at the cost of a larger download.
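The merge step can be sketched as follows. This is an assumption-laden outline rather than the script that produced the published weights: the function name `merge_adapter` is hypothetical, the adapter is assumed to be loadable with `PeftModel.from_pretrained`, and the machine is assumed to have enough memory to hold the full bf16 model.

```python
def merge_adapter(base_id, adapter_id, out_dir):
    """Fold a LoRA adapter into its base model and save standalone weights."""
    # Heavy dependencies are imported lazily so that importing this module
    # does not require them to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # merge_and_unload() adds the LoRA deltas to the base weights and
    # returns a plain Transformers model without adapter modules.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

A call would pass the base model ID, the adapter repository or local path, and an output folder such as `sf-diogenes-v0.1/`.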
+
+ ### Reproducing the Adapter

  ```python
+ from datasets import load_dataset
+ from peft import LoraConfig, get_peft_model
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
+
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct", trust_remote_code=True)
+ ds = load_dataset("json", data_files="finetune_dataset.jsonl", split="train")
+
+ def format_example(example):
+     messages = [
+         {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
+         {"role": "user", "content": example["prompt"]},
+         {"role": "assistant", "content": example["completion"]},
+     ]
+     text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
+     tokenized = tokenizer(text, truncation=True, max_length=8192)
+     labels = tokenized["input_ids"][:]
+     # mask user tokens here...
+     tokenized["labels"] = labels
+     return tokenized
+
+ tokenized_ds = ds.map(format_example, remove_columns=ds.column_names)

  model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen3-Next-80B-A3B-Instruct",
+     torch_dtype="bfloat16",
      device_map="auto",
+     trust_remote_code=True,
  )
+ model = get_peft_model(model, LoraConfig(r=??, lora_alpha=??, lora_dropout=0.05))
+
+ trainer = Trainer(
+     model=model,
+     train_dataset=tokenized_ds,
+     args=TrainingArguments(
+         output_dir="diogenes-lora",
+         num_train_epochs=1,
+         per_device_train_batch_size=1,
+         gradient_accumulation_steps=16,
+         learning_rate=1e-4,
+         bf16=True,
+         logging_steps=10,
+         save_steps=200,
+     ),
  )
+ trainer.train()
+ ```

+ Fill in the LoRA rank/alpha you used; they are omitted here because the original adapter config is not stored in the repo.
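The `# mask user tokens here...` step in the snippet above is left open. One common way to fill it, sketched here as an assumption rather than the method actually used for this adapter, is to set every label in the prompt portion to `-100`, the index that the Transformers loss ignores. The helper name `mask_prompt_labels` is hypothetical, and `prompt_len` is assumed to be the token count of the system and user turns rendered with `add_generation_prompt=True`.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels that supervise only the tokens after the prompt."""
    # Clamp so a truncated sequence never yields more labels than inputs.
    prompt_len = min(prompt_len, len(input_ids))
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])
```

This relies on the tokenized prompt being an exact prefix of the tokenized full conversation, which is worth verifying on a few rows before a long run.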
 
 
 
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM

+ model_id = "urm3l/sf-diogenes-v0.1"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto", trust_remote_code=True)
+
+ messages = [
+     {"role": "system", "content": "You are a helpful Salesforce analytics assistant."},
+     {"role": "user", "content": "How do I keep Forecast Amount aligned with Amount Without Manager Adjustments?"},
+ ]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```

+ ### Inference Tips
+
+ - Responses are optimized for structured, enumerated instructions. Use lower temperatures (0.1–0.3) to maintain factual tone.
+ - Context windows above ~8k tokens work, but throughput falls sharply; prune long histories when possible.
+ - Quantized weights (e.g., GPTQ or AWQ) are not provided here; quantize downstream if needed.
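One way to prune long histories, shown here as a rough sketch, is to drop the oldest turns until the conversation fits a token budget while keeping the system message and the latest user message. The function `trim_history` and its `count_tokens` callback are illustrative names; with the real tokenizer, the callback could be `lambda s: len(tokenizer(s)["input_ids"])`, which ignores the few tokens added by the chat template.

```python
def trim_history(messages, count_tokens, max_tokens=8000):
    """Drop the oldest non-system turns until the history fits max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Remove the oldest turns first; always keep the most recent message.
    while len(turns) > 1 and total(system + turns) > max_tokens:
        turns.pop(0)
    # Do not start the kept history with an assistant reply whose
    # question has already been dropped.
    while len(turns) > 1 and turns[0]["role"] == "assistant":
        turns.pop(0)
    return system + turns
```

The default budget of 8,000 tokens mirrors the ~8k figure above and is only a starting point.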
+
+ ## Evaluation
+
+ Formal benchmarks (MT-Bench, AlpacaEval, etc.) have not been run yet because the focus was on internal Salesforce workflows. Please evaluate against:
+
+ - Your organization’s Salesforce help-desk transcripts.
+ - Synthetic scenario suites covering Forecasting, Commerce, Data Cloud, and CLM features.
+ - Hallucination and safety probes for financial/PII content.
+
+ Kindly share scores or qualitative findings back via issues so the card can be updated.

+ ## Limitations & Risks

+ - **Domain narrowness** – The fine-tuning data contains no open-ended general-knowledge tasks; on off-domain prompts the model can hallucinate confidently.
+ - **Freshness** – Data reflects the state of Salesforce documentation as of Oct 2025. Features introduced later may be missing or inaccurate.
+ - **Compliance** – The dataset embeds Salesforce product language; ensure redistribution complies with your documentation licenses.
+ - **Resource requirements** – The bf16 weights alone occupy roughly 160 GB (~80B parameters × 2 bytes), so inference needs a multi-GPU setup with more than 160 GB of total VRAM once KV cache and activations are included, unless you quantize.
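The memory figure follows from simple arithmetic, sketched below. The byte sizes per data type are standard; the function name `weight_memory_gb` is illustrative, and the estimate covers weights only, not KV cache, activations, or quantization overhead such as scales and zero points.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params, dtype="bf16"):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    print(weight_memory_gb(80e9))          # 160.0
    print(weight_memory_gb(80e9, "int4"))  # 40.0
```

Because the parameter count is only approximately 80B, treat the results as lower bounds for planning hardware.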
 
+ ## Responsible Use

+ - Keep humans in the loop for contractual, legal, or financial decisions.
+ - Filter prompts and outputs for sensitive data before logging or sharing.
+ - When surfacing responses to end users, provide citations back to the authoritative Salesforce documentation or run retrieval-augmented generation to ground answers.

+ ## Maintenance & Support

+ - **Owner**: `urm3l`
+ - **Artifacts**: `sf-diogenes-v0.1/` (merged weights), `finetune_dataset.jsonl` (training data)
+ - **Next steps**: add quantitative evals, document adapter hyperparameters, and consider releasing a quantized variant.

+ _Last updated: 2025-11-02_