finops-resolver
Fine-tuned Qwen3-8B model that recommends ordered resolution sequences for post-trade CNS settlement fails. Given a triage classification, inventory snapshot, and pending FTR list, the model produces a step-by-step resolution plan with mathematical coverage tracking and a plain-English narrative.
This is Stage 2 in a two-model pipeline:
| Stage | Model | Task | GGUF Size |
|---|---|---|---|
| 1 | finops-triage (Qwen3.5-9B) | Classify, score, and route fails | ~5.7 GB |
| 2 | finops-resolver (Qwen3-8B) | Recommend resolution sequence | ~5.0 GB |
All training data is synthetic. All resolution logic is grounded in a curated knowledge base of post-trade settlement rules.
Architecture
Base model: unsloth/Qwen3-8B (text-only, no VL overhead)
Fine-tuning method: QLoRA via Unsloth + trl SFTTrainer
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| LoRA dropout | 0 |
| Quantization | 4-bit NF4 (double quant) |
| Gradient checkpointing | Unsloth optimized |
| Max sequence length | 5,120 tokens |
| Effective batch size | 8 (2 per device x 4 accumulation) |
| Epochs | 3 |
| Learning rate | 2e-4 (cosine schedule, 90 warmup steps) |
| Precision | bf16 |
| Packing | Enabled (SFTTrainer) |
Training data: 8,000 train / 2,000 eval examples in ChatML format. Each example includes a structured <think>...</think> reasoning trace in the assistant turn, followed by the resolution JSON. The thinking trace walks through the full resolution logic: problem statement, inventory check, FTR ranking, chase walkthrough, fallback reasoning, Reg SHO timeline check, gridlock evaluation, and escalation decision.
Trainable parameters: 43.6M of 8.2B (0.53%)
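The 43.6M figure can be reproduced by hand. A rank-r LoRA adapter on a linear layer with input width d_in and output width d_out adds r × (d_in + d_out) parameters (the A and B matrices). The sketch below assumes the layer dimensions from the public Qwen3-8B config (36 layers, hidden size 4096, intermediate size 12288, 32 query heads and 8 key/value heads of width 128). Those dimensions are not stated in this card.

```python
"""Count LoRA trainable parameters for rank-16 adapters on all seven projections.

Assumption: the Qwen3-8B dimensions below come from its public config,
not from this model card.
"""

NUM_LAYERS = 36
HIDDEN = 4096
INTERMEDIATE = 12288
HEAD_DIM = 128
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
RANK = 16


def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * (d_in + d_out)


def lora_params_per_layer() -> int:
    q_out = NUM_Q_HEADS * HEAD_DIM      # 4096
    kv_out = NUM_KV_HEADS * HEAD_DIM    # 1024 (grouped-query attention)
    shapes = {
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    return sum(lora_params(d_in, d_out) for d_in, d_out in shapes.values())


total = NUM_LAYERS * lora_params_per_layer()
print(f"{total:,} trainable parameters")  # 43,646,976 trainable parameters
```

That is about 43.6M, or roughly 0.53% of 8.2B. LoRA alpha and dropout scale or regularize the adapter but do not change the count.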
Input Schema
The model expects a JSON object containing the triage output from Stage 1, the current inventory snapshot, pending FTRs, and any related fails.
```json
{
  "triage": {
    "category": "CNS_FAIL",
    "cns_direction": "FTD",
    "lifecycle_state": "ESCALATED",
    "priority_score": 88.2,
    "priority_tier": "CRITICAL",
    "score_components": {
      "age": 18.0,
      "value": 12.5,
      "regulatory": 28.0,
      "counterparty": 3.0
    },
    "reason": "CNS FTD 32000 shs, threshold security, high priority",
    "action": "LOCATE_AND_DELIVER",
    "escalation_level": "L3",
    "deadline": "T+13",
    "flags": ["THRESHOLD_SECURITY", "HIGH_VALUE"]
  },
  "cusip": "594918104",
  "ftd_qty": 32000,
  "inventory": {
    "box_qty": 8000,
    "stock_loan_available": 3000,
    "recall_outstanding": 5000,
    "pending_receives": 2000
  },
  "ftrs": [
    {
      "dtc": "DTC-0161",
      "qty": 12000,
      "age_days": 3,
      "settlement_type": "RVP",
      "settlement_date": "T+1",
      "cp_fail_rate_pct": 2.1,
      "partial_delivery_history": true
    }
  ],
  "related_fails": [
    {
      "category": "DVP_FAIL",
      "dtc": "DTC-0352",
      "qty": 5000,
      "side": "Sell",
      "age_days": 2
    }
  ]
}
```
Input Field Reference
triage (from Stage 1 finops-triage model):
| Field | Type | Description |
|---|---|---|
| category | string | Fail classification (CNS_FAIL) |
| cns_direction | string | FTD (fail to deliver) or FTR (fail to receive) |
| lifecycle_state | string | NEW, OPEN, ESCALATED, AGED |
| priority_score | float | 0-100, weighted composite of age/value/regulatory/CP factors |
| priority_tier | string | LOW (0-25), MEDIUM (26-50), HIGH (51-75), CRITICAL (76-100) |
| score_components | object | Breakdown: age (30%), value (25%), regulatory (35%), counterparty (10%) |
| reason | string | Human-readable triage summary |
| action | string | Recommended action class from triage |
| escalation_level | string | L1, L2, L3 |
| deadline | string | Reg SHO close-out deadline (e.g., T+13 standard, T+6 threshold) |
| flags | array | Risk flags: THRESHOLD_SECURITY, HIGH_VALUE, REG_SHO_CLOSE_OUT, AGED_FAIL, LARGE_POSITION, ILLIQUID |
inventory:
| Field | Type | Description |
|---|---|---|
| box_qty | int | Unencumbered settled long positions (ex-SEG) |
| stock_loan_available | int | External borrow available |
| recall_outstanding | int | Stock lent that can be recalled |
| pending_receives | int | Expected inbound (T+1 receives, pending recalls) |
ftrs[]:
| Field | Type | Description |
|---|---|---|
| dtc | string | DTC participant number of counterparty |
| qty | int | FTR share quantity |
| age_days | int | Days since original settlement date |
| settlement_type | string | RVP (receive vs payment), DVP (deliver vs payment), FOP (free of payment) |
| settlement_date | string | Expected settlement date |
| cp_fail_rate_pct | float | Counterparty 15-day rolling fail rate |
| partial_delivery_history | bool | Whether this counterparty has delivered partials before |
related_fails[]:
| Field | Type | Description |
|---|---|---|
| category | string | DVP_FAIL, CNS_FAIL, DEPOT_FAIL |
| dtc | string | Counterparty DTC number |
| qty | int | Fail quantity |
| side | string | Buy or Sell |
| age_days | int | Fail age |
Output Schema
```json
{
  "resolution_steps": [
    {
      "step": 1,
      "action": "CHASE_FTR",
      "dtc": "DTC-0161",
      "qty": 12000,
      "settlement_type": "RVP",
      "settlement_date": "T+1",
      "rationale": "Age 3d 12000sh",
      "coverage_after_step_pct": 37.5,
      "remaining_short": 20000
    }
  ],
  "additional_ftrs_chased": 0,
  "fallback_strategy": "INITIATE_RECALL",
  "fallback_qty": 5000,
  "secondary_fallback": "SOURCE_BORROW",
  "secondary_fallback_qty": 3000,
  "total_coverable": 28000,
  "residual_short": 4000,
  "residual_action": "ESCALATE",
  "gridlock_detected": false,
  "gridlock_parties": [],
  "escalation_required": true,
  "escalation_reason": "Residual 4000 shs uncoverable",
  "narrative": "Chase DTC-0161 RVP T+1 for 12000 shs. INITIATE_RECALL 5000, SOURCE_BORROW 3000, escalate residual 4000 shs."
}
```
Output Field Reference
| Field | Type | Description |
|---|---|---|
| resolution_steps[] | array | Ordered sequence of resolution actions |
| resolution_steps[].step | int | 1-indexed step number |
| resolution_steps[].action | string | Action enum value (see below) |
| resolution_steps[].dtc | string | Target counterparty DTC number |
| resolution_steps[].qty | int | Share quantity for this step |
| resolution_steps[].settlement_type | string | RVP, DVP, or FOP |
| resolution_steps[].settlement_date | string | Expected settlement date |
| resolution_steps[].rationale | string | Why this step was prioritized |
| resolution_steps[].coverage_after_step_pct | float | Cumulative coverage percentage after this step |
| resolution_steps[].remaining_short | int | Shares still uncovered after this step |
| additional_ftrs_chased | int | FTR chases beyond the first 10 (truncated for token efficiency) |
| fallback_strategy | string | Primary fallback action after FTR chasing |
| fallback_qty | int | Shares covered by primary fallback |
| secondary_fallback | string | Secondary fallback action |
| secondary_fallback_qty | int | Shares covered by secondary fallback |
| total_coverable | int | Sum of all sources (FTRs + fallbacks) |
| residual_short | int | ftd_qty - total_coverable (floor 0) |
| residual_action | string | ESCALATE if residual > 0, else NONE |
| gridlock_detected | bool | Whether gridlock signals triggered |
| gridlock_parties | array | DTC numbers of parties in the gridlock chain |
| escalation_required | bool | True when residual_short > 0 after all steps |
| escalation_reason | string | Human-readable escalation reason (empty if not required) |
| narrative | string | Plain-English resolution summary |
Mathematical Invariants
These hold for every valid output:
- coverage_after_step_pct = (cumulative_qty_covered / ftd_qty) * 100 at each step
- remaining_short decreases monotonically across steps
- total_coverable = sum of all step quantities + fallback_qty + secondary_fallback_qty
- residual_short = max(0, ftd_qty - total_coverable)
- escalation_required = (residual_short > 0)
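These invariants translate directly into mechanical checks. The sketch below is not scripts/validate_resolver.py (its code is not shown in this card); it is a hypothetical check_invariants helper that takes a parsed output dict plus ftd_qty and returns a list of violations. The 0.1 percentage-point tolerance on coverage is an assumption to absorb rounding.

```python
import math


def check_invariants(output: dict, ftd_qty: int, tol_pct: float = 0.1) -> list:
    """Return a list of invariant violations; an empty list means the output passed.

    Hypothetical helper. tol_pct is an assumed tolerance (percentage points)
    for rounded coverage percentages.
    """
    errors = []
    covered = 0
    prev_remaining = ftd_qty
    for step in output["resolution_steps"]:
        covered += step["qty"]
        # coverage_after_step_pct = cumulative covered / ftd_qty * 100
        expected_pct = covered / ftd_qty * 100
        if not math.isclose(step["coverage_after_step_pct"], expected_pct, abs_tol=tol_pct):
            errors.append(f"step {step['step']}: coverage_after_step_pct should be {expected_pct:.1f}")
        # remaining_short must decrease from one step to the next
        if step["remaining_short"] >= prev_remaining:
            errors.append(f"step {step['step']}: remaining_short did not decrease")
        prev_remaining = step["remaining_short"]

    # total_coverable = step quantities + both fallback quantities
    expected_total = covered + output["fallback_qty"] + output["secondary_fallback_qty"]
    if output["total_coverable"] != expected_total:
        errors.append(f"total_coverable should be {expected_total}")

    # residual_short = max(0, ftd_qty - total_coverable)
    expected_residual = max(0, ftd_qty - output["total_coverable"])
    if output["residual_short"] != expected_residual:
        errors.append(f"residual_short should be {expected_residual}")

    # escalation_required = (residual_short > 0)
    if output["escalation_required"] != (output["residual_short"] > 0):
        errors.append("escalation_required disagrees with residual_short")
    return errors


# Illustrative output: chase one FTR, apply the box, then recall and borrow.
sample = {
    "resolution_steps": [
        {"step": 1, "action": "CHASE_FTR", "qty": 12000, "coverage_after_step_pct": 37.5, "remaining_short": 20000},
        {"step": 2, "action": "APPLY_BOX", "qty": 8000, "coverage_after_step_pct": 62.5, "remaining_short": 12000},
    ],
    "fallback_qty": 5000,
    "secondary_fallback_qty": 3000,
    "total_coverable": 28000,
    "residual_short": 4000,
    "escalation_required": True,
}
print(check_invariants(sample, ftd_qty=32000))  # []
```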
Action Enum
| Action | Description |
|---|---|
| CHASE_FTR | Contact counterparty to push FTR delivery |
| OFFSET_FTR | Apply FTR directly against FTD at CNS |
| APPLY_BOX | Use free float inventory (unencumbered, ex-SEG) |
| INITIATE_RECALL | Recall stock lent via executing broker relationship |
| SOURCE_BORROW | Obtain stock loan externally (last resort) |
| PARTIAL_DELIVER | Proactively deliver partial quantity to CNS |
| DEPOT_MOVEMENT | Move shares from local market to DTC (ADR/cross-market) |
| NET_GRIDLOCK | Propose bilateral or tri-party net settlement |
| BUY_IN_NOTICE | Formal buy-in threat (B2B escalation) |
| SPO_SETTLEMENT | Cash settle free of payment (CA event) |
| ESCALATE | Residual requires human intervention |
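Because the training targets place a <think>...</think> trace before the resolution JSON, a caller usually needs to strip the trace before parsing. A minimal sketch, assuming the JSON object is everything after the last </think> tag (or the whole reply when no tag is present). The Action enum mirrors the table above; parse_resolution is a hypothetical helper name.

```python
import json
from enum import Enum


class Action(str, Enum):
    """Allowed values for resolution_steps[].action (from the Action Enum table)."""
    CHASE_FTR = "CHASE_FTR"
    OFFSET_FTR = "OFFSET_FTR"
    APPLY_BOX = "APPLY_BOX"
    INITIATE_RECALL = "INITIATE_RECALL"
    SOURCE_BORROW = "SOURCE_BORROW"
    PARTIAL_DELIVER = "PARTIAL_DELIVER"
    DEPOT_MOVEMENT = "DEPOT_MOVEMENT"
    NET_GRIDLOCK = "NET_GRIDLOCK"
    BUY_IN_NOTICE = "BUY_IN_NOTICE"
    SPO_SETTLEMENT = "SPO_SETTLEMENT"
    ESCALATE = "ESCALATE"


def parse_resolution(reply: str) -> dict:
    """Drop any <think> trace, parse the JSON, and reject unknown actions."""
    # rpartition returns ('', '', reply) when the tag is absent, so [2] is the full reply.
    body = reply.rpartition("</think>")[2]
    result = json.loads(body.strip())
    for step in result.get("resolution_steps", []):
        Action(step["action"])  # raises ValueError for a value outside the enum
    return result
```

Both a malformed JSON body and an unknown action surface as ValueError (json.JSONDecodeError is a subclass), so one except clause covers either failure.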
Resolution Logic
All resolution logic is sourced from a curated knowledge base of post-trade settlement rules. The model does not invent rules.
CNS FTD Waterfall
The core resolution follows a strict priority waterfall, exhausting cheaper and faster options before escalating:
```
1. Chase all FTRs (oldest first, largest as tiebreaker)
   ↓ residual?
2. Apply free float box (unencumbered, ex-SEG only)
   ↓ residual?
3. Initiate recall (cheaper than borrow, always attempt first)
   ↓ residual?
4. Source stock loan (external borrow, last resort)
   ↓ residual?
5. Escalate
```
FTR Prioritization
FTRs are chased simultaneously, but ranked for allocation:
| Priority | Rule |
|---|---|
| Primary sort | Age descending (oldest first) |
| Tiebreaker | Quantity descending (largest first) |
| Settlement type | Does NOT affect priority |
| Broker relationship | Does NOT affect priority |
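The waterfall and the FTR ranking rule above can be combined into one deterministic pass. This is a simplified illustration, not the project's generator: it assumes every FTR delivers its full quantity, ignores pending_receives, partials, gridlock and the Reg SHO parallel-sourcing exception, and uses a hypothetical resolve_waterfall function.

```python
def resolve_waterfall(ftd_qty: int, inventory: dict, ftrs: list) -> tuple:
    """Allocate an FTD shortfall through the KB waterfall (simplified sketch).

    Returns (steps, residual), where each step is (action, source, qty).
    Assumes every FTR delivers in full; ignores pending_receives, partials,
    gridlock and the Reg SHO parallel-sourcing exception.
    """
    steps = []
    remaining = ftd_qty

    # 1. FTRs: oldest first, largest quantity breaks ties.
    #    Settlement type and broker relationship play no part in the order.
    for ftr in sorted(ftrs, key=lambda f: (-f["age_days"], -f["qty"])):
        if remaining == 0:
            break
        take = min(ftr["qty"], remaining)
        steps.append(("CHASE_FTR", ftr["dtc"], take))
        remaining -= take

    # 2-4. Free box, then recall (cheaper), then external borrow (last resort).
    sources = [
        ("APPLY_BOX", "box_qty"),
        ("INITIATE_RECALL", "recall_outstanding"),
        ("SOURCE_BORROW", "stock_loan_available"),
    ]
    for action, field in sources:
        take = min(inventory.get(field, 0), remaining)
        if take > 0:
            steps.append((action, field, take))
            remaining -= take

    # 5. Whatever is left goes to a human.
    if remaining > 0:
        steps.append(("ESCALATE", "residual", remaining))
    return steps, remaining


steps, residual = resolve_waterfall(
    32000,
    {"box_qty": 8000, "stock_loan_available": 3000, "recall_outstanding": 5000, "pending_receives": 2000},
    [{"dtc": "DTC-0161", "qty": 12000, "age_days": 3}],
)
print(steps)
# [('CHASE_FTR', 'DTC-0161', 12000), ('APPLY_BOX', 'box_qty', 8000),
#  ('INITIATE_RECALL', 'recall_outstanding', 5000),
#  ('SOURCE_BORROW', 'stock_loan_available', 3000), ('ESCALATE', 'residual', 4000)]
print(residual)  # 4000
```

On the sample input this yields 28,000 coverable shares and a 4,000-share residual, the same totals as the sample output.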
Partial Delivery Policy
Always apply partial deliveries immediately. Never hold for full delivery. A partial today that keeps you inside the Reg SHO window is always preferred over a full delivery that arrives too late.
This applies to both FTDs (delivering partials to CNS) and FTRs (accepting partials from counterparties).
Cross-Date Netting
T+5 FTRs can fulfill T+3 FTDs. The model evaluates next-day obligation chains before recommending external sourcing, avoiding unnecessary stock loan cost.
Reg SHO Parallel Sourcing
Default behavior is sequential: recall first, borrow only if recall is insufficient.
Exception: If the recall notice period (3 business days) would breach the Reg SHO close-out deadline, the model initiates recall AND stock loan simultaneously. This is triggered when deadline_days <= recall_notice_period.
Reg SHO Timelines:
| Security Type | Grace Period | Close-out Deadline |
|---|---|---|
| Standard | T+4 through T+12 | Beginning of T+13 |
| Threshold | T+2 through T+5 | Beginning of T+6 |
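The parallel-sourcing trigger reduces to a small comparison once the deadline is known. This sketch assumes that the difference between today's T+n and the close-out day approximates the business days remaining, that the recall notice period is the 3 business days stated above, and uses hypothetical helper names (parse_t_plus, closeout_deadline, needs_parallel_sourcing).

```python
import re

RECALL_NOTICE_DAYS = 3                            # recall notice period, business days
CLOSEOUT_DAY = {"standard": 13, "threshold": 6}   # close-out at the beginning of T+n


def parse_t_plus(label: str) -> int:
    """Convert a label such as 'T+13' to the integer 13."""
    match = re.fullmatch(r"T\+(\d+)", label.strip())
    if match is None:
        raise ValueError(f"expected a 'T+n' label, got {label!r}")
    return int(match.group(1))


def closeout_deadline(threshold_security: bool) -> str:
    """Reg SHO close-out day from the timelines table."""
    return f"T+{CLOSEOUT_DAY['threshold' if threshold_security else 'standard']}"


def needs_parallel_sourcing(today: str, deadline: str,
                            recall_notice_days: int = RECALL_NOTICE_DAYS) -> bool:
    """True when waiting out a recall before borrowing could breach close-out.

    Assumption: the T+n difference approximates business days remaining.
    """
    deadline_days = parse_t_plus(deadline) - parse_t_plus(today)
    return deadline_days <= recall_notice_days


print(needs_parallel_sourcing("T+4", closeout_deadline(False)))  # False: 9 days left
print(needs_parallel_sourcing("T+3", closeout_deadline(True)))   # True: 3 days left
```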
CA Event Resolution
Corporate action event fails cannot be resolved unilaterally. The model recommends one of three paths:
| Option | Method | When |
|---|---|---|
| A | Deliver pre-event shares | Shares still valid post-event |
| B | Deliver post-event equivalent | New CUSIP issued, old invalid |
| C | SPO (cash settle free of payment) | Share delivery impractical post-CA |
All three require counterparty agreement.
Depot Movement
Triggered for ADR or cross-market securities where free float exists in a local market but not at DTC. The model factors movement timeline against the Reg SHO deadline before recommending.
B2B Escalation Path
For persistent street-side FTR failures:
1. Standard FTR chase (age + size priority)
2. Formal buy-in notice (threat creates legal/financial pressure)
3. Execute buy-in (if no delivery follows notice)
The buy-in threat alone typically resolves the fail.
Gridlock Detection
Gridlock is a circular dependency where multiple parties are each waiting on another to deliver the same CUSIP. Full visibility into counterparty positions is never available, so detection relies on patterns in your own FTR/FTD data.
Four-Signal Decision Tree
The model evaluates four signals sequentially:
| Signal | Test | If No |
|---|---|---|
| S1 | Same CUSIP appears in both your FTD and FTR positions? | Standard resolution (not mid-chain) |
| S2 | 3+ brokers involved in same CUSIP fails? | Bilateral issue (not gridlock) |
| S3 | FTR age increasing despite repeated chase attempts? | Continue chasing, monitor 1-2 days |
| S4 | Similar quantities across involved brokers? | Partial gridlock (one party is bottleneck) |
All four signals positive: full gridlock confirmed → initiate multi-party net settlement.
S1-S3 positive, S4 negative: partial gridlock → one party is likely the bottleneck, so escalate against that party specifically.
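The tree is evaluated in order, and the first negative signal decides the outcome. A minimal sketch: the outcome labels are hypothetical names for the rows of the signal table, and the caller supplies the four booleans. Helpers for S2 and S4 are included; the 25% tolerance for "similar quantities" is an assumption, since the card does not define a threshold. S1 and S3 depend on position and chase history that the input schema does not carry.

```python
def gridlock_verdict(s1: bool, s2: bool, s3: bool, s4: bool) -> str:
    """Walk S1-S4 in order; the first negative signal decides the outcome."""
    if not s1:
        return "STANDARD_RESOLUTION"  # not mid-chain
    if not s2:
        return "BILATERAL"            # two-party issue, not gridlock
    if not s3:
        return "CONTINUE_CHASING"     # keep chasing, monitor 1-2 days
    if not s4:
        return "PARTIAL_GRIDLOCK"     # one party is the bottleneck
    return "FULL_GRIDLOCK"            # multi-party net settlement


def s2_three_plus_brokers(dtcs: list) -> bool:
    """S2: three or more distinct brokers involved in fails on this CUSIP."""
    return len(set(dtcs)) >= 3


def s4_similar_quantities(qtys: list, tolerance: float = 0.25) -> bool:
    """S4 (assumed rule): every quantity is within `tolerance` of the largest."""
    if not qtys:
        return False
    largest = max(qtys)
    return all(q >= largest * (1 - tolerance) for q in qtys)


dtcs = ["DTC-0161", "DTC-0352", "DTC-0418"]
qtys = [10000, 9000, 8000]
print(gridlock_verdict(True, s2_three_plus_brokers(dtcs), True, s4_similar_quantities(qtys)))
```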
Gridlock Resolution
When gridlock is confirmed:
- Identify circular dependency from available data
- Simultaneous outreach to all parties in the chain
- Understand each party's obligation for the CUSIP
- Propose net settlement to break the dependency
- Accept partial net if full resolution not possible
Gridlock vs Non-Gridlock
| Factor | Gridlock | Not Gridlock |
|---|---|---|
| Parties | 3+ brokers on same CUSIP | 2 parties (bilateral) |
| Chase response | "Waiting on our source" | Specific delivery timeline |
| FTR aging | Steady increase despite chasing | Episodic delays |
| CUSIP concentration | Same CUSIP across multiple fails | Different CUSIPs failing |
| Resolution progress | None despite multiple attempts | Partial deliveries arriving |
Training Data
Generation
Training data is generated programmatically (scripts/gen_resolver.py + scripts/trace_generator.py). No LLM-generated examples. The generator:
- Samples FTR count from the complexity distribution
- Generates FTRs with realistic age/quantity/settlement distributions
- Generates independent inventory snapshots
- Walks the resolution waterfall deterministically (chase → box → recall → borrow)
- Evaluates all four gridlock signals from the generated positions
- Selects fallback strategies following KB waterfall order
- Generates a deterministic <think> reasoning trace narrating each decision
- Validates mathematical invariants before writing
Complexity Distribution
| Tier | FTR Count | % of Dataset |
|---|---|---|
| Simple | 1-3 FTRs | 30% |
| Medium | 5-10 FTRs | 40% |
| Complex | 11-28 FTRs across 10+ brokers | 30% |
FTR Age Distribution
97% of FTRs are aged T+2 through T+5 (Gaussian centered at T+3), reflecting normal settlement. The remaining 3% form a tail extending to T+6 through T+10 for aged/escalated scenarios.
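For readers who want similar synthetic inputs, here is a hypothetical sampler for the two distributions above; it is not the project's scripts/gen_resolver.py. Assumptions: FTR counts are uniform within each tier, the Gaussian core has a standard deviation of 0.75 days and is rounded and clipped to T+2..T+5, the tail is uniform over T+6..T+10, and the "10+ brokers" constraint of the complex tier is not modeled.

```python
import random

TIERS = [  # (name, min FTRs, max FTRs, share of dataset)
    ("simple", 1, 3, 0.30),
    ("medium", 5, 10, 0.40),
    ("complex", 11, 28, 0.30),
]


def sample_ftr_count(rng: random.Random) -> tuple:
    """Pick a complexity tier by weight, then a uniform FTR count inside it."""
    name, lo, hi, _ = rng.choices(TIERS, weights=[t[3] for t in TIERS], k=1)[0]
    return name, rng.randint(lo, hi)


def sample_age_days(rng: random.Random, sigma: float = 0.75) -> int:
    """97% Gaussian core around T+3 (clipped to 2..5), 3% uniform tail over 6..10."""
    if rng.random() < 0.97:
        return min(5, max(2, round(rng.gauss(3, sigma))))
    return rng.randint(6, 10)


rng = random.Random(42)
tier, count = sample_ftr_count(rng)
ages = [sample_age_days(rng) for _ in range(count)]
print(tier, count, ages)
```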
Scenario Coverage
The training data includes all of the following outcome types:
- Full coverage via FTRs alone (no fallback needed)
- Partial FTR coverage + box covers residual
- Partial FTR + box + recall
- Partial FTR + box + recall + borrow (full waterfall)
- Gridlock detected with multi-party outreach
- B2B escalation to buy-in notice
- CA event SPO settlement
- ADR depot movement required
- Cross-date netting (T+5 FTR fulfilling T+3 FTD)
- Parallel recall + loan (Reg SHO deadline conflict)
- Problem counterparty with partial delivery history
- Zero inventory (all external sourcing)
- Full inventory (no external sourcing needed)
Thinking Trace Methodology
Each training example includes a structured <think>...</think> block in the assistant turn. The trace is generated deterministically from the input/output pair (no LLM calls) and follows a fixed 10-section structure:
- Problem statement: CUSIP, FTD quantity, priority tier, deadline, flags
- Inventory check: box, recall, borrow, pending receives, coverage ratio
- FTR ranking: sorted by age desc then qty desc, top 10 shown with ranking notes
- Chase walkthrough: step-by-step coverage tracking (first 5 shown, rest summarized)
- Fallback reasoning: primary and secondary strategies with source quantities
- Reg SHO check: deadline vs recall notice period, sequential vs parallel decision
- Gridlock evaluation: all 4 signals evaluated explicitly (S1-S4)
- Escalation decision: required/not required with reason
- Conclusion: final coverage percentage, residual action
This teaches the model to reason through the waterfall before producing the resolution JSON.
Scope and Limitations
v1 Training Scope
This model covers:
- CNS FTD resolution waterfall (chase → box → recall → borrow → escalate)
- Gridlock detection via 4-signal decision tree (S1-S4)
- Reg SHO parallel sourcing (recall + borrow when deadline is tight)
- CA event SPO settlement paths
- Depot movement for ADR/cross-market securities
- Cross-date netting (T+5 FTR vs T+3 FTD)
- B2B buy-in notice escalation
Deferred to Future Training Iterations
- B2B buy-in execute path: the model recommends a buy-in notice (threat), but the full execute-buy-in workflow is not included in the current training run. The buy-in threat alone resolves the majority of cases; the execute path is planned for a future iteration.
- Synthetic FTR cap at 28: production environments can have unbounded FTR counts. The current training data caps at 28 FTRs across 10+ brokers. The architecture supports extension to higher counts in future runs.
- Multi-CUSIP resolution: current scope is a single CUSIP per inference call. Cross-CUSIP optimization (e.g., portfolio-level netting) is planned.
- Real-time inventory refresh: the model operates on a point-in-time inventory snapshot. Integration with streaming inventory updates is an inference-layer concern, not a model limitation.
The architecture supports extension across all of these dimensions. This is an actively developed model.
Usage
Ollama Setup
Create a Modelfile:
```
FROM ./finops-resolver-qwen3-8b-q4_k_m.gguf
PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_ctx 5120
SYSTEM "You are a post-trade settlement resolution assistant. Given a triage output, inventory snapshot, and pending FTRs for a CUSIP, recommend an ordered resolution sequence following the KB resolution logic. Chase FTRs first (oldest then largest), apply free box, recall before borrow. Apply partials immediately. Detect and flag gridlock. Output JSON only - no explanation, no markdown, no preamble."
```
Then create the model:
```bash
ollama create finops-resolver -f Modelfile
```
Two-Model Pipeline
```bash
# Stage 1: triage
TRIAGE_OUTPUT=$(ollama run finops-triage "Triage this fail record: {fail_record_json}")

# Stage 2: resolver
# Combine triage output with inventory + FTR data
ollama run finops-resolver "Resolve this fail:
{
  \"triage\": $TRIAGE_OUTPUT,
  \"cusip\": \"594918104\",
  \"ftd_qty\": 32000,
  \"inventory\": {
    \"box_qty\": 8000,
    \"stock_loan_available\": 3000,
    \"recall_outstanding\": 5000,
    \"pending_receives\": 2000
  },
  \"ftrs\": [...],
  \"related_fails\": [...]
}"
```
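The same two-stage flow can be driven from Python through Ollama's local HTTP API (POST /api/generate, on port 11434 by default) instead of the CLI, which avoids the shell escaping above. This is a sketch: it assumes both models were created with ollama create under the names used in this card, that Stage 1 replies with a JSON object (optionally preceded by a <think> trace), and that generate, strip_think, build_resolver_prompt and run_pipeline are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str) -> str:
    """Send one non-streaming request to a local Ollama server and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def strip_think(text: str) -> str:
    """Keep only what follows the last </think> tag (or the whole text if absent)."""
    return text.rpartition("</think>")[2].strip()


def build_resolver_prompt(triage: dict, cusip: str, ftd_qty: int,
                          inventory: dict, ftrs: list, related_fails: list) -> str:
    """Assemble the Stage 2 input object in the documented schema."""
    payload = {
        "triage": triage,
        "cusip": cusip,
        "ftd_qty": ftd_qty,
        "inventory": inventory,
        "ftrs": ftrs,
        "related_fails": related_fails,
    }
    return "Resolve this fail:\n" + json.dumps(payload, indent=2)


def run_pipeline(fail_record: dict, cusip: str, ftd_qty: int,
                 inventory: dict, ftrs: list, related_fails: list) -> dict:
    """Stage 1 triage, then Stage 2 resolution; both replies parsed as JSON."""
    triage = json.loads(strip_think(generate("finops-triage", "Triage this fail record: " + json.dumps(fail_record))))
    prompt = build_resolver_prompt(triage, cusip, ftd_qty, inventory, ftrs, related_fails)
    return json.loads(strip_think(generate("finops-resolver", prompt)))
```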
Direct Inference
```bash
ollama run finops-resolver "Resolve this fail:
{\"triage\":{\"category\":\"CNS_FAIL\",\"cns_direction\":\"FTD\",\"lifecycle_state\":\"ESCALATED\",\"priority_score\":88.2,\"priority_tier\":\"CRITICAL\",\"score_components\":{\"age\":18.0,\"value\":12.5,\"regulatory\":28.0,\"counterparty\":3.0},\"reason\":\"CNS FTD 32000 shs, threshold security\",\"action\":\"LOCATE_AND_DELIVER\",\"escalation_level\":\"L3\",\"deadline\":\"T+13\",\"flags\":[\"THRESHOLD_SECURITY\"]},\"cusip\":\"594918104\",\"ftd_qty\":32000,\"inventory\":{\"box_qty\":8000,\"stock_loan_available\":3000,\"recall_outstanding\":5000,\"pending_receives\":2000},\"ftrs\":[{\"dtc\":\"DTC-0161\",\"qty\":12000,\"age_days\":3,\"settlement_type\":\"RVP\",\"settlement_date\":\"T+1\",\"cp_fail_rate_pct\":2.1,\"partial_delivery_history\":true}],\"related_fails\":[]}"
```
HuggingFace
Repository: sammiset/finops-resolver
GGUF download: finops-resolver-qwen3-8b-q4_k_m.gguf
Quantization: Q4_K_M (4-bit, k-quant mixed precision)
Base model: Qwen/Qwen3-8B
Training
```bash
pip install unsloth
python scripts/train.py
```
Produces the LoRA adapter in adapters/checkpoint-final/ and exports models_gguf/finops-resolver-qwen3-8b-q4_k_m.gguf automatically.
Training Results
Trained on RunPod A100 80GB. 3 epochs, 1,758 steps, ~5.5 hours.
| Metric | Value |
|---|---|
| Final train loss (avg) | 0.1625 |
| Final eval loss | 0.1424 |
| Eval loss (epoch 2.56) | 0.1425 |
| Train runtime | 20,020s (~5h 34m) |
| Train samples/sec | 0.702 |
| Train steps/sec | 0.088 |
| Eval samples/sec | 2.457 |
| Total steps | 1,758 |
| Final learning rate | 1.437e-08 |
| Final gradient norm | 0.037 |
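The throughput figures can be cross-checked with plain arithmetic using only numbers from this card. Runtime × steps/sec recovers the step count, and runtime × samples/sec matches steps × effective batch size, which works out to roughly 4,700 sequences per epoch against 8,000 raw training examples. That gap suggests packing placed about 1.7 examples in each sequence on average; this is an inference from the reported figures, not a number the card reports.

```python
# Figures copied from the Training Results table and the Architecture section.
runtime_s = 20_020
steps_per_s = 0.088
samples_per_s = 0.702
total_steps = 1_758
epochs = 3
effective_batch = 8          # 2 per device x 4 accumulation
raw_train_examples = 8_000

print(round(runtime_s * steps_per_s))     # 1762, close to the reported 1,758 steps
print(round(runtime_s * samples_per_s))   # 14054 sequences processed
print(total_steps * effective_batch)      # 14064 = steps x effective batch

seqs_per_epoch = total_steps * effective_batch / epochs
print(round(seqs_per_epoch))                          # 4688 sequences per epoch
print(round(raw_train_examples / seqs_per_epoch, 2))  # 1.71 raw examples per sequence
```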
Convergence notes:
- Train and eval loss converged closely (0.1625 run-average train vs 0.1424 final eval), with no sign of overfitting
- Gradient norms stable at 0.03-0.04 throughout final epoch, indicating clean convergence
- Cosine LR schedule decayed smoothly from 2e-4 to ~1.4e-08
- Eval loss plateaued between epochs 2.56 and 3.0 (0.1425 → 0.1424), suggesting 3 epochs was the right stopping point
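The learning-rate curve described above can be written out explicitly. This standalone sketch assumes the same shape as the cosine-with-warmup schedule in Hugging Face transformers (linear warmup to the peak, then a half cosine down to zero at the last step), with the peak, warmup and total steps taken from this card.

```python
import math

PEAK_LR = 2e-4
WARMUP_STEPS = 90
TOTAL_STEPS = 1_758


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 45, 90, 924, 1_749, 1_758):
    print(s, f"{lr_at(s):.3e}")
```

With these constants, lr_at(1749) evaluates to about 1.437e-08, the same value as the final learning rate in the table.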
Validation
```bash
python scripts/validate_resolver.py --input data/train.jsonl
```
Enforces: schema conformance, mathematical correctness of coverage percentages, monotonic remaining_short, valid action enum values, gridlock signal consistency, and escalation flag accuracy.
License and Credits
This project is licensed under the Apache License 2.0, consistent with the base model license.
Base model: Qwen/Qwen3-8B by the Qwen Team, Alibaba Group. Qwen3-8B is released under the Apache 2.0 license.
Fine-tuning framework: Unsloth for QLoRA training and GGUF export.
Training infrastructure: trl SFTTrainer on RunPod A100 80GB.