ASA-ArknightStoryAgent 4B LoRA

Release version: 20260607-cutoff6656

This repository contains the current LoRA adapter used by ASA-ArknightStoryAgent's GPU/vLLM inference runtime for Chinese Arknights story QA. It is not a standalone chat model. It is intended to be used with the ASA RAG pipeline, which retrieves story evidence and asks the model to produce grounded action JSON before answering.

Stable runtime path:

model/lora/asa-arknightstoryagent-4b-lora/

Local training artifact:

model/lora/soda_targeted_human_20260606_v3_200_current_chain_from_mergedbase_qwen35_4b_lr8e7_beta001_epoch2_rank8_cutoff6656_filtered

Runtime

Recommended runtime repository:

https://github.com/MapleRhythm/ASA-ArknightStoryAgent

Download into the release tree:

huggingface-cli download MapleRhythm/asa-arknightstoryagent-4b-lora \
  --local-dir model/lora/asa-arknightstoryagent-4b-lora

Run with the GPU release config:

bash scripts/run_gpu_reranker_qwen35_4b.sh --answer-only "岁兽是什么？"

Current runtime defaults:

Base model path: model/qwen3.5-4b
LoRA path: model/lora/asa-arknightstoryagent-4b-lora
Context size: 10000
Max generation tokens: 1536
answer_grounding_mode: quote
conclusion_prompt_mode: minimal
Web context: disabled by default
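
As a rough illustration of how these defaults interact, the sketch below records them in a small dataclass and computes the token budget left for the prompt and retrieved evidence. It assumes the context size covers prompt and generated tokens together, which is the usual convention but is not confirmed here. `RuntimeDefaults` and `prompt_budget` are hypothetical names, not part of the ASA codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefaults:
    """Hypothetical container mirroring the defaults listed above."""

    base_model_path: str = "model/qwen3.5-4b"
    lora_path: str = "model/lora/asa-arknightstoryagent-4b-lora"
    context_size: int = 10000
    max_generation_tokens: int = 1536
    answer_grounding_mode: str = "quote"
    conclusion_prompt_mode: str = "minimal"
    web_context: bool = False

    def prompt_budget(self) -> int:
        # Assumption: context_size counts prompt + generated tokens together.
        return self.context_size - self.max_generation_tokens


defaults = RuntimeDefaults()
print(defaults.prompt_budget())  # 8464
```

Under that assumption, roughly 8464 tokens remain for the system prompt, the question, and the retrieved story evidence.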

The adapter was trained from an internal merged Qwen3.5-4B checkpoint. The release runtime is the supported way to load it; direct AutoPeftModel usage may require adapting local base-model paths.

Training

Method: LoRA preference tuning with LLaMA-Factory / PEFT.

Key hyperparameters:

LoRA rank: 8
LoRA alpha: 16
LoRA dropout: 0.05
Learning rate: 8e-7
Epochs: 2
Scheduler: cosine
Effective batch size: 4
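
Two of these numbers can be unpacked with simple arithmetic. In standard LoRA (without rank-stabilized scaling) the low-rank update is scaled by alpha / rank, which gives 16 / 8 = 2 here. The effective batch size is normally the product of per-device batch size, gradient accumulation steps, and device count. Only the product is reported above, so the factorization used in the sketch is purely an illustrative assumption.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


print(lora_scaling(16, 8))  # 2.0

# Assumed factorization (not stated in the card): 1 sample x 4 accumulation steps x 1 GPU.
print(effective_batch_size(1, 4, 1))  # 4
```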

The training objective targeted observed ASA runtime failures: keeping grounded action JSON stable, avoiding over-abstention when evidence is sufficient, and improving answer behavior under the current RAG chain.

Evaluation Snapshot

Training-time eval metrics:

eval loss: 0.5542
rewards/chosen: 0.0427
rewards/rejected: 0.0250
rewards/margins: 0.0177
KL: 195.9924
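
The reported margin is the difference between the chosen and rejected rewards, which can be checked directly from the numbers above. This is a consistency check only, not a reproduction of the training logs.

```python
rewards_chosen = 0.0427
rewards_rejected = 0.0250

# Margin = mean reward on chosen responses minus mean reward on rejected ones.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.0177
```

A positive margin means chosen responses score higher than rejected ones on average; the value here is small in absolute terms.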

Pipeline evaluation before the runtime truncation-recovery patch:

eval50: 50 questions, 3 JSON errors, 13 abstain-like answers, 34 direct answers.
hard10: 10 questions, 1 JSON error, 3 abstain-like answers, 6 direct answers.
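
Expressed as rates, the counts above can be compared across the two suites. The sketch below assumes the three outcome categories are mutually exclusive and exhaustive, which matches the counts summing to the suite size. `EvalCounts` is a hypothetical helper, not part of the ASA evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCounts:
    total: int
    json_errors: int
    abstain_like: int
    direct: int

    def rates(self) -> dict:
        # Sanity check: every question falls into exactly one category.
        assert self.json_errors + self.abstain_like + self.direct == self.total
        return {
            "json_error": self.json_errors / self.total,
            "abstain_like": self.abstain_like / self.total,
            "direct": self.direct / self.total,
        }


suites = {
    "eval50": EvalCounts(total=50, json_errors=3, abstain_like=13, direct=34),
    "hard10": EvalCounts(total=10, json_errors=1, abstain_like=3, direct=6),
}

for name, counts in suites.items():
    print(name, {key: f"{rate:.0%}" for key, rate in counts.rates().items()})
# eval50 {'json_error': '6%', 'abstain_like': '26%', 'direct': '68%'}
# hard10 {'json_error': '10%', 'abstain_like': '30%', 'direct': '60%'}
```

With only 10 questions, each hard10 answer moves a rate by 10 percentage points, so those figures are indicative at best.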

Runtime truncation-recovery regression after the patch:

4 truncation-prone questions, 0 JSON errors.
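
For readers reproducing this check, one plausible way to count JSON errors is to test whether the raw model output parses as a JSON object. The sketch below assumes that definition. It does not reproduce the runtime's truncation-recovery patch, and the `action` and `evidence` keys are hypothetical placeholders rather than the actual ASA action schema.

```python
import json


def is_json_error(raw_output: str) -> bool:
    """Assumption: a 'JSON error' means the output is not a parseable JSON object."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return True
    return not isinstance(parsed, dict)


# Hypothetical outputs; the second is cut off mid-string, as truncation would do.
complete = '{"action": "answer", "evidence": ["quote"]}'
truncated = '{"action": "answer", "evidence": ["quo'

print(is_json_error(complete))   # False
print(is_json_error(truncated))  # True
```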

The full eval50 + hard10 suite should be rerun after each runtime or model change before treating this as a production-quality release.

Limitations

The model should not be used without retrieval evidence. It can hallucinate or over-infer from weak evidence.
Low-confidence retrieval cases, especially fusion_score=0 or missing dense/sparse/MiniRAG/evidence-chain scores, should be handled conservatively by the application.
Subjective questions such as "what bad things did X do" need framing by viewpoint; the model may otherwise mix factual actions with moral judgments.
This adapter does not include the base model, game text, retrieval indexes, or reranker weights.
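
The conservative handling of low-confidence retrieval mentioned above could be sketched as a simple gate in the application. Only `fusion_score` is a name taken from this card. The other score keys, the function names, and the rule that any missing score marks a hit as low-confidence are assumptions for illustration; the real pipeline may use different field names and thresholds.

```python
from typing import List, Mapping, Optional

# Hypothetical key names for the dense/sparse/MiniRAG/evidence-chain scores.
REQUIRED_SCORES = ("dense", "sparse", "minirag", "evidence_chain")

Hit = Mapping[str, Optional[float]]


def is_low_confidence(hit: Hit) -> bool:
    """Return True when a retrieval hit should be handled conservatively."""
    fusion = hit.get("fusion_score")
    if fusion is None or fusion == 0:
        return True
    # Assumption: any missing component score makes the hit low-confidence.
    return any(hit.get(name) is None for name in REQUIRED_SCORES)


def should_answer(hits: List[Hit]) -> bool:
    """Answer only if at least one hit is not low-confidence; otherwise abstain."""
    return any(not is_low_confidence(hit) for hit in hits)


good = {"fusion_score": 0.42, "dense": 0.8, "sparse": 0.3,
        "minirag": 0.5, "evidence_chain": 0.6}
weak = {"fusion_score": 0.0, "dense": 0.1}

print(should_answer([weak]))        # False
print(should_answer([weak, good]))  # True
```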

License And Data Notes

This adapter is released under the "other" license category because downstream use depends on the base model license and on the rights to the source story corpus used to build the retrieval system. The adapter repository does not include raw game story text or prebuilt story indexes.

