How to use MapleRhythm/asa-arknightstoryagent-4b-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Local path to the merged base checkpoint used during training;
# adjust it to wherever your copy of the base model lives.
base_model = AutoModelForCausalLM.from_pretrained(
    "model/merged/soda_targeted_human_20260606_v3_200_current_chain_cutoff5632_kto_merged_text"
)
# Attach the LoRA adapter weights from the Hub on top of the base model.
model = PeftModel.from_pretrained(base_model, "MapleRhythm/asa-arknightstoryagent-4b-lora")
ASA-ArknightStoryAgent 4B LoRA
Release version: 20260607-cutoff6656
This repository contains the current LoRA adapter used by ASA-ArknightStoryAgent's GPU/vLLM inference runtime for Chinese Arknights story QA. It is not a standalone chat model. It is intended to be used with the ASA RAG pipeline, which retrieves story evidence and asks the model to produce grounded action JSON before answering.
Stable runtime path:
model/lora/asa-arknightstoryagent-4b-lora/
Local training artifact:
model/lora/soda_targeted_human_20260606_v3_200_current_chain_from_mergedbase_qwen35_4b_lr8e7_beta001_epoch2_rank8_cutoff6656_filtered
Runtime
Recommended runtime repository:
https://github.com/MapleRhythm/ASA-ArknightStoryAgent
Download into the release tree:
huggingface-cli download MapleRhythm/asa-arknightstoryagent-4b-lora \
--local-dir model/lora/asa-arknightstoryagent-4b-lora
Run with the GPU release config:
bash scripts/run_gpu_reranker_qwen35_4b.sh --answer-only "岁兽是什么?"
Current runtime defaults:
- Base model path: model/qwen3.5-4b
- LoRA path: model/lora/asa-arknightstoryagent-4b-lora
- Context size: 10000
- Max generation tokens: 1536
- answer_grounding_mode: quote
- conclusion_prompt_mode: minimal
- Web context: disabled by default
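These defaults determine how much room the prompt and the retrieved evidence get. Below is a minimal sketch, assuming the context size bounds prompt and generated tokens together (as vLLM's max_model_len does). The RuntimeDefaults class is hypothetical and not part of the ASA runtime. It records the documented values and computes the remaining prompt budget: 10000 - 1536 = 8464 tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeDefaults:
    """Hypothetical container mirroring the documented ASA runtime defaults."""
    base_model_path: str = "model/qwen3.5-4b"
    lora_path: str = "model/lora/asa-arknightstoryagent-4b-lora"
    context_size: int = 10000
    max_generation_tokens: int = 1536
    answer_grounding_mode: str = "quote"
    conclusion_prompt_mode: str = "minimal"
    web_context: bool = False  # disabled by default

    def prompt_budget(self) -> int:
        """Tokens left for system prompt, question, and retrieved evidence,
        assuming prompt and generation share one context window."""
        return self.context_size - self.max_generation_tokens

    def fits(self, prompt_tokens: int) -> bool:
        """True if a prompt of this length still leaves room for a full-length answer."""
        return prompt_tokens <= self.prompt_budget()


if __name__ == "__main__":
    defaults = RuntimeDefaults()
    print(defaults.prompt_budget())                  # 8464
    print(defaults.fits(8000), defaults.fits(9000))  # True False
```

An application could apply a check like fits() before generation and trim retrieved evidence when it fails, rather than letting the answer be cut short by the window.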
The adapter was trained from an internal merged Qwen3.5-4B checkpoint. The release runtime is the supported way to load it; direct AutoPeftModel usage may require adapting local base-model paths.
Training
Method: LoRA preference tuning with LLaMA-Factory / PEFT.
Key hyperparameters:
- LoRA rank: 8
- LoRA alpha: 16
- LoRA dropout: 0.05
- Learning rate: 8e-7
- Epochs: 2
- Scheduler: cosine
- Effective batch size: 4
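For reference, the rank, alpha, and dropout above imply the following computation for each adapted weight matrix. This assumes PEFT's default LoRA scaling of α/r (rsLoRA would use α/√r instead). The base weight W_0 is frozen, and only A and B are trained. This card does not say which modules were adapted, so d and k stay generic.

```latex
\begin{aligned}
h &= W_0 x + \frac{\alpha}{r}\, B A\, \operatorname{dropout}_p(x),
  \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k} \\
\frac{\alpha}{r} &= \frac{16}{8} = 2, \qquad p = 0.05 \\
\#\text{trainable params per adapted matrix} &= r(d + k) = 8(d + k)
\end{aligned}
```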
The training objective targeted ASA runtime failures: grounded action JSON stability, avoiding over-abstention when the evidence is sufficient, and better answer behavior under the current RAG chain.
Evaluation Snapshot
Training-time eval metrics:
- eval loss: 0.5542
- rewards/chosen: 0.0427
- rewards/rejected: 0.0250
- rewards/margins: 0.0177
- KL: 195.9924
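The reported margin fits the usual convention in preference-tuning logs, where rewards/margins is the chosen reward minus the rejected reward averaged over the eval set. The numbers check out:

```latex
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected} = 0.0427 - 0.0250 = 0.0177
```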
Pipeline evaluation before the runtime truncation-recovery patch:
- eval50: 50 questions, 3 JSON errors, 13 abstain-like answers, 34 direct answers.
- hard10: 10 questions, 1 JSON error, 3 abstain-like answers, 6 direct answers.
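In each suite the three outcome counts add up to the number of questions, so they can be read as disjoint buckets. The small stdlib sketch below converts them to rates, which makes reruns of different sizes easier to compare. The dictionary keys and the rates() helper are illustrative and not part of the ASA tooling.

```python
# Pipeline results reported above, before the truncation-recovery patch.
RESULTS = {
    "eval50": {"questions": 50, "json_errors": 3, "abstain_like": 13, "direct": 34},
    "hard10": {"questions": 10, "json_errors": 1, "abstain_like": 3, "direct": 6},
}


def rates(counts: dict) -> dict:
    """Return each outcome as a fraction of questions, after checking that the
    outcomes partition the question set (each question lands in one bucket)."""
    total = counts["questions"]
    outcomes = {k: v for k, v in counts.items() if k != "questions"}
    if sum(outcomes.values()) != total:
        raise ValueError(f"outcomes sum to {sum(outcomes.values())}, expected {total}")
    return {k: v / total for k, v in outcomes.items()}


if __name__ == "__main__":
    for suite, counts in RESULTS.items():
        summary = ", ".join(f"{k}={r:.0%}" for k, r in rates(counts).items())
        print(f"{suite}: {summary}")
```

On the reported counts this gives 6% / 26% / 68% (JSON error / abstain-like / direct) on eval50 and 10% / 30% / 60% on hard10.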
Runtime truncation-recovery regression after the patch:
- 4 truncation-prone questions, 0 JSON errors.
The full eval50 + hard10 suite should be rerun after each runtime or model change before treating this as a production-quality release.
Limitations
- The model should not be used without retrieval evidence. It can hallucinate or over-infer from weak evidence.
- Low-confidence retrieval cases, especially fusion_score=0 or missing dense/sparse/MiniRAG/evidence-chain scores, should be handled conservatively by the application.
- Subjective questions such as "what bad things did X do" should be framed from a specific viewpoint; otherwise the model may mix factual actions with moral judgments.
- This adapter does not include the base model, game text, retrieval indexes, or reranker weights.
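A minimal application-side gate for the low-confidence retrieval cases above might look like the sketch below. The score key names (dense, sparse, minirag, evidence_chain) and the answer_mode helper are hypothetical and not part of the ASA codebase. Treating a missing fusion_score as low confidence is an added assumption. What "conservative" means in practice (hedging, surfacing the evidence, or abstaining) is left to the application.

```python
from typing import Mapping, Optional

# Hypothetical key names for the retrieval scores; the real ASA pipeline
# may name or structure these differently.
COMPONENT_SCORES = ("dense", "sparse", "minirag", "evidence_chain")


def is_low_confidence(scores: Mapping[str, Optional[float]]) -> bool:
    """Return True when retrieval evidence should be treated as weak.

    Checks the two signals named in the limitations: a fused score of exactly
    0, or any missing component score. A missing fusion_score is also treated
    as low confidence (assumption: err on the conservative side).
    """
    fusion = scores.get("fusion_score")
    if fusion is None or fusion == 0:
        return True
    # A component is "missing" if its key is absent or its value is None;
    # a legitimate 0.0 for a single component does not count as missing.
    return any(scores.get(name) is None for name in COMPONENT_SCORES)


def answer_mode(scores: Mapping[str, Optional[float]]) -> str:
    """Pick how the application should present the model's answer."""
    return "conservative" if is_low_confidence(scores) else "normal"


if __name__ == "__main__":
    strong = {"fusion_score": 0.42, "dense": 0.61, "sparse": 0.18,
              "minirag": 0.33, "evidence_chain": 0.27}
    weak = dict(strong, minirag=None)
    print(answer_mode(strong), answer_mode(weak))  # normal conservative
```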
License And Data Notes
This adapter is released under the "other" license identifier because downstream use depends on the base model's license and on the rights to the source story corpus used to build the retrieval system. The adapter repository does not include raw game story text or prebuilt story indexes.