ParaDesigner-SFT

English | 中文

Overview

ParaDesigner-SFT is a private Designer specialist model for ParadoxGPT, produced by supervised fine-tuning (SFT) of a Qwen 4B base model. It is built for research-paper workflows, not for general chat.

Intended Use

Use ParaDesigner-SFT for:

  • paper-level scientific writing and review-support workflows;
  • converting paper context into structured reasoning traces;
  • internal ParadoxGPT research-agent training and evaluation;
  • assisting human researchers with evidence, argumentation, and risk analysis.

Outputs should be checked against the original paper context. This model is not a substitute for human scientific judgment.

Training Data

The model was trained with supervised fine-tuning on the private dataset bhxdianzhang/ParaSFT-designer.

Split    Samples
Train    35,389
Dev       2,062
Test      2,296
Total    39,747

Task distribution:

Task type                      Samples
D1_claim_to_evidence           7,954
D2_experiment_argument_plan    7,951
D3_ablation_analysis_design    7,942
D4_sufficiency_critique        7,944
D5_interpretation_boundary     7,956
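
The two tables describe the same dataset, so their totals should agree. The sketch below is only a sanity check on the numbers reported in this card: it copies the counts by hand (the dictionary keys and the `shares` helper are illustrative names, not part of the dataset) and computes each split's and each task's share of the total.

```python
# Counts copied by hand from the split and task-type tables in this card.
split_counts = {"train": 35_389, "dev": 2_062, "test": 2_296}
task_counts = {
    "D1_claim_to_evidence": 7_954,
    "D2_experiment_argument_plan": 7_951,
    "D3_ablation_analysis_design": 7_942,
    "D4_sufficiency_critique": 7_944,
    "D5_interpretation_boundary": 7_956,
}
REPORTED_TOTAL = 39_747


def shares(counts):
    """Return each count as a fraction of the sum of all counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


split_total = sum(split_counts.values())
task_total = sum(task_counts.values())
split_shares = shares(split_counts)
task_shares = shares(task_counts)

print("split total:", split_total, "task total:", task_total)
for name, share in split_shares.items():
    print(f"{name}: {share:.1%}")
for name, share in task_shares.items():
    print(f"{name}: {share:.2%}")
```

Both sums match the reported total, and the five task types are close to evenly balanced.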

Quality filter: each target answer contains exactly one balanced <think>...</think> reasoning block followed by the final answer.
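
A filter of this kind can be sketched as follows. This is a minimal illustration, not the pipeline that was actually used to build the dataset: the function name `split_target` is hypothetical, and two details are assumptions that the card does not state (that only whitespace may precede the block, and that the final answer must be non-empty).

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_target(text: str):
    """Return (reasoning, final_answer) if the target passes the filter, else None."""
    # Exactly one opening tag and exactly one closing tag.
    if text.count(THINK_OPEN) != 1 or text.count(THINK_CLOSE) != 1:
        return None
    start = text.find(THINK_OPEN)
    end = text.find(THINK_CLOSE)
    # Balanced: the opening tag must come before the closing tag.
    if start > end:
        return None
    # Assumption: nothing but whitespace may precede the reasoning block.
    if text[:start].strip():
        return None
    reasoning = text[start + len(THINK_OPEN):end].strip()
    final_answer = text[end + len(THINK_CLOSE):].strip()
    # Assumption: "followed by the final answer" means the answer is non-empty.
    if not final_answer:
        return None
    return reasoning, final_answer


if __name__ == "__main__":
    print(split_target("<think>check claim 1</think>final answer"))
    print(split_target("no reasoning block here"))
```

The same helper can be reused at inference time to separate the reasoning trace from the final answer, assuming generated outputs follow the target format.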

Training Summary

Base model: a Qwen 4B base model, loaded from a local copy.

Field                     Value
Epochs                    3.0
Learning rate             3e-6
Scheduler                 cosine
Distributed devices       2
Gradient accumulation     8
Total train batch size    16
Final train loss          0.8653310693
Final eval loss           0.9128507972
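
The batch-size rows are related by simple arithmetic. The sketch below assumes the usual data-parallel relation (total batch = per-device batch x devices x gradient accumulation) and assumes the last partial batch of each epoch is kept; neither detail is stated in this card, so the step counts are rough estimates rather than logged values.

```python
import math

# Values copied from the training summary and the split table in this card.
devices = 2
grad_accum = 8
total_batch = 16
train_samples = 35_389
epochs = 3

# Assumption: total_batch = per_device_batch * devices * grad_accum.
per_device_batch, remainder = divmod(total_batch, devices * grad_accum)

# Assumption: one optimizer step per total batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / total_batch)
total_steps = steps_per_epoch * epochs

print("per-device batch size:", per_device_batch)
print("approx. optimizer steps per epoch:", steps_per_epoch)
print("approx. optimizer steps overall:", total_steps)
```

Under these assumptions each device processes one sample per forward pass, and the full run is on the order of a few thousand optimizer steps.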

Qualitative Behavior

Local side-by-side checks show strong task alignment across D1-D5. Compared with the same 4B backbone before SFT, ParaDesigner-SFT gives much more concise answers and more consistently delivers its final answer in Chinese. It is also better at turning claims into evidence plans, experiment argumentation, ablation and mechanism requirements, sufficiency critiques, and interpretation boundaries.

ParaDesigner-SFT works best when given rich paper context in the same shape as the training data: title, abstract, introduction, selected body sections, figure/table captions, claims, reviews, or other task-specific paper context, depending on the skill.

Example Prompt Shape

你是顶会论文分析专家。给定论文上下文,请按指定任务做中文分析。

论文标题: ...

摘要: ...

引言: ...

正文关键片段: ...

图表/实验/claim 信息: ...

任务: ...
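
A prompt in this shape can be assembled programmatically. The helper below is a minimal sketch: `build_prompt` and the English field keys are hypothetical names introduced for illustration, and skipping empty sections is an assumption, since the card does not say how missing context was handled in training. The instruction line and the Chinese section labels are copied from the template above.

```python
PROMPT_HEADER = "你是顶会论文分析专家。给定论文上下文,请按指定任务做中文分析。"

# (hypothetical keyword, label copied from the prompt shape above), in template order.
FIELD_LABELS = [
    ("title", "论文标题"),
    ("abstract", "摘要"),
    ("introduction", "引言"),
    ("body_excerpts", "正文关键片段"),
    ("evidence", "图表/实验/claim 信息"),
    ("task", "任务"),
]


def build_prompt(**fields: str) -> str:
    """Join the provided paper-context sections in the template order."""
    unknown = set(fields) - {key for key, _ in FIELD_LABELS}
    if unknown:
        raise ValueError(f"unknown prompt fields: {sorted(unknown)}")
    parts = [PROMPT_HEADER]
    for key, label in FIELD_LABELS:
        value = fields.get(key, "").strip()
        # Assumption: sections without content are left out entirely.
        if value:
            parts.append(f"{label}: {value}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        title="Adaptive Evidence Routing for Retrieval-Augmented Scientific QA",
        task="设计完整的实验论证方案",
    ))
```

Task-specific prompts, such as the experiment-planning example in the next section, use a different instruction line and field set, so the header and labels would need to be adjusted per skill.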

Concrete Example

Input:

你是顶会论文实验论证规划专家。给定论文设定、方法组件和 claims,请用中文设计完整的实验论证方案。

论文标题: Adaptive Evidence Routing for Retrieval-Augmented Scientific QA

摘要: We propose a router that chooses between dense retrieval, citation-graph retrieval, and a hybrid path for scientific QA. The paper claims the router improves factuality and citation precision.

引言: Scientific questions require different evidence types. Definitions need canonical sources, comparison questions need related-work neighborhoods, and mechanistic questions need method sections.

方法组件: evidence router, citation-graph retriever, reranker trained on paper-review pairs.

Claims:
1. The router is the main source of factuality gains.
2. The citation graph improves citation precision.
3. The hybrid path is especially useful for mechanistic questions.

Expected ParaDesigner-SFT-style output:

这篇论文的实验论证不能只做一个 full system vs. base RAG。你真正要证明的是三个组件分别在什么地方起作用,所以实验应该分成三层。

第一层是贡献隔离:保留 citation graph 和 reranker,只移除 router,观察 factuality 是否明显下降。否则你不能说提升主要来自 router,只能说来自整个 evidence stack。

第二层是任务分解:把问题按 definition / comparison / mechanism 三类拆开。router 的价值应该体现在不同问题类型选择不同 evidence path,而不是只在平均分上变好。

第三层是机制解释:报告 router 的选择分布和错误案例。如果 mechanistic questions 经常被路由到 hybrid path,并且这些样本的引用更完整,才说明 router 学到了有意义的证据策略。

最关键的对照是:Base RAG、+citation graph、+reranker、+router,以及只替换 router 的 controlled ablation。这样 claim、组件和证据才能闭环。

Limitations

  • The model is optimized for a narrow ParadoxGPT specialist workflow, not general chat.
  • It may inherit teacher-model annotation errors from the SFT data.
  • It should not be used as an authority on paper correctness without source verification.

License and Use Restrictions

This repository is marked with license: other.

The model was trained on private ParadoxGPT SFT data derived from parsed academic papers, reviews, and teacher annotations. Rights to original papers and reviews remain with their respective authors, reviewers, venues, and publishers. This private model is provided for internal research and engineering use only. Redistribution or public release should be reviewed separately against source venue policies and applicable copyright rules.

Citation

@model{zhang2026paradesignersft,
  title        = {ParaDesigner-SFT: A ParadoxGPT Designer Specialist Model},
  author       = {Heng Zhang},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/bhxdianzhang/ParaDesigner-SFT},
  note         = {Private ParadoxGPT supervised fine-tuned Designer model}
}

中文

概述

ParaDesigner-SFT 是 ParadoxGPT 的 Designer 专家模型,由 Qwen 4B 基座监督微调而来。它面向科研论文工作流,不是通用聊天模型。

适用场景

ParaDesigner-SFT 适合用于:

  • 论文级科研写作与 review-support 工作流;
  • 将论文上下文转成结构化 reasoning trace;
  • ParadoxGPT 内部 research-agent 训练与评测;
  • 辅助研究者做证据、论证和风险分析。

输出应结合原论文上下文复核,不能替代人类科研判断。

训练数据

模型使用私有数据集 bhxdianzhang/ParaSFT-designer 进行监督微调。

Split    样本数
Train    35,389
Dev       2,062
Test      2,296
Total    39,747

任务分布:

Task type                      Samples
D1_claim_to_evidence           7,954
D2_experiment_argument_plan    7,951
D3_ablation_analysis_design    7,942
D4_sufficiency_critique        7,944
D5_interpretation_boundary     7,956

质量过滤:每条目标答案都包含且只包含一个配平的 <think>...</think> 思考块,后接最终答案。

训练摘要

基座模型:本地 Qwen 4B base model。

字段                      数值
Epochs                    3.0
Learning rate             3e-6
Scheduler                 cosine
分布式设备数              2
Gradient accumulation     8
Total train batch size    16
Final train loss          0.8653310693
Final eval loss           0.9128507972

定性行为

本地 side-by-side 检查显示,ParaDesigner-SFT 在 D1-D5 上任务对齐稳定。相比同一 4B backbone 的 SFT 前模型,它更收敛、更聚焦中文最终答案,也更擅长把 claim 转成证据规划、实验论证、消融/机制分析需求、充分性批判和解释边界。

为了获得最佳效果,请使用与训练数据一致的丰富论文上下文,例如 title、abstract、introduction、正文关键片段、figure/table captions、claims、reviews 或具体任务所需的 paper context。

具体示例

输入:

你是顶会论文实验论证规划专家。给定论文设定、方法组件和 claims,请用中文设计完整的实验论证方案。

论文标题: Adaptive Evidence Routing for Retrieval-Augmented Scientific QA

摘要: 论文提出一个 router,在 dense retrieval、citation-graph retrieval 和 hybrid path 之间选择证据路径,用于 scientific QA。作者声称 router 提升 factuality 和 citation precision。

引言: 不同科学问题需要不同证据类型:定义问题需要 canonical source,对比问题需要 related-work neighborhood,机制问题需要 method section。

方法组件: evidence router, citation-graph retriever, reranker trained on paper-review pairs.

Claims:
1. router 是 factuality 提升的主要来源。
2. citation graph 提升 citation precision。
3. hybrid path 对 mechanistic questions 尤其有用。

ParaDesigner-SFT 期望输出片段:

这篇论文的实验论证不能只做 full system vs. base RAG。你真正要证明的是三个组件分别在什么地方起作用,所以实验应该分成三层。

第一层是贡献隔离:保留 citation graph 和 reranker,只移除 router,观察 factuality 是否明显下降。否则不能说提升主要来自 router,只能说来自整个 evidence stack。

第二层是任务分解:把问题按 definition / comparison / mechanism 三类拆开。router 的价值应该体现在不同问题类型选择不同 evidence path,而不是只在平均分上变好。

第三层是机制解释:报告 router 的选择分布和错误案例。如果 mechanistic questions 经常被路由到 hybrid path,并且这些样本的引用更完整,才说明 router 学到了有意义的证据策略。

局限

  • 模型针对 ParadoxGPT 窄域专家工作流优化,不是通用聊天模型。
  • 模型可能继承 SFT 数据中的 teacher-model 标注误差。
  • 不应在未检查原论文上下文的情况下,把输出当成科研事实。

引用

@model{zhang2026paradesignersft,
  title        = {ParaDesigner-SFT: A ParadoxGPT Designer Specialist Model},
  author       = {Heng Zhang},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/bhxdianzhang/ParaDesigner-SFT},
  note         = {Private ParadoxGPT supervised fine-tuned Designer model}
}