Prompt-Refine-MiniCPM5-1B

A compact text-only prompt expander distilled from HiDream-ai/Prompt-Refine into MiniCPM5-1B-SFT.

What It Does

This model takes a short, vague, or casual image-generation description (Chinese or English) and expands it into a detailed, self-contained English prompt ready for image generation models. It only handles prompt expansion; it does not judge prompt quality.

The model is driven by the system prompt shown in the Quickstart below, which instructs it to expand each scene along the following dimensions: Subject, Composition, Action, Location, Lighting, Materials, Style and Text.

Model Details

| Item | Value |
|---|---|
| Parameters | ~1B |
| Precision | bfloat16 |
| Teacher Model | HiDream-ai/Prompt-Refine (Gemma-4-31B-it based, multimodal) |
| Base Model | openbmb/MiniCPM5-1B-SFT (text-only, 1B) |

For full architecture details, see the MiniCPM5-1B model card.

Quickstart

Requirements

pip install -U "transformers>=5.6" accelerate torch

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOUR_USERNAME/prompt-refine-minicpm5"  # replace with your repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

REWRITE_SYSTEM_PROMPT = """\
你是专业的AI图像生成Prompt工程师,也是一名拥有百科知识和视觉导演能力的创意总监.你的任务是分析用户的原始图像需求,推理出隐含知识和最佳视觉方案,并改写成一个明确,详细,可直接用于图像生成的英文prompt.

## 核心约束

- 你的输入是用户对想要生成图像的文字描述,不是真实图片.无论输入多简单,含糊或简短,都必须直接改写成一个有效的英文图像生成prompt.
- 禁止拒绝,禁止要求上传图片,禁止询问澄清,禁止添加任何前言,致歉,解释或元评论.只输出 prompt 本身.
- prompt 必须是一段连贯自然的英文描述,而非关键词堆砌.最重要的主体和画面意图放开头,自然展开.

## 框架

使用以下框架扩写每个画面,按场景类别调整各维度的详略权重:
- **Subject**: 主体的身份,外观,动作,表情,服饰,颜色
- **Composition**: 镜头景别,视角,主体位置,前景/中景/背景层次
- **Action**: 主体正在做什么,动作方向,姿态,互动关系
- **Location**: 场景地点,室内/室外,天气,时间段,环境细节
- **Lighting**: 光源类型,方向,氛围(dramatic/soft/warm/cool/backlit等).这是最强的视觉控制点之一
- **Materials**: 材质,表面,纹理(如 polished metal, matte concrete, textured wall, glossy surface).对产品和建筑场景尤其重要
- **Style**: 艺术风格(如需要)
- **Text**: 文字渲染(精确文字放在英文双引号中,说明字体风格,颜色,位置)

## 知识解析

仅当输入涉及诗词,歌词,名言,公式,历史人物,科学概念,地标,名画,文化符号,历史事件,UI布局等需要解析的内容时,先解析出具体答案和可见特征,再写入prompt,使图像模型仅凭prompt就能独立生成正确画面.不要只写 "Mona Lisa","Dunkirk evacuation" 这类需要模型自行理解的词.简单场景无需强加知识解析.

## 空间与逻辑锚定

把模糊关系改写为明确布局,例如 top left corner, centered in the foreground, background out of focus.不要使用"旁边""一些""好看"等含糊表达.

## 文字渲染

仅当画面中需要出现文字时,把要渲染的文字逐字保留在英文双引号中(中文/英文/公式均可),并指定字体(calligraphy, serif, sans-serif, handwritten),颜色和位置.叙述性描述必须全英文,不得混入中文.

## 抽象概念具象化

把"自由,孤独,未来感,治愈"等抽象词转成可见场景,符号和氛围,例如飞鸟,断裂锁链,辽阔天空,冷色霓虹,柔和晨光等.

## 示例

- 用户说"李白的静夜思写在墙上",prompt 应写出完整中文诗句,并指定它以优雅中国书法写在古旧石墙的哪个位置.
- 用户说"三大力学的奠基人"或"爱因斯坦写质能方程",prompt 应解析出 Isaac Newton 或 Albert Einstein,并描述人物外貌,时代服饰,黑板,公式 "E = mc²" 等可见内容.
- 用户说"蒙娜丽莎""比萨斜塔""福字""敦刻尔克大撤退",prompt 应描述对应画面特征: 神秘微笑与交叠双手,倾斜白色大理石钟楼与拱廊,红底金色/黑色书法 "福",1940年海滩上等待撤离的士兵和海面船只.

## 输出要求

- prompt 必须是英文的,连贯自然的单段落
- 简洁场景忌冗余,复杂画面可以更长
- 使用完整句子,丰富但准确的形容词,摄影/绘画/设计术语
- prompt 必须自包含,仅凭prompt本身就能准确生成图片

## 执行步骤

1. **Analyze**: 识别核心主体,用户意图,文字要求,需要解析的隐含知识
2. **Rewrite**: 选择最适画面的光线,镜头,角度,纹理,风格,空间布局和事实细节,输出最终英文prompt
\
"""

messages = [
    {"role": "system", "content": REWRITE_SYSTEM_PROMPT},
    # Example input: "A cat sitting on a windowsill watching the rain"
    {"role": "user", "content": "一只猫坐在窗台上看雨"},
]
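# Render the chat template and tokenize. return_dict=True yields a mapping
# (input_ids, attention_mask) that can be unpacked into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt tokens.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))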
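
For repeated calls, the steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the released code: the names `build_messages` and `refine` are hypothetical, and `refine` assumes a tokenizer and model already loaded as in the Quickstart.

```python
def build_messages(system_prompt: str, description: str) -> list[dict[str, str]]:
    """Build the chat input: the fixed system prompt plus one user description."""
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": description},
    ]


def refine(description, system_prompt, tokenizer, model, max_new_tokens=4096):
    """Expand one description into an English prompt (hypothetical helper).

    `tokenizer` and `model` are assumed to be the objects created in the
    Quickstart; this function only repeats those steps.
    """
    messages = build_messages(system_prompt, description)
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        enable_thinking=False,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With the Quickstart objects in scope, a call would look like `refine("一只猫坐在窗台上看雨", REWRITE_SYSTEM_PROMPT, tokenizer, model)`.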

Example

| Input (User) | Output (Refined Prompt) |
|---|---|
| 一只猫坐在窗台上看雨 | A fluffy tabby cat with amber eyes sits on a weathered wooden windowsill, gazing out through rain-streaked glass at a misty cityscape. Soft diffused daylight filters through the rain clouds, casting a cool bluish-grey tone across the scene. The cat's fur is illuminated by the gentle ambient light, its silhouette framed by the dark interior of the room behind it. Water droplets trail down the window pane in the foreground, creating a dreamy bokeh effect. Mid-distance shot, slightly low angle, the cat centered in the composition with out-of-focus warm indoor lighting in the background. |
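
The system prompt asks for a single English paragraph in which Chinese may appear only as quoted text to be rendered in the image. Because the model does not judge its own output, a light external check can catch obvious violations before the prompt is sent to an image model. The sketch below is an assumption about how such a check might look; `check_refined_prompt` is a hypothetical name, and the character range covers only the basic CJK block.

```python
import re

# Text wrapped in straight or curly double quotes (text to be rendered in the image).
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
# CJK Unified Ideographs, basic block only: a rough test for Chinese characters.
CJK = re.compile(r"[\u4e00-\u9fff]")


def check_refined_prompt(prompt: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    text = prompt.strip()
    if not text:
        problems.append("empty output")
    if "\n" in text:
        problems.append("more than one paragraph or line")
    # Remove quoted spans first, since quoted Chinese text is allowed.
    if CJK.search(QUOTED.sub("", text)):
        problems.append("Chinese characters outside quoted text")
    return problems


print(check_refined_prompt('A red banner with "福" in gold calligraphy.'))  # → []
print(check_refined_prompt("A cat 坐在 a windowsill."))
# → ['Chinese characters outside quoted text']
```

The check is deliberately shallow: it verifies form only and says nothing about whether the expanded prompt is faithful to the input.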

Distillation

This model was obtained by fine-tuning MiniCPM5-1B-SFT on synthetic data generated by the teacher model HiDream-ai/Prompt-Refine, a multimodal model based on Gemma-4-31B-it. The student model is text-only: it accepts text descriptions and outputs expanded English prompts, and it requires no image input.
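
The data format used for fine-tuning is not documented here. As an illustration only, one common way to store teacher outputs for supervised fine-tuning is one chat-format record per line of a JSONL file. In the sketch below, the function names and the `messages` record layout are assumptions, not the authors' pipeline.

```python
import json


def to_sft_record(system_prompt: str, user_input: str, teacher_output: str) -> dict:
    """Pair a user description with the teacher's expansion (assumed layout)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": teacher_output},
        ]
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records one per line; ensure_ascii=False keeps Chinese readable."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


record = to_sft_record(
    "SYSTEM PROMPT HERE",  # placeholder for the full system prompt
    "一只猫坐在窗台上看雨",
    "A fluffy tabby cat sits on a windowsill, gazing at the rain.",
)
print(to_jsonl([record]))
```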

License

Apache-2.0
