Vietnamese Calligraphy Ideogram4 LoRA (V10 Compound Gold)
This repository contains the low-rank adaptation (LoRA) checkpoint for generating high-fidelity Vietnamese calligraphy characters and phrases.
It is fine-tuned on top of Ideogram4 (9.3B parameters) on multi-word compound phrase datasets, targeting accurate rendering of Vietnamese diacritics and brush styles.
Visual Results & Comparisons
Competitor Baseline Comparison
Below, the same phrase is rendered by several systems to show where the fine-tuned model improves diacritic accuracy and better preserves calligraphic style compared with Qwen Image, ERNIE Image, and commercial black-box generators:
Compound Eval28 Progress (Before vs. After SFT)
Evolution of diacritic binding during compound training epochs:
Preservation of Base Model Scene Capability
These examples show that the LoRA adapter retains the base model's prompt following and high-quality background rendering when calligraphy is placed in complex scenes:
Model Details
Target Modules (6 modules):
- attention.qkv
- attention.o
- feed_forward.w1
- feed_forward.w2
- feed_forward.w3
- adaln_modulation
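The six names above read like module-name suffixes inside the Ideogram4 DiT. How the `inject_lora_into_dit` helper used in the inference example actually matches them is not documented in this card, so the following is only a rough illustration: a suffix match at a dotted-path boundary that picks out the adapted layers. `TARGET_SUFFIXES`, `is_lora_target`, `select_targets`, and all module paths are hypothetical.

```python
# Hypothetical sketch: choose which DiT sub-modules receive a LoRA adapter
# by matching dotted module paths against the six target suffixes.
# The module paths below are invented; the real Ideogram4 naming and the
# matching rule used by inject_lora_into_dit may differ.

TARGET_SUFFIXES = (
    "attention.qkv",
    "attention.o",
    "feed_forward.w1",
    "feed_forward.w2",
    "feed_forward.w3",
    "adaln_modulation",
)


def is_lora_target(module_name: str, suffixes=TARGET_SUFFIXES) -> bool:
    """True if the path equals a suffix or ends with '.<suffix>'.

    Matching at a '.' boundary means 'attention.o' does not match a
    module called 'attention.out'.
    """
    return any(module_name == s or module_name.endswith("." + s) for s in suffixes)


def select_targets(module_names, suffixes=TARGET_SUFFIXES):
    """Return the module paths that would receive a LoRA adapter."""
    return [name for name in module_names if is_lora_target(name, suffixes)]


if __name__ == "__main__":
    # Invented module paths for one transformer block plus an output layer.
    names = [
        "blocks.0.attention.qkv",
        "blocks.0.attention.o",
        "blocks.0.attention.norm_q",
        "blocks.0.feed_forward.w1",
        "blocks.0.feed_forward.w2",
        "blocks.0.feed_forward.w3",
        "blocks.0.adaln_modulation",
        "final_layer.linear",
    ]
    print(select_targets(names))
```

Matching on a `.`-prefixed suffix rather than a plain substring avoids accidentally adapting layers whose names merely contain a target string, such as a hypothetical `attention.out`.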
- Performance: 97.6% word-level accuracy on the held-out Eval28 panel (4 word-level errors out of 168 words).
Files in this Repository
- step-soup_infer.safetensors: The inference-ready checkpoint. Its weights are pre-scaled for the rsLoRA inference wrapper (scale = alpha/sqrt(rank)).
- step-soup.safetensors: The official training checkpoint (standard alpha/rank scale), useful for further fine-tuning or checkpoint averaging (souping).
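The two checkpoints assume different LoRA scale conventions. With the rank and alpha passed to `inject_lora_into_dit` in the inference example (64 and 64), standard scaling gives alpha/rank = 1.0, while rsLoRA gives alpha/sqrt(rank) = 8.0. This card does not spell out how the pre-scaling was applied, so the sketch below assumes only that both files encode the same effective update B @ A and differ in which multiplier the wrapper applies. `lora_scale` and `rescale_factor` are hypothetical helpers for illustration.

```python
import math


def lora_scale(alpha: float, rank: int, rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update B @ A.

    Standard LoRA uses alpha / rank; rsLoRA uses alpha / sqrt(rank).
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / math.sqrt(rank) if rslora else alpha / rank


def rescale_factor(alpha: float, rank: int) -> float:
    """Factor by which B @ A must be multiplied so that an rsLoRA-scaled
    wrapper reproduces the update of a standard-scaled one.

    Assumption (illustration only): both checkpoints encode the same
    effective update and differ only in the scale convention.
    """
    return lora_scale(alpha, rank) / lora_scale(alpha, rank, rslora=True)


if __name__ == "__main__":
    alpha, rank = 64.0, 64  # values passed to inject_lora_into_dit in the example
    print("standard alpha/rank:    ", lora_scale(alpha, rank))
    print("rsLoRA alpha/sqrt(rank):", lora_scale(alpha, rank, rslora=True))
    print("rescale factor:         ", rescale_factor(alpha, rank))
```

Under that assumption, the B @ A product stored for the rsLoRA wrapper would be 1/sqrt(64) = 0.125 times the standard-scaled one. Loading either file into a wrapper that uses the other convention would then make the adaptation 8 times too strong or too weak.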
Inference Usage
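To run inference, load the base FP8 Ideogram4 model with DiffSynth-Studio, then inject these LoRA weights into its DiT. The injection helpers come from the separate hybrid_peft_ideogram4 module imported in the example.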
Python Example Code
import torch
import json
from diffsynth.core import ModelConfig
from diffsynth.pipelines.ideogram4 import Ideogram4Pipeline
# 1. Define model directory paths (make sure to download FP8 Ideogram4 components)
model_dir = "models/ideogram-ai/ideogram-4-fp8"
lora_ckpt = "step-soup_infer.safetensors" # Downloaded from this repository
# 2. Initialize the pipeline
pipe = Ideogram4Pipeline.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device="cuda",
)
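# 3. Inject LoRA adapters into the DiT, then load the trained weights.
#    hybrid_peft_ideogram4 is a separate helper module, not part of DiffSynth-Studio.
#    rank/alpha must match the values the checkpoint was prepared for.
from hybrid_peft_ideogram4 import inject_lora_into_dit, load_lora_checkpoint

inject_lora_into_dit(
    pipe.dit,
    targets=[
        "attention.qkv", "attention.o",
        "feed_forward.w1", "feed_forward.w2", "feed_forward.w3",
        "adaln_modulation",
    ],
    rank=64,
    alpha=64.0,
)
load_lora_checkpoint(pipe.dit, lora_ckpt)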
# 4. Build a layout-aware JSON prompt (no bounding boxes)
prompt_json = json.dumps({
    "high_level_description": 'Vietnamese calligraphy artwork of the phrase "An Khang Thịnh Vượng" in traditional brush style. The text is written in Vietnamese alphabet.',
    "style_description": {
        "art_style": "calligraphy",
        "ink_color": "black",
        "brush_style": "Traditional Vietnamese brush calligraphy, bold and elegant strokes",
        "writing_surface": "Plain white rice-paper background, no texture, no border"
    },
    "compositional_deconstruction": {
        "background": "Plain white rice-paper background, no texture, no border.",
        "elements": [{
            "type": "text",
            "text": "An Khang\nThịnh Vượng",
            "desc": "Traditional Vietnamese calligraphy characters arranged in a tidy grid of several stacked rows, multiple words per row, evenly spaced and centered, written in bold black ink brush strokes. Font: Thanh Cong Unicode.",
        }],
    },
}, ensure_ascii=False)  # keep Vietnamese diacritics as literal characters, not \uXXXX escapes
# 5. Run generation
image = pipe(
    prompt=prompt_json,
    cfg_scale=7.0,
    num_inference_steps=48,
    seed=7000,
)
image.save("vietnamese_calligraphy_output.png")
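The prompt in step 4 is a fixed template in which the phrase appears twice: once in the high-level description and once, split into rows, as the text element. A small helper makes it easy to swap in another phrase. `build_calligraphy_prompt` is a hypothetical convenience function, not part of DiffSynth-Studio or this repository; its field names and wording are copied verbatim from the example so the template stays the one shown above, and results on phrases outside the evaluated set may vary.

```python
import json


def build_calligraphy_prompt(phrase_lines):
    """Build a no-bbox JSON prompt with the same fields as the example.

    phrase_lines: list of strings, one per row of calligraphy.
    Hypothetical helper; wording is copied from the example prompt.
    """
    phrase = " ".join(phrase_lines)
    background = "Plain white rice-paper background, no texture, no border"
    return json.dumps({
        "high_level_description": (
            f'Vietnamese calligraphy artwork of the phrase "{phrase}" in '
            "traditional brush style. The text is written in Vietnamese alphabet."
        ),
        "style_description": {
            "art_style": "calligraphy",
            "ink_color": "black",
            "brush_style": "Traditional Vietnamese brush calligraphy, bold and elegant strokes",
            "writing_surface": background,
        },
        "compositional_deconstruction": {
            "background": background + ".",
            "elements": [{
                "type": "text",
                "text": "\n".join(phrase_lines),  # one row per list entry
                "desc": (
                    "Traditional Vietnamese calligraphy characters arranged in a tidy grid "
                    "of several stacked rows, multiple words per row, evenly spaced and "
                    "centered, written in bold black ink brush strokes. Font: Thanh Cong Unicode."
                ),
            }],
        },
    }, ensure_ascii=False)  # keep diacritics as literal characters


if __name__ == "__main__":
    print(build_calligraphy_prompt(["An Khang", "Thịnh Vượng"]))
```

Because Python dicts keep insertion order and the strings match the example, `build_calligraphy_prompt(["An Khang", "Thịnh Vượng"])` produces the same string as `prompt_json` in step 4; the result can be passed directly as `prompt=` to `pipe(...)`.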
Citation
If you use this model or code in your research, please cite our Master's thesis:
@mastersthesis{dopt2026vietnamesecalligraphy,
  author  = {Đỗ Tuấn Phong},
  title   = {Fine-tuning Ideogram4 for Vietnamese Calligraphy Text Rendering},
  school  = {FPT University},
  address = {Hanoi, Vietnam},
  year    = {2026},
  type    = {{M.Sc.}},
  month   = jun,
  note    = {MSE-AI program}
}