HTMLGen — Qwen3.5-4B LoRA Adapter

A LoRA adapter fine-tuned on top of Qwen/Qwen3.5-4B for HTML generation.

Given an image of a document page, the model generates a self-contained HTML5 document that aims to preserve the layout, typography, tables, formulas, and visual hierarchy of the original.

Model Details

Item	Value
Base Model	Qwen/Qwen3.5-4B
Fine-tune Method	LoRA (PEFT)
LoRA Rank	32
LoRA Alpha	64
LoRA Dropout	0.05
Target Modules	all-linear (language_model)
Training Framework	ms-swift v4.3.0
Precision	bfloat16
Max Sequence Length	10240
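
To make the LoRA hyperparameters above concrete, the sketch below computes the scaling factor applied to the low-rank update (assuming the standard alpha / rank scaling) and the number of trainable parameters a rank-32 adapter adds to a single linear layer. The `LoraSettings` class and the 2560x2560 layer shape are hypothetical and chosen for illustration; they are not read from this adapter's config or from the base model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the hyperparameters listed in the table."""
    rank: int = 32
    alpha: int = 64
    dropout: float = 0.05

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def added_params(self, in_features: int, out_features: int) -> int:
        # A has shape (rank, in_features); B has shape (out_features, rank).
        return self.rank * (in_features + out_features)


settings = LoraSettings()
print(settings.scaling)                    # 2.0
# Layer shape is an assumption made for this example only.
print(settings.added_params(2560, 2560))   # 163840
```

With rank 32 and alpha 64 the update is scaled by 2, and each adapted layer adds parameters in proportion to the sum of its input and output widths rather than their product.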

Quick Start

Installation

pip install torch transformers peft accelerate pillow qwen_vl_utils
# Or use ms-swift (recommended):
pip install "ms-swift[all]"  # quotes keep shells such as zsh from expanding the brackets

Inference with ms-swift (Recommended)

from htmlgen_infer import HTMLGenModel

model = HTMLGenModel(
    adapter_path="Yesianrohn/htmlgen-qwen3.5-4b-lora",  # or local path to this repo
    base_model="Qwen/Qwen3.5-4B",   # auto-detected from adapter_config.json
    merge_lora=True,
)

# Single image inference
html_output = model.predict("path/to/document_image.png")
print(html_output)

# Batch inference
results = model.predict_batch(["img1.png", "img2.png"])

Inference with Transformers + PEFT (Manual)

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
from PIL import Image

# Load base model (image-text-to-text auto class, since the input includes an image)
base_model_id = "Qwen/Qwen3.5-4B"
model = AutoModelForImageTextToText.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Yesianrohn/htmlgen-qwen3.5-4b-lora")
model = model.merge_and_unload()  # Optional: merge for faster inference

# Prepare input
image = Image.open("document.png").convert("RGB")
system_prompt = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)
user_prompt = (
    "Convert this document page into a complete HTML document. "
    "Preserve the layout, headings, tables, and formulas exactly as shown. "
    "Return only the HTML source."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": user_prompt},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding: with do_sample=False, no temperature setting is needed
output_ids = model.generate(**inputs, max_new_tokens=10240, do_sample=False)
output_text = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output_text)
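
Although the prompts ask for raw HTML only, a generated response can occasionally arrive wrapped in a markdown fence. The following is a minimal post-processing sketch, not part of this repository: `clean_html_output` and `looks_like_html_document` are hypothetical helper names, and the check is a cheap heuristic rather than a validator.

```python
import re

# Markdown code-fence marker (three backticks), built programmatically.
FENCE = "`" * 3
_FENCE_RE = re.compile(
    r"^\s*" + re.escape(FENCE) + r"[a-zA-Z0-9]*[ \t]*\n(.*?)\n?[ \t]*"
    + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)


def clean_html_output(text: str) -> str:
    """Remove one surrounding markdown fence, if present, and trim whitespace."""
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)
    return text.strip()


def looks_like_html_document(text: str) -> bool:
    """Cheap check that the text starts like a full HTML document."""
    head = text.lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Example with a fenced response; unfenced responses pass through unchanged.
raw = FENCE + "html\n<!DOCTYPE html><html><body>Hi</body></html>\n" + FENCE
html_source = clean_html_output(raw)
print(looks_like_html_document(html_source))
```

In the snippet above, this would be applied as `clean_html_output(output_text)` before writing the result to a `.html` file.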

Inference with ms-swift CLI

# Direct inference with swift
swift infer \
    --model Qwen/Qwen3.5-4B \
    --adapters Yesianrohn/htmlgen-qwen3.5-4b-lora \
    --merge_lora true \
    --torch_dtype bfloat16 \
    --stream false

System Prompt

The model was trained with the following system prompt:

You are an expert document parser. Given an image of a document page, reconstruct its source as a single complete, self-contained HTML5 document. Faithfully preserve the original layout, typography, tables, formulas, and visual hierarchy using inline CSS where appropriate. Output only the HTML source, with no explanations, no markdown fences, and no extra prose.

Limitations

The model works best on clean, well-scanned document pages.
Very complex multi-column layouts or low-resolution images may produce imperfect HTML.
The maximum output length is 10240 tokens; very long documents may be truncated.
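
Because long pages can run into the token limit, it may help to flag outputs that appear cut off. The sketch below is one possible heuristic using only the Python standard library: it treats a missing closing `</html>` tag as a sign of truncation and lists the tags still open at the end of the text. The helper names are hypothetical, and since HTML permits omitting some end tags (for example `</p>` or `</li>`), the open-tag list is diagnostic only.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _OpenTagTracker(HTMLParser):
    """Keeps a stack of tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def unclosed_tags(html_text: str) -> list[str]:
    """Return the tags still open at the end of the text, outermost first."""
    tracker = _OpenTagTracker()
    tracker.feed(html_text)
    tracker.close()
    return tracker.stack


def looks_truncated(html_text: str) -> bool:
    """Heuristic: a complete document should end with a closing html tag."""
    return not html_text.rstrip().lower().endswith("</html>")


complete = "<!DOCTYPE html><html><body><p>Hi<br></p></body></html>"
cut_off = "<!DOCTYPE html><html><body><table><tr><td>12"
print(looks_truncated(complete), unclosed_tags(complete))
print(looks_truncated(cut_off), unclosed_tags(cut_off))
```

A page flagged this way could be split into smaller regions before inference, or reviewed manually.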

License

This adapter is released under the Apache 2.0 license. The base model (Qwen3.5-4B) has its own license; refer to Qwen/Qwen3.5-4B for details.

Model tree: Qwen/Qwen3.5-4B-Base (base model), Qwen/Qwen3.5-4B (fine-tuned), Yesianrohn/htmlgen-qwen3.5-4b-lora (adapter, this model).