HTMLGen: Qwen3.5-4B LoRA Adapter

A LoRA fine-tuned adapter on top of Qwen/Qwen3.5-4B for HTML generation.

Given an image of a document page, the model generates a single self-contained HTML5 document that aims to preserve the page's layout, typography, tables, formulas, and visual hierarchy.

Model Details

| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune Method | LoRA (PEFT) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | all-linear (language_model) |
| Training Framework | ms-swift v4.3.0 |
| Precision | bfloat16 |
| Max Sequence Length | 10240 |
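
As a rough illustration of what the rank and alpha values mean, the sketch below computes the LoRA scaling factor and the number of trainable parameters added to one linear layer. It assumes standard LoRA scaling (alpha / rank, not rsLoRA). The layer shape is a hypothetical value picked for the example and is not taken from the base model's configuration.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    The A matrix is (rank x d_in) and the B matrix is (d_out x rank),
    so the total is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Values from the Model Details table.
RANK, ALPHA = 32, 64

# Hypothetical layer shape, for illustration only.
D_IN, D_OUT = 2048, 2048

print("scaling:", lora_scaling(RANK, ALPHA))
print("extra params per layer:", lora_extra_params(D_IN, D_OUT, RANK))
```

With rank 32 and alpha 64, the low-rank update is scaled by 2, a common choice that keeps the effective update magnitude stable when the rank changes.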

Quick Start

Installation

pip install torch transformers peft accelerate pillow qwen_vl_utils
# Or use ms-swift (recommended):
pip install "ms-swift[all]"  # quoted so shells such as zsh do not expand the brackets

Inference with ms-swift (Recommended)

from htmlgen_infer import HTMLGenModel

model = HTMLGenModel(
    adapter_path="Yesianrohn/htmlgen-qwen3.5-4b-lora",  # or local path to this repo
    base_model="Qwen/Qwen3.5-4B",   # auto-detected from adapter_config.json
    merge_lora=True,
)

# Single image inference
html_output = model.predict("path/to/document_image.png")
print(html_output)

# Batch inference
results = model.predict_batch(["img1.png", "img2.png"])
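
For a whole folder of page images, the paths can be grouped into fixed-size batches before being handed to `predict_batch`. The helper below is a minimal sketch: `iter_image_batches`, the suffix set, and the default batch size are hypothetical names and values introduced here, and the commented usage assumes that `predict_batch` returns one HTML string per input path, in input order.

```python
from pathlib import Path
from typing import Iterator, List

# Assumed set of accepted input formats; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def iter_image_batches(folder: str, batch_size: int = 4) -> Iterator[List[str]]:
    """Yield sorted image paths from `folder` in lists of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    paths = sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Usage with the model object created above (not executed here):
# for batch in iter_image_batches("pages/", batch_size=4):
#     for path, html in zip(batch, model.predict_batch(batch)):
#         Path(path).with_suffix(".html").write_text(html, encoding="utf-8")
```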

Inference with Transformers + PEFT (Manual)

import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from peft import PeftModel
from PIL import Image

# Load base model
base_model_id = "Qwen/Qwen3.5-4B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Yesianrohn/htmlgen-qwen3.5-4b-lora")
model = model.merge_and_unload()  # Optional: merge for faster inference

# Prepare input
image = Image.open("document.png").convert("RGB")
system_prompt = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)
user_prompt = (
    "Convert this document page into a complete HTML document. "
    "Preserve the layout, headings, tables, and formulas exactly as shown. "
    "Return only the HTML source."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": user_prompt},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding; temperature has no effect when do_sample=False, so it is omitted
output_ids = model.generate(**inputs, max_new_tokens=10240, do_sample=False)
output_text = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output_text)
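
The decoded text can then be cleaned and written to disk. The sketch below is a defensive post-processing step, not part of the adapter itself: although the prompt asks for no markdown fences, it strips one wrapping fence if the model emits it anyway. `clean_html_output` and `save_html` are hypothetical helper names.

```python
from pathlib import Path

FENCE = "`" * 3  # markdown code-fence marker


def clean_html_output(text: str) -> str:
    """Strip surrounding whitespace and one wrapping markdown fence, if present."""
    text = text.strip()
    wrapped = (
        text.startswith(FENCE)
        and text.endswith(FENCE)
        and len(text) >= 2 * len(FENCE)
    )
    if wrapped:
        first_newline = text.find("\n")
        if first_newline != -1:
            # Drop the opening fence line (e.g. the fence plus "html")
            # and the closing fence.
            text = text[first_newline + 1:-len(FENCE)].strip()
    return text


def save_html(text: str, path: str) -> Path:
    """Write the cleaned HTML to `path` as UTF-8 and return the path."""
    out = Path(path)
    out.write_text(clean_html_output(text), encoding="utf-8")
    return out


# Usage with the variable from the snippet above (not executed here):
# save_html(output_text, "document.html")
```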

Inference with ms-swift CLI

# Direct inference with swift
swift infer \
    --model Qwen/Qwen3.5-4B \
    --adapters Yesianrohn/htmlgen-qwen3.5-4b-lora \
    --merge_lora true \
    --torch_dtype bfloat16 \
    --stream false

System Prompt

The model was trained with the following system prompt:

You are an expert document parser. Given an image of a document page, reconstruct its source as a single complete, self-contained HTML5 document. Faithfully preserve the original layout, typography, tables, formulas, and visual hierarchy using inline CSS where appropriate. Output only the HTML source, with no explanations, no markdown fences, and no extra prose.

Limitations

  • The model works best on clean, well-scanned document pages.
  • Very complex multi-column layouts or low-resolution images may produce imperfect HTML.
  • The maximum output length is 10240 tokens; very long documents may be truncated.
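
Because long pages can hit the token limit, it may help to flag outputs that look cut off. The check below is a heuristic sketch using only the Python standard library: it reports an output as truncated when tags remain open at the end or when the text does not end with `</html>`. `looks_truncated` and `_OpenTagTracker` are hypothetical names, and the heuristic can misjudge unusual but valid HTML.

```python
from html.parser import HTMLParser

# HTML void elements, which never take a closing tag.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _OpenTagTracker(HTMLParser):
    """Keep a stack of elements that have been opened but not yet closed."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag. Elements with
        # optional end tags (such as <p> or <li>) are discarded on the way.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def looks_truncated(html: str) -> bool:
    """Return True if the HTML appears to have been cut off mid-document."""
    tracker = _OpenTagTracker()
    tracker.feed(html)
    tracker.close()
    ends_properly = html.rstrip().lower().endswith("</html>")
    return bool(tracker.stack) or not ends_properly
```

A flagged page could be retried with a smaller crop or split into regions before conversion.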

License

This adapter is released under the Apache 2.0 license. The base model (Qwen3.5-4B) has its own license; please refer to Qwen/Qwen3.5-4B for details.
