HTMLGen: Qwen3.5-4B LoRA Adapter
Yesianrohn/htmlgen-qwen3.5-4b-lora is a LoRA adapter fine-tuned on top of Qwen/Qwen3.5-4B for HTML generation.
Given an image of a document page, the model generates a single self-contained HTML5 document that preserves the page's layout, typography, tables, formulas, and visual hierarchy.
Model Details
| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune Method | LoRA (PEFT) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | all-linear (language_model) |
| Training Framework | ms-swift v4.3.0 |
| Precision | bfloat16 |
| Max Sequence Length | 10240 |
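To make the rank and alpha values in the table concrete: in standard LoRA the low-rank update is scaled by alpha / r, and each adapted linear layer gains r × (d_in + d_out) trainable parameters. The sketch below works this out for the values above. The 2560 × 2560 layer shape is a hypothetical example chosen for illustration, not a value taken from the Qwen3.5-4B configuration.

```python
# LoRA hyperparameters from the Model Details table.
LORA_RANK = 32
LORA_ALPHA = 64


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer shape, used only for illustration.
D_IN, D_OUT = 2560, 2560

print(lora_scaling(LORA_ALPHA, LORA_RANK))            # 2.0
print(lora_params_per_layer(D_IN, D_OUT, LORA_RANK))  # 163840
```

With alpha set to twice the rank, the adapter's contribution is doubled relative to the raw B @ A product, which is a common choice in LoRA setups.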
Quick Start
Installation
pip install torch transformers peft accelerate pillow qwen_vl_utils
# Or use ms-swift (recommended):
pip install "ms-swift[all]"
Inference with ms-swift (Recommended)
from htmlgen_infer import HTMLGenModel
model = HTMLGenModel(
    adapter_path="Yesianrohn/htmlgen-qwen3.5-4b-lora",  # or local path to this repo
    base_model="Qwen/Qwen3.5-4B",  # auto-detected from adapter_config.json
    merge_lora=True,
)
# Single image inference
html_output = model.predict("path/to/document_image.png")
print(html_output)
# Batch inference
results = model.predict_batch(["img1.png", "img2.png"])
Inference with Transformers + PEFT (Manual)
import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from peft import PeftModel
from PIL import Image
# Load base model
base_model_id = "Qwen/Qwen3.5-4B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "Yesianrohn/htmlgen-qwen3.5-4b-lora")
model = model.merge_and_unload() # Optional: merge for faster inference
# Prepare input
image = Image.open("document.png").convert("RGB")
system_prompt = (
    "You are an expert document parser. Given an image of a document page, "
    "reconstruct its source as a single complete, self-contained HTML5 "
    "document. Faithfully preserve the original layout, typography, tables, "
    "formulas, and visual hierarchy using inline CSS where appropriate. "
    "Output only the HTML source, with no explanations, no markdown fences, "
    "and no extra prose."
)
user_prompt = (
    "Convert this document page into a complete HTML document. "
    "Preserve the layout, headings, tables, and formulas exactly as shown. "
    "Return only the HTML source."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": user_prompt},
    ]},
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
# Greedy decoding: with do_sample=False the temperature setting is ignored, so it is omitted.
output_ids = model.generate(**inputs, max_new_tokens=10240, do_sample=False)
output_text = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output_text)
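The system prompt asks for raw HTML with no markdown fences, but generated text can still arrive wrapped in a fence or padded with whitespace. The following is a minimal post-processing sketch, not part of the original card: `clean_html_output` and `save_html` are hypothetical helper names, and the fence-stripping regex is one possible heuristic.

```python
import re
from pathlib import Path

# Matches text that is wrapped in exactly one Markdown code fence,
# with an optional language tag such as "html" after the opening backticks.
_FENCE_RE = re.compile(r"^```[a-zA-Z0-9]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def clean_html_output(text: str) -> str:
    """Trim whitespace and unwrap a single surrounding code fence, if present."""
    text = text.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    return text


def save_html(text: str, path: str) -> Path:
    """Clean the generated text and write it to disk as UTF-8."""
    out = Path(path)
    out.write_text(clean_html_output(text), encoding="utf-8")
    return out
```

For example, `save_html(output_text, "document.html")` would write the cleaned result next to the script so it can be opened in a browser.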
Inference with ms-swift CLI
# Direct inference with swift
swift infer \
    --model Qwen/Qwen3.5-4B \
    --adapters Yesianrohn/htmlgen-qwen3.5-4b-lora \
    --merge_lora true \
    --torch_dtype bfloat16 \
    --stream false
System Prompt
The model was trained with the following system prompt:
You are an expert document parser. Given an image of a document page, reconstruct its source as a single complete, self-contained HTML5 document. Faithfully preserve the original layout, typography, tables, formulas, and visual hierarchy using inline CSS where appropriate. Output only the HTML source, with no explanations, no markdown fences, and no extra prose.
Limitations
- The model works best on clean, well-scanned document pages.
- Very complex multi-column layouts or low-resolution images may produce imperfect HTML.
- The maximum output length is 10240 tokens; very long documents may be truncated.
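Because output that hits the token limit is cut off mid-document, it can help to check whether a result looks complete before using it. The sketch below is one possible heuristic using only the Python standard library; `unclosed_tags` and `looks_truncated` are hypothetical helper names, and the check is deliberately simple (it treats a missing closing `</html>` as the signal for truncation).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML5.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _OpenTagTracker(HTMLParser):
    """Keeps a stack of elements that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def unclosed_tags(html: str) -> list:
    """Return the elements still open at the end of the text, outermost first."""
    tracker = _OpenTagTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.stack


def looks_truncated(html: str) -> bool:
    """Heuristic: True if <html> is never closed or the text does not end with </html>."""
    ends_cleanly = html.rstrip().lower().endswith("</html>")
    return "html" in unclosed_tags(html) or not ends_cleanly
```

A result flagged by `looks_truncated` is a candidate for splitting the source page into smaller regions or reviewing by hand; the list from `unclosed_tags` shows where generation stopped.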
License
This adapter is released under the Apache 2.0 license. The base model (Qwen3.5-4B) has its own license; please refer to Qwen/Qwen3.5-4B for details.