
Model Card for magistermilitum/HTR_NOT_ABBR_minicpm

This is, to our knowledge, a first vision-language model (VLM) able to switch between PEFT adapters to produce two transcription styles for Western historical manuscripts:

  • ABBR (abbreviated) style: keeps the original abbreviations from the manuscript, rendered with MUFI (Medieval Unicode Font Initiative) characters.

  • NOT_ABBR (expanded) style: expands the abbreviations and symbols used in the manuscript to produce a normalized text (see the illustrative pair below).
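
For illustration, a hypothetical Latin line (invented for this card, assuming standard medieval abbreviation conventions) would come out as:

  ABBR:     In noīe dñi amē
  NOT_ABBR: In nomine domini amen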

Model Description

  • Developed by: Sergio Torres Aguilar
  • Model type: Multimodal
  • Language(s) (NLP): Latin, French, Spanish, German
  • License: MIT

Uses

The model consists of two lightweight PEFT adapters (magistermilitum/HTR_ABBR_minicpm and magistermilitum/HTR_NOT_ABBR_minicpm) added on top of the MiniCPM-Llama3-V-2_5 (2024) base model.

How to Get Started with the Model

The following script produces both transcription styles for a folder of manuscript line images; it requires torch, transformers, peft, Pillow, and tqdm:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from PIL import Image
import os
from tqdm import tqdm
import json

# Configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
abbr_adapters = "magistermilitum/HTR_ABBR_minicpm"
not_abbr_adapters = "magistermilitum/HTR_NOT_ABBR_minicpm"

image_folder = "/your/images/folder/path"

class TranscriptionModel:
    """Handles model loading, adapter switching, and transcription generation."""
    def __init__(self, model_name, abbr_adapters, not_abbr_adapters, device):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
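        # Load the base model in bfloat16 with SDPA attention;
        # token=True reuses the token cached by `huggingface-cli login`.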
        self.base_model = AutoModelForCausalLM.from_pretrained(
            model_name, trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16, token=True
        )
        self.base_model = PeftModel.from_pretrained(self.base_model, abbr_adapters, adapter_name="ABBR")
        self.base_model.load_adapter(not_abbr_adapters, adapter_name="NOT_ABBR")
        self.base_model.set_adapter("ABBR")  # Set default adapter
        self.base_model.to(device).eval()

    def generate(self, adapter, image):
        """Generate transcription for the given adapter and image."""
        if hasattr(self.base_model, "past_key_values"):
            self.base_model.past_key_values = None
        self.base_model.set_adapter(adapter)
        msgs = [{"role": "user", "content": [f"Transcribe this manuscript line in mode <{adapter}>:", image]}]
        with torch.no_grad():
            res = self.base_model.chat(image=image, msgs=msgs, tokenizer=self.tokenizer, max_new_tokens=128)
        # Strip the style tags (e.g. <ABBR>...</ABBR>) that the adapter emits
        res = res.replace(f"<{adapter}>", "").replace(f"</{adapter}>", "")
        return res


class TranscriptionPipeline:
    """Handles image processing, transcription, and result saving."""
    def __init__(self, model, image_folder):
        self.model = model
        self.image_folder = image_folder

    def run_inference(self):
        """Process all images in the folder and generate transcriptions."""
        results = []
        # Filter to image files first, then cap at 20 lines for this demo
        for image_file in tqdm([f for f in sorted(os.listdir(self.image_folder))
                                if f.lower().endswith(('.png', '.jpg', '.jpeg'))][:20]):
            image = Image.open(os.path.join(self.image_folder, image_file)).convert("RGB")
            print(f"\nProcessing image: {image_file}")
            
            # Generate transcriptions for both adapters
            transcriptions = {
                adapter: self.model.generate(adapter, image)
                for adapter in ["ABBR", "NOT_ABBR"]
            }
            for adapter, res in transcriptions.items():
                print(f"Mode ({adapter}): {res}")
            results.append({"image": image_file, "transcriptions": transcriptions})

            # image.show()  # Optional: display the current line image

        # Save results to a JSON file (ensure_ascii=False keeps MUFI/Unicode characters readable)
        with open("transcriptions_results.json", "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=4)


# Initialize and run the pipeline
model = TranscriptionModel(model_name, abbr_adapters, not_abbr_adapters, device)
TranscriptionPipeline(model, image_folder).run_inference()
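
The resulting transcriptions_results.json holds one entry per processed image, with both styles. A sketch of the structure (the file name and transcriptions are placeholders, not real outputs):

[
    {
        "image": "line_001.jpg",
        "transcriptions": {
            "ABBR": "…",
            "NOT_ABBR": "…"
        }
    }
]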

Citation

@misc{torres_aguilar:hal-04983305,
    title={Dual-Style Transcription of Historical Manuscripts based on Multimodal Small Language Models with Switchable Adapters}, 
    author={Torres Aguilar, Sergio},
    url={https://hal.science/hal-04983305},
    year={2025},
    note={working paper or preprint}
}

Framework versions

  • PEFT 0.14.1.dev0