lokibots/vit-patch16-1280-gpt2-large-image-summary

This model generates a text summary of a chart image. It accepts images up to 1280x768 pixels and produces a summary describing the contents of the image. Note that the model still requires further training.
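The card says only that inputs should be 1280x768 or smaller. It does not say how larger charts are handled, and the feature extractor may already resize images according to its own configuration. If you prefer to downscale large charts yourself first, a minimal sketch is shown below. It assumes you want to preserve the aspect ratio and never upscale. `fit_within` is a hypothetical helper and is not part of this model or of `transformers`.

```python
from typing import Tuple


def fit_within(width: int, height: int,
               max_width: int = 1280, max_height: int = 768) -> Tuple[int, int]:
    """Hypothetical helper: scale (width, height) down to fit max_width x max_height.

    The aspect ratio is preserved (up to integer rounding), and images that
    already fit are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width <= max_width and height <= max_height:
        return width, height  # already within the limit, no upscaling
    # Compare aspect ratios with integer math to decide which side hits the limit.
    if width * max_height >= height * max_width:
        # Width is the limiting side: pin it to max_width, shrink height to match.
        return max_width, max(1, height * max_width // width)
    # Height is the limiting side: pin it to max_height, shrink width to match.
    return max(1, width * max_height // height), max_height
```

With Pillow, this could be applied as `image = image.resize(fit_within(*image.size))` before passing the image to the feature extractor. Integer arithmetic is used so the result never exceeds the limits because of floating-point rounding.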

Sample inference code

```python
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, GPT2Tokenizer
from PIL import Image

model_id = "lokibots/vit-patch16-1280-gpt2-large-image-summary"

# ViT encoder + GPT-2 Large decoder, plus the matching image preprocessor
model = VisionEncoderDecoderModel.from_pretrained(model_id)
feature_extractor = ViTFeatureExtractor.from_pretrained(model_id)
# The decoder output is decoded with the standard GPT-2 Large tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")

# Replace "image_file" with the path to your chart image
image = Image.open("image_file").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Beam search (4 beams), capped at 1024 tokens
gen_kwargs = {"max_length": 1024, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(preds[0].strip())
```