---
license: apache-2.0
language:
- zh
pipeline_tag: image-to-text
widget:
- src: >-
    https://huggingface.co/snzhang/FilmTitle-Beit-GPT2/resolve/main/SpiderMan.jpg
  example_title: SpiderMan
- src: >-
    https://huggingface.co/snzhang/FilmTitle-Beit-GPT2/resolve/main/BorntoFly.jpg
  example_title: Born to Fly
---

# Image Caption Model

## Model description

The model generates the Chinese title of a movie poster. It uses [BEiT](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) as the vision encoder and [GPT2](https://huggingface.co/IDEA-CCNL/Wenzhong-GPT2-110M) as the text decoder.

## Training Data

The training data contains 5043 movie posters and their corresponding Chinese titles, collected in [Movie-Title-Post](https://huggingface.co/datasets/snzhang/Movie-Title-Post).

## How to use

```Python
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer
from PIL import Image

pretrained = "snzhang/FilmTitle-Beit-GPT2"
model = VisionEncoderDecoderModel.from_pretrained(pretrained)
feature_extractor = ViTFeatureExtractor.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained(pretrained)

image_path = "your image path"
image = Image.open(image_path)
if image.mode != "RGB":
    image = image.convert("RGB")

pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# gen_kwargs was undefined in the original snippet; these are
# reasonable defaults for short title generation. Adjust as needed.
gen_kwargs = {"max_length": 20, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)

preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
preds = [pred.strip() for pred in preds]
print(preds)
```

## More Details

You can find more training details in [FilmTitle-Beit-GPT2](https://github.com/h7nian/FilmTitle-Beit-GPT2).
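
Since the card's `pipeline_tag` is `image-to-text`, the checkpoint should also load through the high-level `pipeline` API as an alternative to the snippet above. A minimal sketch, where `poster.jpg` is a placeholder path for your own image:

```Python
from transformers import pipeline

# The image-to-text pipeline bundles the model, feature extractor,
# and tokenizer from the same checkpoint.
captioner = pipeline("image-to-text", model="snzhang/FilmTitle-Beit-GPT2")

# "poster.jpg" is a placeholder; point it at a movie poster.
print(captioner("poster.jpg"))  # e.g. [{'generated_text': '...'}]
```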