---
license: mit
base_model: microsoft/git-base
tags:
- generated_from_trainer
model-index:
- name: git-base-on-diffuision-dataset2
  results: []
language:
- en
library_name: transformers
pipeline_tag: image-to-text
---

# git-base-on-diffuision-dataset2

This model is a fine-tuned version of [microsoft/git-base](https://huggingface.co/microsoft/git-base) on the hieudinhpro/diffuision-dataset2 dataset.

## Model description

GIT (short for GenerativeImage2Text) model, base-sized version. \
It was introduced in the paper GIT: A Generative Image-to-text Transformer for Vision and Language.

This model was trained for the task of sketch scene image-to-text generation.

## How to use the model

```
# Load the processor from the base model and the fine-tuned weights
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("hieudinhpro/git-base-on-diffuision-dataset2")
```

```
# Load the image
from PIL import Image

image = Image.open('/content/image_3.jpg')
```

```
# Preprocess the image into pixel values
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

# Generate caption token ids
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)

# Decode the token ids to text
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

### Framework versions

- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.0
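
### Hyperparameter sketch

As a rough illustration of the training hyperparameters listed above, the following sketch maps them onto keyword names accepted by `transformers.TrainingArguments` and works out the effective batch size and the shape of a linear learning-rate schedule. It is not the original training script. The mapping of `train_batch_size` to `per_device_train_batch_size`, the single-device setup (`num_devices = 1`), and the absence of warmup are assumptions, and `linear_lr` is a hypothetical helper written for this example.

```python
# Listed hyperparameters expressed with transformers.TrainingArguments
# keyword names. Assumption: train_batch_size is the per-device batch size.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}

# Assumption: training ran on a single device.
num_devices = 1

# Samples contributing to each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)


def linear_lr(step, total_steps, base_lr=training_kwargs["learning_rate"], warmup_steps=0):
    """Hypothetical helper: linear warmup (if any), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("lr halfway through 100 steps:", linear_lr(50, 100))
```

If the `transformers` library is installed, the same dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a placeholder path.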