metadata
license: other
library_name: peft
tags:
  - generated_from_trainer
base_model: Salesforce/instructblip-vicuna-7b
datasets:
  - pantheon-prompts-dataset
model-index:
  - name: instructblip-vicuna-7b-peft-lora
    results: []

instructblip-vicuna-7b-peft-lora

This model is a fine-tuned version of Salesforce/instructblip-vicuna-7b, trained on pantheon-prompts-dataset. It achieves the following result on the evaluation set:

  • Loss: 5.3583

Model Description

Project Overview

This model is part of a two-phase project aimed at automatic prompt engineering for text-to-image generation.

Current Phase: Supervised Fine-Tuning

  • Status: Completed
  • Input: Base prompt and an image
  • Output: Enhanced prompt for image generation
  • Purpose: Adapt the base model to generate improved prompts

Future Phase: Reinforcement Learning Fine-Tuning

  • Status: Planned
  • Method: Proximal Policy Optimization (PPO)
  • Purpose: Further refine prompt quality

Ultimate Objective

  1. Accept a base prompt and a preferred generated image as input
  2. Automatically engineer an enhanced prompt
  3. Use the enhanced prompt to generate higher-quality images with the same text-to-image model
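
The three steps above can be sketched as a small function. This is only an illustration of the intended data flow, not the project's actual code: `run_pipeline`, `enhance_prompt`, and `generate_image` are hypothetical names, and the two callables stand in for this model and for whichever text-to-image model is used.

```python
from typing import Any, Callable, Tuple


def run_pipeline(
    base_prompt: str,
    preferred_image: Any,
    enhance_prompt: Callable[[str, Any], str],
    generate_image: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Hypothetical end-to-end flow: (prompt, image) -> enhanced prompt -> new image."""
    # Steps 1 and 2: the fine-tuned model turns the base prompt and the
    # preferred image into an enhanced prompt.
    enhanced = enhance_prompt(base_prompt, preferred_image)
    # Step 3: the same text-to-image model is called again with the enhanced prompt.
    new_image = generate_image(enhanced)
    return enhanced, new_image


# Stand-in callables so the sketch runs without loading any model.
def fake_enhance(prompt: str, image: Any) -> str:
    return f"{prompt}, highly detailed"


def fake_generate(prompt: str) -> dict:
    return {"prompt_used": prompt}


enhanced, image = run_pipeline("a castle at dusk", None, fake_enhance, fake_generate)
```

In practice `enhance_prompt` would wrap the generation call shown under "How to use", and `generate_image` would wrap the text-to-image model of your choice.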

Checkpoint Information

This model checkpoint represents the completion of the Supervised Fine-Tuning phase (Phase 1) in the overall project.

Training Limitations

  • Dataset Size: The model was trained on a limited dataset of 1,600 examples.
  • Resource Constraints: Due to computational resource limitations, we were unable to use a larger training set.
  • Potential Issues:
    • The model may not have fully generalized to a wide range of inputs.
    • There is a risk of overfitting to the training data.
  • Caution: Performance may be inconsistent on inputs that differ significantly from the training set.

Training and evaluation data

Both training and evaluation use pantheon-prompts-dataset.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 1000

How to use:

import torch
from peft import PeftModelForCausalLM
from transformers import (
    BitsAndBytesConfig,
    InstructBlipProcessor,
    InstructBlipForConditionalGeneration,
)

# Define the quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b", legacy=False)
processor.padding_side = "right"
processor.tokenizer.padding_side = "right"

model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b",
    quantization_config=bnb_config,
    device_map="auto",
)

model = PeftModelForCausalLM.from_pretrained(
    model,
    "NoyHanan/instructblip-vicuna-7b-peft-lora",
    is_trainable=False,
    adapter_name="lora_policy",
)

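prompt = "<Base_Prompt>"
image = "<Image>"  # placeholder: replace with a loaded image (e.g. a PIL.Image)

# Build the model inputs as PyTorch tensors and move them to the GPU
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")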

res = model.generate(
    **inputs,
    do_sample=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    top_p=1.0,
    top_k=0,
    temperature=0.5,
)

# generate() returns a batch of token ids; decode the first (and only) sequence
enhanced_prompt = processor.batch_decode(res, skip_special_tokens=True)[0]

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • PyTorch 2.3.1+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1