Florence-2-large-FormClassification-ft

This model is a fine-tuned version of microsoft/Florence-2-large-ft on the Musa07/Florence_ft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2107 (validation loss after the final epoch)

Inference Code

  # Code
    from transformers import AutoProcessor, AutoModelForCausalLM
    from PIL import Image  # needed for Image.open below
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
  
    # Load the model directly onto a CUDA GPU; this snippet requires one, because run_example moves its inputs with .cuda()
    model = AutoModelForCausalLM.from_pretrained("Musa07/Florence-2-large-FormClassification-ft", trust_remote_code=True, device_map='cuda')
    processor = AutoProcessor.from_pretrained("Musa07/Florence-2-large-FormClassification-ft", trust_remote_code=True)
  
    def run_example(task_prompt, image, max_new_tokens=128):
      prompt = task_prompt
      inputs = processor(text=prompt, images=image, return_tensors="pt")
      generated_ids = model.generate(
        input_ids=inputs["input_ids"].cuda(),
        pixel_values=inputs["pixel_values"].cuda(),
        max_new_tokens=max_new_tokens,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
      )
      generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
      parsed_answer = processor.post_process_generation(
          generated_text,
          task=task_prompt,
          image_size=(image.width, image.height)
      )
      return parsed_answer
      
    def plot_bbox(image, data):
      fig, ax = plt.subplots()
  
      # Display the image
      ax.imshow(image)
  
      # Plot each bounding box
      for bbox, label in zip(data['bboxes'], data['labels']):
          # Unpack the bounding box coordinates
          x1, y1, x2, y2 = bbox
          # Create a Rectangle patch
          rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=1, edgecolor='r', facecolor='none')
          # Add the rectangle to the Axes
          ax.add_patch(rect)
          # Annotate the label
          ax.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))
  
      # Remove the axis ticks and labels
      ax.axis('off')
  
      # Show the plot
      plt.show()
      
    image = Image.open('1.jpeg')
    parsed_answer = run_example("<OD>", image=image)
    print(parsed_answer)
    plot_bbox(image, parsed_answer["<OD>"])
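
For downstream use it can be handier to work with one record per detection than with the parallel `bboxes` and `labels` lists that `plot_bbox` iterates over. The following is a minimal sketch, not part of the original card: `detections_to_records` is a hypothetical helper, and the example dictionary (including the label `"form"`) is hand-written for illustration rather than real model output. The label names this fine-tune actually emits are not documented here.

```python
def detections_to_records(od_result):
    """Convert {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]} to a list of dicts."""
    records = []
    for (x1, y1, x2, y2), label in zip(od_result["bboxes"], od_result["labels"]):
        records.append({
            "label": label,
            "box": (x1, y1, x2, y2),
            "width": x2 - x1,    # box width in pixels
            "height": y2 - y1,   # box height in pixels
        })
    return records


# Hypothetical result with the same shape that plot_bbox expects (not real model output).
example = {"<OD>": {"bboxes": [[10.0, 20.0, 110.0, 70.0]], "labels": ["form"]}}

for record in detections_to_records(example["<OD>"]):
    print(record["label"], record["box"], record["width"], record["height"])
```

With a real result, the call would be `detections_to_records(parsed_answer["<OD>"])`.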

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
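
To make the `linear` scheduler setting concrete, here is a rough sketch of how the learning rate would decay over the run. It assumes no warmup (none is listed above) and 230 total optimizer steps, the final step count reported in the training results. `linear_lr` is a hypothetical helper that mirrors the usual linear-decay formula; it is not the code that was used for training.

```python
def linear_lr(step, base_lr=1e-6, total_steps=230, warmup_steps=0):
    """Learning rate at a given step under linear warmup followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr (unused here: warmup is assumed to be 0).
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, middle, and end of the assumed 230-step run.
for step in (0, 115, 230):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 1e-06, is roughly half that midway through training, and reaches zero at the last step.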

Training results

Training Loss   Epoch   Step   Validation Loss
0.0188          1.0     23     0.2151
0.0127          2.0     46     0.2113
0.0078          3.0     69     0.2061
0.0047          4.0     92     0.2102
0.0042          5.0     115    0.2078
0.0030          6.0     138    0.2108
0.0022          7.0     161    0.2110
0.0029          8.0     184    0.2117
0.0019          9.0     207    0.2114
0.0023          10.0    230    0.2107
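
The rows above show training loss falling almost every epoch while validation loss stays close to 0.21. A short sketch like the following, which simply transcribes the table, can be used to find the epoch with the lowest validation loss and compare it with the final one. The variable names are illustrative only, and whether an earlier checkpoint was saved or published is not stated in this card.

```python
# (epoch, step, training_loss, validation_loss), transcribed from the table above.
results = [
    (1, 23, 0.0188, 0.2151),
    (2, 46, 0.0127, 0.2113),
    (3, 69, 0.0078, 0.2061),
    (4, 92, 0.0047, 0.2102),
    (5, 115, 0.0042, 0.2078),
    (6, 138, 0.0030, 0.2108),
    (7, 161, 0.0022, 0.2110),
    (8, 184, 0.0029, 0.2117),
    (9, 207, 0.0019, 0.2114),
    (10, 230, 0.0023, 0.2107),
]

best = min(results, key=lambda row: row[3])  # row with the lowest validation loss
last = results[-1]                           # final epoch, the value reported at the top

print(f"best epoch: {best[0]} (validation loss {best[3]:.4f})")
print(f"final epoch: {last[0]} (validation loss {last[3]:.4f})")
print(f"gap: {last[3] - best[3]:.4f}")
```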

Framework versions

  • Transformers 4.44.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 823M params (Safetensors, tensor type F32)