Model Card for ResNet-50 Text Detector

This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR. That came to roughly 70k images in total: 50% with legible text and 50% without.

Model Details

How to Get Started with the Model

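from PIL import Image
import requests
import torch

from transformers import AutoImageProcessor, AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-50-text-detector",
)

# Reuse the base ResNet-50 processor for rescaling and normalization;
# do_resize=False skips its own resizing so the 256x256 input below is kept.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

url = "http://images.cocodataset.org/train2017/000000044520.jpg"
# The model was trained at 256x256, so resize to that resolution up front.
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

inputs = processor(image, return_tensors="pt").pixel_values

# Inference only: disable gradient tracking.
with torch.no_grad():
    outputs = model(inputs)
logits = outputs.logits
probs = logits.softmax(dim=1)  # one probability per class; see model.config.id2label
print(probs)
# tensor([[0.1149, 0.8851]])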
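
To turn the printed probabilities into a yes/no answer, a small post-processing step is enough. The sketch below is a minimal, dependency-free version that takes one row of logits (for example outputs.logits[0].tolist() from the snippet above). The helper names softmax and has_text are hypothetical, and so is the assumption that index 1 is the "contains text" class: the card does not state the label order, so check model.config.id2label before relying on it.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def has_text(logits: Sequence[float], text_index: int = 1, threshold: float = 0.5) -> bool:
    """Return True if the probability of the assumed 'text' class reaches threshold.

    text_index=1 is an ASSUMPTION; confirm it against model.config.id2label.
    """
    return softmax(logits)[text_index] >= threshold


# Hypothetical glue code for the example above:
# print(has_text(outputs.logits[0].tolist()), model.config.id2label)
```

Because the training data was split 50/50 between the two classes, 0.5 is a reasonable starting threshold; raising it makes the detector stricter about what counts as text, at the cost of missing some images that do contain text.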

Training Details

  • Epochs: 3
  • Resolution: 256x256
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Batch size: 64
  • Precision: FP32
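
The hyperparameters above imply a rough training length. Assuming plain minibatch training with no gradient accumulation (the card does not say) and taking the rounded ~70k image count at face value, the arithmetic works out as below. The function name optimizer_steps is hypothetical, and the result is an estimate, not a logged step count.

```python
import math

# Approximate figures from the training details; "~70k" is rounded.
NUM_IMAGES = 70_000
BATCH_SIZE = 64
EPOCHS = 3


def optimizer_steps(num_images: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Optimizer steps for plain minibatch training (no gradient accumulation assumed)."""
    if drop_last:
        per_epoch = num_images // batch_size  # the final partial batch is discarded
    else:
        per_epoch = math.ceil(num_images / batch_size)  # the final partial batch still counts
    return per_epoch * epochs


print(optimizer_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS))  # 3282
```

That is about 1.1k steps per epoch, or 3,279 in total if the last partial batch of each epoch is dropped.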
Model size: 23.6M parameters (Safetensors, F32 tensors)
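
At 4 bytes per F32 value, the listed parameter count gives a rough size for the weights alone; activations during inference and file metadata are extra. The 23.6M figure is itself rounded, so treat the result as an estimate. The helper name weights_size_bytes is hypothetical.

```python
def weights_size_bytes(num_params: float, bytes_per_param: int) -> float:
    """Storage needed for the weights only (no activations, no file metadata)."""
    return num_params * bytes_per_param


size = weights_size_bytes(23.6e6, 4)  # 23.6M params from the card, 4 bytes per F32 value
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")  # 94.4 MB (90.0 MiB)
```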