Model Card for ResNet-152 Text Detector
This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset, supplemented with some images from LLaVAR. The resulting training set has roughly 140k images, split evenly: 50% contain legible text and 50% do not.
Model Details
How to Get Started with the Model
```python
from PIL import Image
import requests
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned text detector.
model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-152-text-detector",
)
# The ResNet-50 processor supplies the standard normalization; its resizing is
# disabled because the image is resized to 300x300 manually below.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

url = "http://images.cocodataset.org/train2017/000000044520.jpg"
# The model was trained at 300x300, so resize to that resolution.
image = Image.open(requests.get(url, stream=True).raw).convert('RGB').resize((300,300))
inputs = processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = model(inputs)
logits_per_image = outputs.logits
# Softmax over the two classes gives per-class probabilities.
probs = logits_per_image.softmax(dim=1)
print(probs)
# tensor([[0.1085, 0.8915]])
```
Training Details
- Trained for three epochs
- Resolution: 300x300
- Learning rate: 5e-5
- Optimizer: AdamW
- Batch size: 64
- Trained with FP32
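The hyperparameters above can be wired into a plain PyTorch fine-tuning loop roughly as follows. This is a minimal sketch, not the author's training script: the `TrainConfig` and `train` names, the cross-entropy loss, shuffling, and the `(pixel_values, label)` dataset format are assumptions, and the card does not state the starting checkpoint, learning-rate schedule, or augmentation.

```python
from dataclasses import dataclass
from typing import List, Optional

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


@dataclass
class TrainConfig:
    # Values from the Training Details list; the class itself is hypothetical.
    epochs: int = 3
    resolution: int = 300  # inputs are expected to be 300x300
    learning_rate: float = 5e-5
    batch_size: int = 64


def train(model: nn.Module, dataset: Dataset, cfg: Optional[TrainConfig] = None) -> List[float]:
    """Fine-tune a 2-class image classifier; return the mean loss of each epoch.

    Assumes `dataset` yields (pixel_values, label) pairs, where pixel_values is a
    float32 tensor of shape (3, 300, 300) and label is 0 or 1 (int64).
    """
    cfg = cfg or TrainConfig()
    model = model.float()  # FP32 weights, no mixed precision
    model.train()
    loader = DataLoader(dataset, batch_size=cfg.batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    epoch_losses = []
    for _ in range(cfg.epochs):
        total, batches = 0.0, 0
        for pixel_values, labels in loader:
            out = model(pixel_values)
            # Hugging Face models return an output object; plain modules return logits.
            logits = out.logits if hasattr(out, "logits") else out
            loss = loss_fn(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        epoch_losses.append(total / max(batches, 1))
    return epoch_losses
```

To fine-tune a real ResNet-152 with this loop, one could hypothetically start from `AutoModelForImageClassification.from_pretrained("microsoft/resnet-152", num_labels=2, ignore_mismatched_sizes=True)`, which replaces the 1000-class head with a 2-class one; whether the author started from that checkpoint is not stated in this card.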