---
license: apache-2.0
language:
- en
metrics:
- accuracy
library_name: adapter-transformers
pipeline_tag: image-to-text
---
# Model Card for Pixelated Captcha Digit Detection

## Model Details

- **License:** Apache-2.0
- **Developed by:** Saidi Souhaieb
- **Finetuned from model:** YOLOv8

## Uses

This model detects digits in pixelated CAPTCHA images. It draws a bounding box around each digit and extracts the coordinates of every detection.
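The detection coordinates can be used to read the digits in order. The sketch below is minimal and makes several assumptions. Each detection is a hypothetical `(x1, y1, x2, y2)` pixel box paired with a digit label, and all digits sit on a single horizontal line. `order_detections` is an illustrative helper, not part of this model's released code.

```python
from typing import List, Tuple

# Hypothetical detection format (an assumption, not the model's documented output):
# (x1, y1, x2, y2) pixel coordinates of the top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def order_detections(boxes: List[Box], labels: List[int]) -> str:
    """Join digit labels left to right, assuming all digits share one text line."""
    if len(boxes) != len(labels):
        raise ValueError("boxes and labels must have the same length")
    # Sort each (box, label) pair by the horizontal centre of its box.
    ordered = sorted(zip(boxes, labels), key=lambda pair: (pair[0][0] + pair[0][2]) / 2)
    return "".join(str(label) for _, label in ordered)


# Three detections listed out of order; their x-centres are 47.5, 12.5 and 30.0.
boxes = [(40, 2, 55, 20), (5, 3, 20, 21), (22, 1, 38, 19)]
labels = [7, 3, 9]
print(order_detections(boxes, labels))  # 397
```

Sorting by the box centre instead of the left edge is a judgment call. It tolerates boxes whose left edges are slightly offset from one another.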

## How to Get Started with the Model

```python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image


class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Three 2x2 poolings reduce a 32x32 input to 4x4 feature maps.
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, 10)  # 10 output classes, one per digit

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = torch.flatten(x, 1)  # (N, 64, 4, 4) -> (N, 1024)
        x = F.relu(self.fc1(x))
        return self.fc2(x)


# Preprocessing. The size must stay 32x32 because fc1 expects 64 * 4 * 4 features.
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale [0, 1] to [-1, 1]
])

model = CNN()
model.load_state_dict(torch.load('models/99acc_model.pth', map_location='cpu'))
model.eval()  # switch to inference mode


def predict_number(folder_path):
    """Predict the digit in each image in folder_path, in file-name order."""
    predicted_numbers = []
    for file in sorted(os.listdir(folder_path)):
        # Load the image from the folder that was passed in, then preprocess it.
        input_image = Image.open(os.path.join(folder_path, file)).convert('RGB')
        input_batch = transform(input_image).unsqueeze(0)  # add a batch dimension

        # Perform inference without tracking gradients.
        with torch.no_grad():
            output = model(input_batch)

        # The predicted class label is the index of the highest score.
        predicted = output.argmax(dim=1).item()
        print("Predicted class label:", predicted, "file", file)
        predicted_numbers.append(predicted)

    return predicted_numbers

```
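The `64 * 4 * 4` input size of `fc1` follows from the 32x32 resize. Each 3x3 convolution with padding 1 keeps the spatial size, and each 2x2 max-pool halves it, so the size goes 32, 16, 8, 4. The plain-Python sketch below reproduces that arithmetic. The helper names are hypothetical. It shows why the `Resize` size cannot change unless `fc1` changes with it.

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    # Output-size formula for a 2D convolution (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    # Output-size formula for MaxPool2d with no padding.
    return (size - kernel) // stride + 1


def flattened_features(input_size: int = 32, channels=(16, 32, 64)) -> int:
    # Apply one conv + pool stage per conv layer, then flatten the last feature maps.
    size = input_size
    for _ in channels:
        size = maxpool_out(conv2d_out(size))
    return channels[-1] * size * size


print(flattened_features(32))  # 1024, i.e. 64 * 4 * 4
```

A 28x28 input, for example, ends in 3x3 maps and 576 features, which no longer matches `fc1`.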

## Training Details

### Training Data

[Pixel Digit Captcha Data](https://huggingface.co/datasets/Softy-lines/Pixel-Digit-Captcha-Data)

## Model Card Authors

Saidi Souhaieb