---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: beit-sketch-classifier
  results: []
---

# beit-sketch-classifier

This model is a version of [microsoft/beit-base-patch16-224-pt22k-ft22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) fine-tuned on a dataset of Quick, Draw! sketches ([a 1 percent sample of the 50M sketches](https://huggingface.co/datasets/kmewhort/quickdraw-bins-1pct-sample)).
It achieves the following results on the evaluation set:
- Loss: 1.6083
- Accuracy: 0.7480

## Intended uses & limitations

It is intended for classifying sketches supplied in a line-segment (vector stroke) format. No data augmentation was used during fine-tuning, so input raster images should be generated from the line-vector format in much the same way as the training images.

You can generate the requisite PIL images from the Quick, Draw! `bin` format with the following:

```
import io
from struct import unpack

import cv2
import numpy as np
from PIL import Image

# packed bytes -> dict (from https://github.com/googlecreativelab/quickdraw-dataset/blob/master/examples/binary_file_parser.py)
def unpack_drawing(file_handle):
    key_id, = unpack('Q', file_handle.read(8))
    country_code, = unpack('2s', file_handle.read(2))
    recognized, = unpack('b', file_handle.read(1))
    timestamp, = unpack('I', file_handle.read(4))
    n_strokes, = unpack('H', file_handle.read(2))
    image = []
    n_bytes = 17
    for i in range(n_strokes):
        n_points, = unpack('H', file_handle.read(2))
        fmt = str(n_points) + 'B'
        x = unpack(fmt, file_handle.read(n_points))
        y = unpack(fmt, file_handle.read(n_points))
        image.append((x, y))
        n_bytes += 2 + 2*n_points
    result = {
        'key_id': key_id,
        'country_code': country_code,
        'recognized': recognized,
        'timestamp': timestamp,
        'image': image,
    }
    return result

# packed bin -> RGB PIL
def binToPIL(packed_drawing):
    padding = 8  # white border, in pixels, on every side
    radius = 7   # passed to cv2.line as the line thickness
    scale = (224.0-(2*padding)) / 256  # map 0..255 coordinates into the padded 224x224 canvas

    unpacked = unpack_drawing(io.BytesIO(packed_drawing))
    image = np.full((224,224), 255, np.uint8)  # white single-channel canvas
    for stroke in unpacked['image']:
        # stroke is (xs, ys); draw a segment between each pair of consecutive points
        prevX = round(stroke[0][0]*scale)
        prevY = round(stroke[1][0]*scale)
        for i in range(1, len(stroke[0])):
            x = round(stroke[0][i]*scale)
            y = round(stroke[1][i]*scale)
            # color 0 (black), thickness = radius; the final -1 is passed as cv2's lineType
            cv2.line(image, (padding+prevX, padding+prevY), (padding+x, padding+y), 0, radius, -1)
            prevX = x
            prevY = y
    pilImage = Image.fromarray(image).convert("RGB")
    return pilImage
```
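
To check the parsing logic without downloading any data, one option is to pack a small synthetic drawing in the same byte layout and read it back. The following is only a sketch: `pack_drawing` and `read_strokes` are hypothetical helpers that are not part of this model card, the record values are made up, and the format characters simply mirror the ones used in `unpack_drawing` above.

```python
import io
from struct import pack, unpack


def pack_drawing(key_id, country_code, recognized, timestamp, strokes):
    """Hypothetical helper: serialize one drawing in the layout unpack_drawing reads."""
    data = pack('Q', key_id) + pack('2s', country_code) + pack('b', recognized)
    data += pack('I', timestamp) + pack('H', len(strokes))
    for xs, ys in strokes:
        data += pack('H', len(xs))             # number of points in this stroke
        data += pack(str(len(xs)) + 'B', *xs)  # all x coordinates, one byte each
        data += pack(str(len(ys)) + 'B', *ys)  # all y coordinates, one byte each
    return data


def read_strokes(packed_drawing):
    """Hypothetical helper: same stroke loop as unpack_drawing, returning only the strokes."""
    fh = io.BytesIO(packed_drawing)
    fh.read(15)  # skip key_id (8) + country_code (2) + recognized (1) + timestamp (4)
    n_strokes, = unpack('H', fh.read(2))
    strokes = []
    for _ in range(n_strokes):
        n_points, = unpack('H', fh.read(2))
        fmt = str(n_points) + 'B'
        xs = unpack(fmt, fh.read(n_points))
        ys = unpack(fmt, fh.read(n_points))
        strokes.append((xs, ys))
    return strokes


# Made-up two-stroke drawing: a diagonal and a short horizontal line.
strokes = [((0, 128, 255), (0, 128, 255)), ((10, 200), (50, 50))]
packed = pack_drawing(1, b'CA', 1, 1500000000, strokes)

# 17 header bytes + (2 + 2*3) + (2 + 2*2) = 31 bytes in total.
print(len(packed))
print(read_strokes(packed) == strokes)
```

Bytes built this way have the shape that `binToPIL` expects for its `packed_drawing` argument, so they can serve as a quick smoke test for the rendering step.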


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
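
The relationships between these values can be worked through in a few lines. This is an illustrative sketch rather than the Trainer's own code: it assumes the `Step` column of the training results counts optimizer updates (3151 per epoch), that warmup lasts about `total_steps * warmup_ratio` steps (the exact rounding is an assumption), and that the schedule follows the usual linear-warmup, linear-decay shape. The name `lr_at` is hypothetical.

```python
learning_rate = 5e-05
train_batch_size = 32
gradient_accumulation_steps = 4
num_epochs = 20
warmup_ratio = 0.1
steps_per_epoch = 3151  # taken from the training results table (assumed optimizer steps)

# Effective batch size per optimizer update: 32 * 4 = 128.
effective_batch = train_batch_size * gradient_accumulation_steps

# Total optimizer updates over the run: 3151 * 20 = 63020.
total_steps = steps_per_epoch * num_epochs

# Approximate warmup length (rounding choice is an assumption).
warmup_steps = round(total_steps * warmup_ratio)

# Rough number of samples seen per epoch under the assumptions above.
approx_samples_per_epoch = steps_per_epoch * effective_batch


def lr_at(step):
    """Learning rate at a given step for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print(effective_batch, total_steps, warmup_steps, approx_samples_per_epoch)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the learning rate peaks at 5e-05 at the end of the second epoch (step 6302) and decays linearly to zero by step 63020.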

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 1.3452        | 1.0   | 3151  | 1.3825          | 0.6702   |
| 1.052         | 2.0   | 6302  | 1.0776          | 0.7252   |
| 0.9884        | 3.0   | 9453  | 0.9989          | 0.7443   |
| 0.8054        | 4.0   | 12604 | 0.9747          | 0.7526   |
| 0.6271        | 5.0   | 15755 | 0.9770          | 0.7558   |
| 0.5719        | 6.0   | 18906 | 1.0201          | 0.7528   |
| 0.3557        | 7.0   | 22057 | 1.0702          | 0.7523   |
| 0.2637        | 8.0   | 25208 | 1.1324          | 0.7501   |
| 0.1878        | 9.0   | 28359 | 1.2129          | 0.7434   |
| 0.1616        | 10.0  | 31510 | 1.2692          | 0.7457   |
| 0.1148        | 11.0  | 34661 | 1.3425          | 0.7435   |
| 0.0867        | 12.0  | 37812 | 1.3999          | 0.7430   |
| 0.065         | 13.0  | 40963 | 1.4472          | 0.7442   |
| 0.0489        | 14.0  | 44114 | 1.4836          | 0.7457   |
| 0.0365        | 15.0  | 47265 | 1.5194          | 0.7445   |
| 0.0386        | 16.0  | 50416 | 1.5506          | 0.7458   |
| 0.0315        | 17.0  | 53567 | 1.5778          | 0.7461   |
| 0.0236        | 18.0  | 56718 | 1.5986          | 0.7467   |
| 0.0264        | 19.0  | 59869 | 1.6085          | 0.7475   |
| 0.0146        | 20.0  | 63020 | 1.6083          | 0.7480   |
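
The table shows training loss falling steadily while validation loss bottoms out early and then climbs. A short sketch, using only the validation loss and accuracy columns copied from the table, locates the best epochs; the variable names are illustrative.

```python
# Validation loss and accuracy per epoch (epochs 1..20), copied from the table above.
val_loss = [1.3825, 1.0776, 0.9989, 0.9747, 0.9770, 1.0201, 1.0702, 1.1324,
            1.2129, 1.2692, 1.3425, 1.3999, 1.4472, 1.4836, 1.5194, 1.5506,
            1.5778, 1.5986, 1.6085, 1.6083]
accuracy = [0.6702, 0.7252, 0.7443, 0.7526, 0.7558, 0.7528, 0.7523, 0.7501,
            0.7434, 0.7457, 0.7435, 0.7430, 0.7442, 0.7457, 0.7445, 0.7458,
            0.7461, 0.7467, 0.7475, 0.7480]

# Epoch numbers are 1-based, so add 1 to the list index.
best_loss_epoch = val_loss.index(min(val_loss)) + 1
best_acc_epoch = accuracy.index(max(accuracy)) + 1

# Gap between the best accuracy and the accuracy at the final epoch.
acc_gap = round(max(accuracy) - accuracy[-1], 4)

print(best_loss_epoch, min(val_loss))  # epoch with the lowest validation loss
print(best_acc_epoch, max(accuracy))   # epoch with the highest accuracy
print(acc_gap)
```

Validation loss is lowest at epoch 4 (0.9747) and accuracy peaks at epoch 5 (0.7558), about 0.8 percentage points above the final-epoch accuracy of 0.7480 reported at the top of this card.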


### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.7.1
- Tokenizers 0.13.2