---
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: beit-sketch-classifier
    results: []
---

beit-sketch-classifier

This model is a fine-tuned version of microsoft/beit-base-patch16-224-pt22k-ft22k on a dataset of Quick, Draw! sketches (roughly 1 percent of the full 50M-sketch dataset). It achieves the following results on the evaluation set:

  • Loss: 1.6083
  • Accuracy: 0.7480

Intended uses & limitations

It's intended to classify sketches supplied in a line-segment (vector) input format. No data augmentation was applied during fine-tuning, so input raster images should be rendered from the line-vector format in the same way as the training images.

You can generate the requisite PIL images from the Quick, Draw! binary (.bin) format with the following:

import io
from struct import unpack

import cv2
import numpy as np
from PIL import Image

# packed bytes -> dict (from https://github.com/googlecreativelab/quickdraw-dataset/blob/master/examples/binary_file_parser.py)
def unpack_drawing(file_handle):
    key_id, = unpack('Q', file_handle.read(8))
    country_code, = unpack('2s', file_handle.read(2))
    recognized, = unpack('b', file_handle.read(1))
    timestamp, = unpack('I', file_handle.read(4))
    n_strokes, = unpack('H', file_handle.read(2))
    image = []
    for i in range(n_strokes):
        n_points, = unpack('H', file_handle.read(2))
        fmt = str(n_points) + 'B'
        x = unpack(fmt, file_handle.read(n_points))
        y = unpack(fmt, file_handle.read(n_points))
        image.append((x, y))  # one stroke: parallel tuples of x and y coordinates (0-255)
    return {
        'key_id': key_id,
        'country_code': country_code,
        'recognized': recognized,
        'timestamp': timestamp,
        'image': image,
    }

# packed bin -> 224x224 RGB PIL image
def binToPIL(packed_drawing):
    padding = 8
    radius = 7  # stroke thickness in pixels
    scale = (224.0 - (2 * padding)) / 256  # Quick, Draw! coordinates are in the 0-255 range

    unpacked = unpack_drawing(io.BytesIO(packed_drawing))
    image = np.full((224, 224), 255, np.uint8)  # white canvas
    for stroke in unpacked['image']:
        prevX = round(stroke[0][0] * scale)
        prevY = round(stroke[1][0] * scale)
        for i in range(1, len(stroke[0])):
            x = round(stroke[0][i] * scale)
            y = round(stroke[1][i] * scale)
            cv2.line(image, (padding + prevX, padding + prevY), (padding + x, padding + y), 0, radius)
            prevX = x
            prevY = y
    return Image.fromarray(image).convert("RGB")
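
Once you have the PIL image, you can run it through the standard transformers image-classification pipeline. The snippet below is a minimal sketch: it assumes the model is hosted as kmewhort/beit-sketch-classifier and that cat.bin is a Quick, Draw! binary file you have downloaded; adjust both for your setup.

# Example: classify the first sketch in a downloaded Quick, Draw! .bin file
# (the repo id "kmewhort/beit-sketch-classifier" and file name "cat.bin" are assumptions)
from transformers import pipeline

classifier = pipeline("image-classification", model="kmewhort/beit-sketch-classifier")

with open("cat.bin", "rb") as f:
    packed = f.read()  # a .bin file holds many concatenated drawings; binToPIL renders the first one

pil_image = binToPIL(packed)
print(classifier(pil_image, top_k=5))  # top-5 labels with scores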

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments equivalent is sketched after the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
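
The card is tagged generated_from_trainer, so these settings map onto the Hugging Face Trainer roughly as below. This is a reconstruction from the list above, not the original training script; output_dir and the per-epoch evaluation/save strategy are assumptions.

from transformers import TrainingArguments

# Approximate TrainingArguments matching the hyperparameters listed above.
# output_dir, evaluation_strategy and save_strategy are assumed, not taken from the original run.
training_args = TrainingArguments(
    output_dir="beit-sketch-classifier",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # 32 x 4 = 128 total train batch size
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="epoch",     # the results table reports one evaluation per epoch
    save_strategy="epoch",
)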

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| 1.3452        | 1.0   | 3151  | 1.3825          | 0.6702   |
| 1.052         | 2.0   | 6302  | 1.0776          | 0.7252   |
| 0.9884        | 3.0   | 9453  | 0.9989          | 0.7443   |
| 0.8054        | 4.0   | 12604 | 0.9747          | 0.7526   |
| 0.6271        | 5.0   | 15755 | 0.9770          | 0.7558   |
| 0.5719        | 6.0   | 18906 | 1.0201          | 0.7528   |
| 0.3557        | 7.0   | 22057 | 1.0702          | 0.7523   |
| 0.2637        | 8.0   | 25208 | 1.1324          | 0.7501   |
| 0.1878        | 9.0   | 28359 | 1.2129          | 0.7434   |
| 0.1616        | 10.0  | 31510 | 1.2692          | 0.7457   |
| 0.1148        | 11.0  | 34661 | 1.3425          | 0.7435   |
| 0.0867        | 12.0  | 37812 | 1.3999          | 0.7430   |
| 0.065         | 13.0  | 40963 | 1.4472          | 0.7442   |
| 0.0489        | 14.0  | 44114 | 1.4836          | 0.7457   |
| 0.0365        | 15.0  | 47265 | 1.5194          | 0.7445   |
| 0.0386        | 16.0  | 50416 | 1.5506          | 0.7458   |
| 0.0315        | 17.0  | 53567 | 1.5778          | 0.7461   |
| 0.0236        | 18.0  | 56718 | 1.5986          | 0.7467   |
| 0.0264        | 19.0  | 59869 | 1.6085          | 0.7475   |
| 0.0146        | 20.0  | 63020 | 1.6083          | 0.7480   |

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu117
  • Datasets 2.7.1
  • Tokenizers 0.13.2
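
To approximate this environment, something like the following should work (a sketch; the cu117 extra index is the public PyTorch wheel index, so adjust it for your CUDA version or a CPU-only install):

pip install transformers==4.25.1 datasets==2.7.1 tokenizers==0.13.2
pip install torch==1.13.1 --extra-index-url https://download.pytorch.org/whl/cu117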