---
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: beit-sketch-classifier
    results: []
---

beit-sketch-classifier

This model is a fine-tuned version of microsoft/beit-base-patch16-224-pt22k-ft22k on a dataset of Quick, Draw! sketches (roughly 1 percent of the full 50M-sketch dataset). It achieves the following results on the evaluation set:

  • Loss: 1.6083
  • Accuracy: 0.7480

Intended uses & limitations

It's intended to classify sketches supplied in a line-segment (vector) input format. No data augmentation was applied during fine-tuning, so input raster images should be rendered from the line-vector format in the same way as the training images.

You can generate the requisite PIL images from the Quick, Draw! binary (.bin) format with the following:

import io
from struct import unpack

import cv2
import numpy as np
from PIL import Image

# packed bytes -> dict (from https://github.com/googlecreativelab/quickdraw-dataset/blob/master/examples/binary_file_parser.py)
def unpack_drawing(file_handle):
    key_id, = unpack('Q', file_handle.read(8))
    country_code, = unpack('2s', file_handle.read(2))
    recognized, = unpack('b', file_handle.read(1))
    timestamp, = unpack('I', file_handle.read(4))
    n_strokes, = unpack('H', file_handle.read(2))
    image = []
    for i in range(n_strokes):
        n_points, = unpack('H', file_handle.read(2))
        fmt = str(n_points) + 'B'
        x = unpack(fmt, file_handle.read(n_points))
        y = unpack(fmt, file_handle.read(n_points))
        image.append((x, y))  # one stroke: parallel tuples of x and y coordinates (0-255)
    return {
        'key_id': key_id,
        'country_code': country_code,
        'recognized': recognized,
        'timestamp': timestamp,
        'image': image,
    }

# packed bin -> 224x224 RGB PIL image
def binToPIL(packed_drawing):
    padding = 8
    radius = 7  # stroke thickness in pixels
    scale = (224.0 - (2 * padding)) / 256  # Quick, Draw! coordinates are in the 0-255 range

    unpacked = unpack_drawing(io.BytesIO(packed_drawing))
    image = np.full((224, 224), 255, np.uint8)  # white canvas
    for stroke in unpacked['image']:
        prevX = round(stroke[0][0] * scale)
        prevY = round(stroke[1][0] * scale)
        for i in range(1, len(stroke[0])):
            x = round(stroke[0][i] * scale)
            y = round(stroke[1][i] * scale)
            cv2.line(image, (padding + prevX, padding + prevY), (padding + x, padding + y), 0, radius)
            prevX = x
            prevY = y
    return Image.fromarray(image).convert("RGB")
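
Once you have the PIL image, you can run it through the standard transformers image-classification pipeline. The snippet below is a minimal sketch: it assumes the model is hosted as kmewhort/beit-sketch-classifier and that cat.bin is a Quick, Draw! binary file you have downloaded; adjust both for your setup.

# Example: classify the first sketch in a downloaded Quick, Draw! .bin file
# (the repo id "kmewhort/beit-sketch-classifier" and file name "cat.bin" are assumptions)
from transformers import pipeline

classifier = pipeline("image-classification", model="kmewhort/beit-sketch-classifier")

with open("cat.bin", "rb") as f:
    packed = f.read()  # a .bin file holds many concatenated drawings; binToPIL renders the first one

pil_image = binToPIL(packed)
print(classifier(pil_image, top_k=5))  # top-5 labels with scores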

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments equivalent is sketched after the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
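
The card is tagged generated_from_trainer, so these settings map onto the Hugging Face Trainer roughly as below. This is a reconstruction from the list above, not the original training script; output_dir and the per-epoch evaluation/save strategy are assumptions.

from transformers import TrainingArguments

# Approximate TrainingArguments matching the hyperparameters listed above.
# output_dir, evaluation_strategy and save_strategy are assumed, not taken from the original run.
training_args = TrainingArguments(
    output_dir="beit-sketch-classifier",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # 32 x 4 = 128 total train batch size
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="epoch",     # the results table reports one evaluation per epoch
    save_strategy="epoch",
)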

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|---------------|-------|-------|-----------------|----------|
| 1.3452        | 1.0   | 3151  | 1.3825          | 0.6702   |
| 1.052         | 2.0   | 6302  | 1.0776          | 0.7252   |
| 0.9884        | 3.0   | 9453  | 0.9989          | 0.7443   |
| 0.8054        | 4.0   | 12604 | 0.9747          | 0.7526   |
| 0.6271        | 5.0   | 15755 | 0.9770          | 0.7558   |
| 0.5719        | 6.0   | 18906 | 1.0201          | 0.7528   |
| 0.3557        | 7.0   | 22057 | 1.0702          | 0.7523   |
| 0.2637        | 8.0   | 25208 | 1.1324          | 0.7501   |
| 0.1878        | 9.0   | 28359 | 1.2129          | 0.7434   |
| 0.1616        | 10.0  | 31510 | 1.2692          | 0.7457   |
| 0.1148        | 11.0  | 34661 | 1.3425          | 0.7435   |
| 0.0867        | 12.0  | 37812 | 1.3999          | 0.7430   |
| 0.065         | 13.0  | 40963 | 1.4472          | 0.7442   |
| 0.0489        | 14.0  | 44114 | 1.4836          | 0.7457   |
| 0.0365        | 15.0  | 47265 | 1.5194          | 0.7445   |
| 0.0386        | 16.0  | 50416 | 1.5506          | 0.7458   |
| 0.0315        | 17.0  | 53567 | 1.5778          | 0.7461   |
| 0.0236        | 18.0  | 56718 | 1.5986          | 0.7467   |
| 0.0264        | 19.0  | 59869 | 1.6085          | 0.7475   |
| 0.0146        | 20.0  | 63020 | 1.6083          | 0.7480   |

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu117
  • Datasets 2.7.1
  • Tokenizers 0.13.2
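
To approximate this environment, something like the following should work (a sketch; the cu117 extra index is the public PyTorch wheel index, so adjust it for your CUDA version or a CPU-only install):

pip install transformers==4.25.1 datasets==2.7.1 tokenizers==0.13.2
pip install torch==1.13.1 --extra-index-url https://download.pytorch.org/whl/cu117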