---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: beit-sketch-classifier
results: []
---
# beit-sketch-classifier
This model is a version of [microsoft/beit-base-patch16-224-pt22k-ft22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) fine-tuned on a dataset of Quick, Draw! sketches ([a 1% sample of the 50M sketches](https://huggingface.co/datasets/kmewhort/quickdraw-bins-1pct-sample)).
It achieves the following results on the evaluation set:
- Loss: 1.6083
- Accuracy: 0.7480
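The fine-tuning data is available on the Hub and can be loaded with `datasets` (a minimal sketch; the split name and column layout of the bins sample are assumptions):
```python
from datasets import load_dataset

# 1% Quick, Draw! bins sample referenced above; "train" split is an assumption
ds = load_dataset("kmewhort/quickdraw-bins-1pct-sample", split="train")
print(ds)
```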
## Intended uses & limitations
It's intended to be used to classify sketches supplied in a line-segment (vector) input format. No data augmentation was used during fine-tuning, so input raster images should ideally be generated from line-vector data in the same way as the training images.

You can generate the requisite PIL images from the Quick, Draw! `bin` format with the following:
```python
import io
from struct import unpack

import cv2
import numpy as np
from PIL import Image


# packed bytes -> dict (from https://github.com/googlecreativelab/quickdraw-dataset/blob/master/examples/binary_file_parser.py)
def unpack_drawing(file_handle):
    key_id, = unpack('Q', file_handle.read(8))
    country_code, = unpack('2s', file_handle.read(2))
    recognized, = unpack('b', file_handle.read(1))
    timestamp, = unpack('I', file_handle.read(4))
    n_strokes, = unpack('H', file_handle.read(2))
    image = []
    n_bytes = 17
    for i in range(n_strokes):
        n_points, = unpack('H', file_handle.read(2))
        fmt = str(n_points) + 'B'
        x = unpack(fmt, file_handle.read(n_points))
        y = unpack(fmt, file_handle.read(n_points))
        image.append((x, y))
        n_bytes += 2 + 2 * n_points

    result = {
        'key_id': key_id,
        'country_code': country_code,
        'recognized': recognized,
        'timestamp': timestamp,
        'image': image,
    }
    return result


# packed bin -> RGB PIL image (224x224, black strokes on a white background)
def binToPIL(packed_drawing):
    padding = 8
    radius = 7
    scale = (224.0 - (2 * padding)) / 256

    unpacked = unpack_drawing(io.BytesIO(packed_drawing))
    image = np.full((224, 224), 255, np.uint8)
    for stroke in unpacked['image']:
        prevX = round(stroke[0][0] * scale)
        prevY = round(stroke[1][0] * scale)
        for i in range(1, len(stroke[0])):
            x = round(stroke[0][i] * scale)
            y = round(stroke[1][i] * scale)
            cv2.line(image, (padding + prevX, padding + prevY), (padding + x, padding + y), 0, radius, -1)
            prevX = x
            prevY = y

    pilImage = Image.fromarray(image).convert("RGB")
    return pilImage
```
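Once a drawing has been rasterised with `binToPIL`, it can be classified with the standard `transformers` image-classification pipeline. The sketch below is illustrative; the hub id `kmewhort/beit-sketch-classifier` and the example file path are assumptions based on this card:
```python
from transformers import pipeline

# Hub id assumed from this card; substitute a local path if needed
classifier = pipeline("image-classification", model="kmewhort/beit-sketch-classifier")

# Rasterise the first packed drawing in a Quick, Draw! .bin file (hypothetical path)
with open("example.bin", "rb") as f:
    pil_image = binToPIL(f.read())

print(classifier(pil_image))  # top predicted sketch classes with scores
```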
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
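
For reference, these hyperparameters correspond roughly to the following `TrainingArguments` (a hedged reconstruction; the dataset preprocessing, image processor, and output paths are not recorded in this card and are assumptions):
```python
from transformers import TrainingArguments

# Reconstruction of the configuration listed above; not the original training script
training_args = TrainingArguments(
    output_dir="beit-sketch-classifier",   # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,         # effective train batch size of 128
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="epoch",           # assumption: metrics below are reported per epoch
    metric_for_best_model="accuracy",
)
```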
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 1.3452 | 1.0 | 3151 | 1.3825 | 0.6702 |
| 1.052 | 2.0 | 6302 | 1.0776 | 0.7252 |
| 0.9884 | 3.0 | 9453 | 0.9989 | 0.7443 |
| 0.8054 | 4.0 | 12604 | 0.9747 | 0.7526 |
| 0.6271 | 5.0 | 15755 | 0.9770 | 0.7558 |
| 0.5719 | 6.0 | 18906 | 1.0201 | 0.7528 |
| 0.3557 | 7.0 | 22057 | 1.0702 | 0.7523 |
| 0.2637 | 8.0 | 25208 | 1.1324 | 0.7501 |
| 0.1878 | 9.0 | 28359 | 1.2129 | 0.7434 |
| 0.1616 | 10.0 | 31510 | 1.2692 | 0.7457 |
| 0.1148 | 11.0 | 34661 | 1.3425 | 0.7435 |
| 0.0867 | 12.0 | 37812 | 1.3999 | 0.7430 |
| 0.065 | 13.0 | 40963 | 1.4472 | 0.7442 |
| 0.0489 | 14.0 | 44114 | 1.4836 | 0.7457 |
| 0.0365 | 15.0 | 47265 | 1.5194 | 0.7445 |
| 0.0386 | 16.0 | 50416 | 1.5506 | 0.7458 |
| 0.0315 | 17.0 | 53567 | 1.5778 | 0.7461 |
| 0.0236 | 18.0 | 56718 | 1.5986 | 0.7467 |
| 0.0264 | 19.0 | 59869 | 1.6085 | 0.7475 |
| 0.0146 | 20.0 | 63020 | 1.6083 | 0.7480 |
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.7.1
- Tokenizers 0.13.2