---
license: gpl-3.0
language:
- en
tags:
- archaeology
- CLIP
---

## ABOUT ARCHAEO-CLIP


This model is the result of "fine-tuning" the openai/clip-vit-base-patch32 model using captioned images of archaeological artifacts published by Open Context. It is the latest of several iterations of experiments to improve the captions, debug the training pipeline, and try different fine-tuning parameters. A relatively low learning rate seems to add some "archaeological knowledge" while still retaining much of the general knowledge of out-of-the-box CLIP.

We'll use this fine-tuned model in future experiments, including further fine-tuning with captioned images from open access museum collections of archaeological materials. Below we itemize the specific training parameters used in the fine-tuning of this model.

The training data (45,256 captioned archaeological artifact images from Open Context), the test data, and the specific Python code used to run the fine-tuning can be found here:

https://github.com/opencontext/archaeology-images-ai
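
The fine-tuned weights can be loaded with the Hugging Face transformers library like any other CLIP checkpoint. The sketch below is a minimal, illustrative example of zero-shot matching between an artifact photograph and a few candidate captions; the repository id, image path, and captions are placeholders, not values taken from this card.

```python
# Minimal sketch: zero-shot caption matching with the fine-tuned CLIP model.
# The repo id, image path, and captions are placeholders -- substitute the
# actual Hugging Face repo for this model and a local artifact photograph.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "your-namespace/archaeo-clip"  # placeholder repo id

model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

image = Image.open("artifact.jpg")  # placeholder image path
captions = [
    "a ceramic vessel fragment",
    "a carved stone figurine",
    "a corroded metal fastener",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image has shape (1, num_captions); softmax gives match probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, prob in zip(captions, probs[0].tolist()):
    print(f"{prob:.3f}  {caption}")
```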







Fine Tuning
-----------
Below is the invocation and the specific parameters I used for fine-tuning:

```bash
python -W ignore finetune-clip-huggingface/huggingface_finetune_clip.py \
    --output_dir /home/ekansa/github/archaeology-images-ai/results \
    --model_name_or_path openai/clip-vit-base-patch32 \
    --train_file /home/ekansa/github/archaeology-images-ai/files/train.json \
    --validation_file /home/ekansa/github/archaeology-images-ai/files/test.json \
    --image_column="image_path" \
    --overwrite_output_dir=True \
    --max_seq_length=77 \
    --num_train_epochs=25 \
    --caption_column="caption" \
    --overwrite_cache=True \
    --remove_unused_columns=False \
    --do_train=True \
    --per_device_train_batch_size=64 \
    --per_device_eval_batch_size=64 \
    --learning_rate="2e-5" \
    --warmup_steps="2" \
    --weight_decay 0.2

12/10/2023 21:35:43 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
Running tokenizer on train dataset: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 45256/45256 [00:02<00:00, 21481.25 examples/s]Parameter 'transform'=<function main.<locals>.transform_images at 0x7fe53504d9e0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
12/10/2023 21:35:47 - WARNING - datasets.fingerprint - Parameter 'transform'=<function main.<locals>.transform_images at 0x7fe53504d9e0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.


{'loss': 1.7174, 'learning_rate': 1.9437224545146346e-05, 'epoch': 0.71}
{'loss': 1.1706, 'learning_rate': 1.887218894790372e-05, 'epoch': 1.41}
{'loss': 0.9596, 'learning_rate': 1.8307153350661094e-05, 'epoch': 2.12}
{'loss': 0.7291, 'learning_rate': 1.7742117753418467e-05, 'epoch': 2.82}
{'loss': 0.5833, 'learning_rate': 1.717708215617584e-05, 'epoch': 3.53}
{'loss': 0.5094, 'learning_rate': 1.6612046558933215e-05, 'epoch': 4.24}
{'loss': 0.4368, 'learning_rate': 1.6047010961690588e-05, 'epoch': 4.94}
{'loss': 0.365, 'learning_rate': 1.548197536444796e-05, 'epoch': 5.65}
{'loss': 0.3394, 'learning_rate': 1.4916939767205336e-05, 'epoch': 6.36}
{'loss': 0.3159, 'learning_rate': 1.4351904169962709e-05, 'epoch': 7.06}
{'loss': 0.2776, 'learning_rate': 1.3786868572720083e-05, 'epoch': 7.77}
{'loss': 0.2584, 'learning_rate': 1.3221832975477456e-05, 'epoch': 8.47}
{'loss': 0.2464, 'learning_rate': 1.2656797378234832e-05, 'epoch': 9.18}
{'loss': 0.227, 'learning_rate': 1.2091761780992204e-05, 'epoch': 9.89}
{'loss': 0.2116, 'learning_rate': 1.1526726183749577e-05, 'epoch': 10.59}
{'loss': 0.2026, 'learning_rate': 1.0961690586506951e-05, 'epoch': 11.3}
{'loss': 0.1869, 'learning_rate': 1.0396654989264325e-05, 'epoch': 12.01}
{'loss': 0.1792, 'learning_rate': 9.831619392021698e-06, 'epoch': 12.71}
{'loss': 0.167, 'learning_rate': 9.266583794779072e-06, 'epoch': 13.42}
{'loss': 0.1671, 'learning_rate': 8.701548197536446e-06, 'epoch': 14.12}
{'loss': 0.154, 'learning_rate': 8.136512600293819e-06, 'epoch': 14.83}
{'loss': 0.1574, 'learning_rate': 7.571477003051193e-06, 'epoch': 15.54}
{'loss': 0.1496, 'learning_rate': 7.006441405808566e-06, 'epoch': 16.24}
{'loss': 0.1329, 'learning_rate': 5.876370211323313e-06, 'epoch': 17.66}
{'loss': 0.1316, 'learning_rate': 5.311334614080687e-06, 'epoch': 18.36}
{'loss': 0.1254, 'learning_rate': 4.746299016838062e-06, 'epoch': 19.07}
{'loss': 0.1266, 'learning_rate': 4.181263419595435e-06, 'epoch': 19.77}
{'loss': 0.1193, 'learning_rate': 3.6162278223528084e-06, 'epoch': 20.48}
{'loss': 0.1163, 'learning_rate': 3.0511922251101822e-06, 'epoch': 21.19}
{'loss': 0.1154, 'learning_rate': 2.486156627867556e-06, 'epoch': 21.89}
{'loss': 0.1125, 'learning_rate': 1.9211210306249294e-06, 'epoch': 22.6}
{'loss': 0.1063, 'learning_rate': 1.356085433382303e-06, 'epoch': 23.31}
{'loss': 0.1082, 'learning_rate': 7.91049836139677e-07, 'epoch': 24.01}
{'loss': 0.1032, 'learning_rate': 2.2601423889705053e-07, 'epoch': 24.72}

{'train_runtime': 78442.5601, 'train_samples_per_second': 14.423, 'train_steps_per_second': 0.226, 'train_loss': 0.31630637788503185, 'epoch': 25.0}
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 17700/17700 [21:47:22<00:00,  4.43s/it]
***** train metrics *****
  epoch                    =        25.0
  train_loss               =      0.3163
  train_runtime            = 21:47:22.56
  train_samples_per_second =      14.423
  train_steps_per_second   =       0.226
```
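
For reference, the `--image_column` and `--caption_column` flags above imply that `train.json` and `test.json` contain records with at least an `image_path` and a `caption` field. The snippet below is an assumed illustration of that structure (one JSON object per line, as accepted by the Hugging Face json loader), not an excerpt from the actual files; the path and caption are invented.

```python
# Assumed shape of one training record, inferred from the --image_column and
# --caption_column flags above (not an excerpt from the actual train.json).
import json

record = {
    "image_path": "/path/to/images/example.jpg",                         # placeholder
    "caption": "a red-slipped ceramic bowl fragment from a survey unit",  # placeholder
}

# One JSON object per line (JSON lines).
with open("train.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```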







Credit / Acknowledgements / Blame
---------------------------------

Shawn Graham (https://huggingface.co/sgraham https://carleton.ca/history/people/shawn-graham/) figured out how to use 
available open source tooling to fine-tune the CLIP model.

Eric Kansa (https://github.com/ekansa) put together the captioned training data and further streamlined and debugged the 
open source tooling to fine-tune the CLIP model.

Many, many data contributors to Open Context did the hard work of excavating, surveying, and documenting the archaeological 
objects used in this training. Images and data used for captioning came from the following projects:

* Archaeology of the International Space Station (https://opencontext.org/projects/e682f907-6e4a-44cc-8a5f-3e2c73001673)
* Bade Museum (https://opencontext.org/projects/b4345f6a-f926-4062-144e-3fbc175cc7b6)
* Balance Pan Weights from Nippur (https://opencontext.org/projects/8f947319-3c69-4847-b7a2-09e00ed90b32)
* China Ceramic Petrography Database (https://opencontext.org/projects/2c5addea-41d5-4941-b2bd-672bc1e60448)
* Differentiating local from nonlocal ceramic production at Late Bronze Age/Iron Age Kinet HΓΆyΓΌk using NAA (https://opencontext.org/projects/81d1157d-28f4-46ff-98dd-94899c1688f8)
* Domuztepe Excavations (https://opencontext.org/projects/e6fb0f7c-6f69-6ca8-683b-8c6d5e98a099)
* Historic Fort Snelling (https://opencontext.org/projects/fab0532a-2953-4f13-aa97-8a9d7e992dbe)
* Kenan Tepe (https://opencontext.org/projects/3de4cd9c-259e-4c14-9b03-8b10454ba66e)
* Khirbat al-Mudayna al-Aliya (https://opencontext.org/projects/cbb6b9f7-500c-4ddd-71aa-4d5e5b96cdbb)
* Madaba Plains Project-`Umayri (https://opencontext.org/projects/2015946b-da6d-4a0f-b676-03bee1efad21)
* Mikt’sqaq Angayuk Finds (https://opencontext.org/projects/cf6e1364-d6ef-4042-b726-82cfb73f7c9d)
* Murlo (https://opencontext.org/projects/df043419-f23b-41da-7e4d-ee52af22f92f)
* Petra Great Temple Excavations (https://opencontext.org/projects/a5ddbea2-b3c8-43f9-8151-33343cbdc857)
* Pyla-Koutsopetria Archaeological Project I: Pedestrian Survey (https://opencontext.org/projects/3f6dcd13-a476-488e-ed10-47d25513fcb2)
* Pyla-Koutsopetria Archaeological Project II: Geophysics and Excavation (https://opencontext.org/projects/b9472eec-e622-4838-b6d8-5a2958b9d4d3)
* Tel Kedesh Sealing Images (https://opencontext.org/projects/a2d7b2d2-b5de-4433-8d69-2eaf9456349e)
* The Eastern Korinthia Archaeological Survey (https://opencontext.org/projects/bc71c724-eb1e-47d6-9d45-b586ddafdcfe)
* The Gabii Project (https://opencontext.org/projects/3585b372-8d2d-436c-9a4c-b5c10fce3ccd)
* Virtual Valdivia (https://opencontext.org/projects/d3bed915-947d-4c1a-8c9a-b7723f03d21a)

Finally, we're grateful to the many people (ancient and some modern) who made and used the physical objects that informed this model. Everything we do now is built on their legacy.