Update README.md
# vqa

This model is a fine-tuned version of [unc-nlp/lxmert-base-uncased](https://huggingface.co/unc-nlp/lxmert-base-uncased) on the [Graphcore/vqa-lxmert](https://huggingface.co/datasets/Graphcore/vqa-lxmert) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0009
- Accuracy: 0.7242
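As a quick sanity check of the checkpoint, the snippet below runs one question through the model with the standard `transformers` LXMERT API. The repo id is an assumption (substitute this model's actual Hub id), and since LXMERT consumes pre-extracted Faster R-CNN region features rather than raw images, random tensors stand in for `visual_feats` and `visual_pos`.

```
import torch
from transformers import LxmertTokenizer, LxmertForQuestionAnswering

# Assumed repo id for this fine-tuned checkpoint; substitute the actual one.
model_id = "Graphcore/lxmert-vqa-uncased"

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertForQuestionAnswering.from_pretrained(model_id)
model.eval()

# LXMERT takes pre-extracted region features, not raw pixels; random tensors
# of the expected shapes stand in for a real Faster R-CNN detector here.
num_boxes = 36
visual_feats = torch.randn(1, num_boxes, 2048)  # RoI features
visual_pos = torch.rand(1, num_boxes, 4)        # normalized box coordinates

inputs = tokenizer("What color is the cat?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)

# Index of the highest-scoring label in the VQA answer vocabulary.
print(outputs.question_answering_score.argmax(-1).item())
```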
## Training and evaluation data

The model was fine-tuned and evaluated on the [Graphcore/vqa-lxmert](https://huggingface.co/datasets/Graphcore/vqa-lxmert) dataset.
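To inspect what the model was trained on, the data can be pulled straight from the Hub with `datasets`; the split name below is an assumption, so check the dataset card for the actual splits and columns.

```
from datasets import load_dataset

# Split name is an assumption; the dataset card lists the real splits.
ds = load_dataset("Graphcore/vqa-lxmert", split="validation")
print(ds.column_names)
print(ds[0])
```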
## Training procedure

Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).

Command line:

```
python examples/language-modeling/run_clm.py \
  --model_name_or_path gpt2 \
  --ipu_config_name Graphcore/gpt2-small-ipu \
  --dataset_name wikitext \
  --dataset_config_name wikitext-103-raw-v1 \
  --do_train \
  --do_eval \
  --num_train_epochs 10 \
  --dataloader_num_workers 64 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 128 \
  --output_dir /tmp/clm_output \
  --logging_steps 5 \
  --learning_rate 1e-5 \
  --lr_scheduler_type linear \
  --loss_scaling 16384 \
  --weight_decay 0.01 \
  --warmup_ratio 0.1 \
  --ipu_config_overrides="embedding_serialization_factor=4,optimizer_state_offchip=true,inference_device_iterations=5" \
  --dataloader_drop_last \
  --pod_type pod16
```
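The example scripts in optimum-graphcore parse flags like these into `IPUTrainingArguments`, so an equivalent run can be configured from Python. A minimal sketch of that mapping, assuming the IPU-specific fields (`loss_scaling`, `pod_type`) carry over one-to-one as in the library's examples:

```
from optimum.graphcore import IPUTrainingArguments

# Mirrors the command-line flags above; the IPU-specific fields are
# assumptions based on optimum-graphcore's example scripts.
training_args = IPUTrainingArguments(
    output_dir="/tmp/clm_output",
    do_train=True,
    do_eval=True,
    num_train_epochs=10,
    dataloader_num_workers=64,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,
    logging_steps=5,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    loss_scaling=16384,
    weight_decay=0.01,
    warmup_ratio=0.1,
    dataloader_drop_last=True,
    pod_type="pod16",
)
```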
### Training hyperparameters

The following hyperparameters were used during training: