	---
	license: apache-2.0
	language: en
	datasets:
	- wikipedia
	- bookcorpus
	model-index:
	- name: asi/albert-act-base
	  results:
	  - task:
	      type: text-classification
	      name: CoLA
	    dataset:
	      type: glue
	      name: CoLA # General Language Understanding Evaluation benchmark (GLUE)
	      split: cola
	    metrics:
	    - type: matthews_correlation
	      value: 36.7
	      name: Matthew's Corr
	  - task:
	      type: text-classification
	      name: SST-2
	    dataset:
	      type: glue
	      name: SST-2 # The Stanford Sentiment Treebank
	      split: sst2
	    metrics:
	    - type: accuracy
	      value: 87.8
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: MRPC
	    dataset:
	      type: glue
	      name: MRPC # Microsoft Research Paraphrase Corpus
	      split: mrpc
	    metrics:
	    - type: accuracy
	      value: 81.4
	      name: Accuracy
	    - type: f1
	      value: 86.5
	      name: F1
	  - task:
	      type: text-similarity
	      name: STS-B
	    dataset:
	      type: glue
	      name: STS-B # Semantic Textual Similarity Benchmark
	      split: stsb
	    metrics:
	    - type: spearmanr
	      value: 83.0
	      name: Spearman Corr
	    - type: pearsonr
	      value: 84.2
	      name: Pearson Corr
	  - task:
	      type: text-classification
	      name: QQP
	    dataset:
	      type: glue
	      name: QQP # Quora Question Pairs
	      split: qqp
	    metrics:
	    - type: f1
	      value: 68.5
	      name: F1
	    - type: accuracy
	      value: 87.7
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: MNLI-m
	    dataset:
	      type: glue
	      name: MNLI-m # MultiNLI Matched
	      split: mnli_matched
	    metrics:
	    - type: accuracy
	      value: 79.9
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: MNLI-mm
	    dataset:
	      type: glue
	      name: MNLI-mm # MultiNLI Mismatched
	      split: mnli_mismatched
	    metrics:
	    - type: accuracy
	      value: 79.2
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: QNLI
	    dataset:
	      type: glue
	      name: QNLI # Question NLI
	      split: qnli
	    metrics:
	    - type: accuracy
	      value: 89.0
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: RTE
	    dataset:
	      type: glue
	      name: RTE # Recognizing Textual Entailment
	      split: rte
	    metrics:
	    - type: accuracy
	      value: 63.0
	      name: Accuracy
	  - task:
	      type: text-classification
	      name: WNLI
	    dataset:
	      type: glue
	      name: WNLI # Winograd NLI
	      split: wnli
	    metrics:
	    - type: accuracy
	      value: 65.1
	      name: Accuracy
	---



	# Adaptive Depth Transformers

	Implementation of the paper "How Many Layers and Why? An Analysis of the Model Depth in Transformers". In this study, we investigate the role of the multiple layers in deep transformer models. We design a variant of ALBERT that dynamically adapts the number of layers for each token of the input.

	## Model architecture

	We augment a multi-layer transformer encoder with a halting mechanism, which dynamically adjusts the number of layers for each token.
	We directly adapted this mechanism from Graves ([2016](#graves-2016)). At each iteration, we compute a probability for each token to stop updating its state.
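
The halting rule can be sketched in a few lines of plain Python. This is a minimal illustration of the thresholding step described by Graves (2016), not the code from the repository. The function name `count_updates`, the `halting_logits` input, and the `epsilon` value are assumptions made for the example. In the actual model, the halting probability would be computed from each token's hidden state rather than supplied up front.

```python
import math


def count_updates(halting_logits, epsilon=0.01):
    """Return how many layer updates each token receives.

    halting_logits[layer][token] is a hypothetical pre-computed halting score;
    a sigmoid turns it into the probability of stopping at that layer.
    """
    n_tokens = len(halting_logits[0])
    cumulative = [0.0] * n_tokens   # accumulated halting probability per token
    updates = [0] * n_tokens        # number of updates applied per token
    halted = [False] * n_tokens

    for layer_logits in halting_logits:
        for t in range(n_tokens):
            if halted[t]:
                continue            # a halted token keeps its current state
            p = 1.0 / (1.0 + math.exp(-layer_logits[t]))
            cumulative[t] += p
            updates[t] += 1
            # Stop once the accumulated probability reaches 1 - epsilon.
            if cumulative[t] >= 1.0 - epsilon:
                halted[t] = True
        if all(halted):
            break                   # every token has stopped early
    return updates


# Three tokens, at most four layers, with the same scores at every layer.
example_logits = [[10.0, 0.0, -2.0] for _ in range(4)]
print(count_updates(example_logits))  # [1, 2, 4]
```

In this toy run the first token halts after one update, the second after two, and the third never reaches the threshold, so it uses all four layers.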

	## Model use

	The architecture is not yet included in the Transformers library. The code used for pre-training is available in this [GitHub repository](https://github.com/AntoineSimoulin/adaptive-depth-transformers), so install that implementation first:

	```bash
	# In a Jupyter or Colab notebook, prefix the command with `!`
	pip install git+https://github.com/AntoineSimoulin/adaptive-depth-transformers
	```

	Then you can use the model directly.

	```python
	# `act` is the package installed from the repository above
	from act import AlbertActConfig, AlbertActModel, TFAlbertActModel
	from transformers import AlbertTokenizer

	# Load the tokenizer and the pre-trained PyTorch weights from the Hub
	tokenizer = AlbertTokenizer.from_pretrained('asi/albert-act-base')
	model = AlbertActModel.from_pretrained('asi/albert-act-base')
	_ = model.eval()  # evaluation mode (disables dropout)

	inputs = tokenizer("a lump in the middle of the monkeys stirred and then fell quiet .", return_tensors="pt")
	outputs = model(**inputs)
	# Per-token update counts, one value per input token
	outputs.updates
	# tensor([[[[15., 9., 10., 7., 3., 8., 5., 7., 12., 10., 6., 8., 8., 9., 5., 8.]]]])
	```
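
To get a quick feel for how depth varies across tokens, the update counts can be summarized. The sketch below works on the values printed in the example output above, copied by hand into nested Python lists so that it runs without the model. The `flatten` helper and the `summary` dictionary are illustrative names, not part of the `act` package. With a real tensor, `outputs.updates.flatten().tolist()` should give the same flat list.

```python
from statistics import mean

# Update counts copied from the example output, as nested lists.
updates = [[[[15., 9., 10., 7., 3., 8., 5., 7., 12., 10., 6., 8., 8., 9., 5., 8.]]]]


def flatten(nested):
    """Recursively flatten nested lists into a flat list of numbers."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


counts = flatten(updates)
summary = {
    "tokens": len(counts),
    "min": min(counts),
    "max": max(counts),
    "mean": mean(counts),
}
print(summary)
# {'tokens': 16, 'min': 3.0, 'max': 15.0, 'mean': 8.125}
```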

	## Citations

	### BibTeX entry and citation info

	If you use our iterative transformer model for your scientific publication or your industrial applications, please cite the following [paper](https://aclanthology.org/2021.acl-srw.23/):

	```bibtex
	@inproceedings{simoulin-crabbe-2021-many,
	    title = "How Many Layers and Why? {A}n Analysis of the Model Depth in Transformers",
	    author = "Simoulin, Antoine and
	      Crabb{\'e}, Benoit",
	    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop",
	    month = aug,
	    year = "2021",
	    address = "Online",
	    publisher = "Association for Computational Linguistics",
	    url = "https://aclanthology.org/2021.acl-srw.23",
	    doi = "10.18653/v1/2021.acl-srw.23",
	    pages = "221--228",
	}
	```

	### References

	><div id="graves-2016">Alex Graves. 2016. <a href="https://arxiv.org/abs/1603.08983" target="_blank">Adaptive computation time for recurrent neural networks.</a> CoRR, abs/1603.08983.</div>