asi committed on
Commit
a202aed
1 Parent(s): ef31535

:books: init doc

Files changed (1)
  1. README.md +66 -0
README.md CHANGED
---
license: apache-2.0
---

# Adaptive Depth Transformers

Implementation of the paper "How Many Layers and Why? An Analysis of the Model Depth in Transformers". In this study, we investigate the role of multiple layers in deep transformer models. We design a variant of ALBERT that dynamically adapts the number of layers for each token of the input.

## Model architecture

We augment a multi-layer transformer encoder with a halting mechanism that dynamically adjusts the number of layers for each token.
We adapted this mechanism directly from Graves ([2016](#graves-2016)). At each iteration, we compute a probability for each token to stop updating its state.
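
The sketch below illustrates this idea with a simplified, ACT-style per-token halting loop. It is not the repository's actual implementation; names such as `adaptive_depth_encode`, `halting_unit`, and `threshold` are placeholders chosen for this example. Each iteration applies the shared layer once more, accumulates a halting probability for every token, and stops updating tokens whose accumulated probability has crossed the threshold.

```python
import torch

def adaptive_depth_encode(hidden, layer, halting_unit, max_layers=12, threshold=0.99):
    """Simplified ACT-style halting over a shared transformer layer (illustrative only).

    hidden:        (batch, seq_len, dim) token states
    layer:         callable applied repeatedly to the states (ALBERT shares one layer)
    halting_unit:  e.g. a torch.nn.Linear(dim, 1) producing per-token halting logits
    """
    batch, seq_len, _ = hidden.shape
    cumulative_halt = torch.zeros(batch, seq_len, device=hidden.device)  # accumulated halting prob.
    updates = torch.zeros(batch, seq_len, device=hidden.device)          # layers applied per token

    for _ in range(max_layers):
        still_running = cumulative_halt < threshold              # bool mask of active tokens
        if not still_running.any():                              # every token has halted
            break
        p_halt = torch.sigmoid(halting_unit(hidden)).squeeze(-1)  # per-token halting probability
        cumulative_halt = cumulative_halt + p_halt * still_running.float()

        new_hidden = layer(hidden)
        # only tokens that are still running receive the new state
        hidden = torch.where(still_running.unsqueeze(-1), new_hidden, hidden)
        updates = updates + still_running.float()

    return hidden, updates
```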

## Model use

The architecture is not yet directly included in the Transformers library, so you should first install the code implementation:

```bash
pip install git+https://github.com/AntoineSimoulin/adaptive-depth-transformers
```

Then you can use the model directly:

```python
import sys
sys.path.append('adaptive-depth-transformers')  # make the implementation's modules importable

from modeling_albert_act_tf import TFAlbertActModel
from modeling_albert_act import AlbertActModel
from configuration_albert_act import AlbertActConfig
from transformers import AlbertTokenizer

model = AlbertActModel.from_pretrained('asi/albert-act-base')
_ = model.eval()
tokenizer = AlbertTokenizer.from_pretrained('asi/albert-act-base')
inputs = tokenizer("a lump in the middle of the monkeys stirred and then fell quiet .", return_tensors="pt")
outputs = model(**inputs)
outputs.updates  # per-token number of updates (layers applied)
# tensor([[[[15., 9., 10., 7., 3., 8., 5., 7., 12., 10., 6., 8., 8., 9., 5., 8.]]]])
```
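
Assuming `outputs.updates` holds one count per input token (including the special tokens added by the tokenizer), one way to inspect which tokens received more computation is to pair the counts with the tokens. This is only an illustrative continuation of the snippet above:

```python
# Pair each input token with its number of updates.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
counts = outputs.updates.flatten().tolist()
for token, count in zip(tokens, counts):
    print(f"{token}\t{int(count)}")
```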

## Citations

### BibTeX entry and citation info

If you use our iterative transformer model in your scientific publications or industrial applications, please cite the following [paper](https://aclanthology.org/2021.acl-srw.23/):

```bibtex
@inproceedings{simoulin-crabbe-2021-many,
    title = "How Many Layers and Why? {A}n Analysis of the Model Depth in Transformers",
    author = "Simoulin, Antoine  and
      Crabb{\'e}, Benoit",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-srw.23",
    doi = "10.18653/v1/2021.acl-srw.23",
    pages = "221--228",
}
```

### References

><div id="graves-2016">Alex Graves. 2016. Adaptive computation time for recurrent neural networks. CoRR, abs/1603.08983.</div>