awni00 committed
Commit ddc2823 · verified · 1 Parent(s): 33c1ff7

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +84 -3
README.md CHANGED
---
language: en
license: mit
pipeline_tag: text-generation
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
dataset: HuggingFaceFW/fineweb-edu
---

# DAT-sa16-ra16-nr128-ns2048-sh16-nkvh8-1.27B

<!-- Provide a quick summary of what the model is/does. -->

This is a Dual-Attention Transformer language model trained on the `fineweb-edu` dataset. The model has 1.27B parameters.

## Model Details

| Size | Training Tokens | Layers | Model Dimension | Self-Attention Heads | Relational Attention Heads | Relation Dimension | Context Length |
|--|--|--|--|--|--|--|--|
| 1B | 10B | 24 | 2048 | 16 | 16 | 128 | 1024 |

### Model Description

- **Developed by:** Awni Altabaa, John Lafferty
- **Model type:** Decoder-only Dual Attention Transformer
- **Tokenizer:** GPT-2 BPE tokenizer
- **Language(s):** English
<!-- - **License:** MIT -->
<!-- - **Contact:** awni.altabaa@yale.edu -->
- **Date:** August 2024

### Model Sources

- **Repository:** https://github.com/Awni00/abstract_transformer
- **Paper:** [Disentangling and Integrating Relational and Sensory Information in Transformer Architectures](https://arxiv.org/abs/2405.16727)
- **Huggingface Collection:** [Dual Attention Transformer Collection](https://huggingface.co/collections/awni00/dual-attention-transformer-66c23425a545b0cefe4b9489)

## Model Usage

Use the code below to get started with the model. First, install the `dual-attention` [Python package hosted on PyPI](https://pypi.org/project/dual-attention/) via `pip install dual-attention`.

To load the model directly from the Hugging Face Hub, use the HFHub wrapper:
```python
from dual_attention.hf import DualAttnTransformerLM_HFHub

model = DualAttnTransformerLM_HFHub.from_pretrained('awni00/DAT-sa16-ra16-nr128-ns2048-sh16-nkvh8-1.27B')
```
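
A minimal inference sketch is shown below. It is not taken from the `dual-attention` documentation: it assumes the loaded module is an ordinary PyTorch `nn.Module` that maps a batch of GPT-2 BPE token IDs to next-token logits, so the forward call and output shape may differ from the actual API.

```python
# Hedged sketch: the forward-call convention below is an assumption, not the documented API.
import torch
import tiktoken

from dual_attention.hf import DualAttnTransformerLM_HFHub

model = DualAttnTransformerLM_HFHub.from_pretrained(
    'awni00/DAT-sa16-ra16-nr128-ns2048-sh16-nkvh8-1.27B'
)
model.eval()

enc = tiktoken.get_encoding('gpt2')  # GPT-2 BPE tokenizer, as listed under Model Description
ids = torch.tensor([enc.encode('The theory of relativity states that')])

with torch.no_grad():
    logits = model(ids)  # assumed to return (batch, seq_len, vocab_size) next-token logits

next_token = int(logits[0, -1].argmax())
print(enc.decode([next_token]))
```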

## Training Details

The model was trained using the following setup (a rough PyTorch sketch of this configuration follows the list):
- **Architecture:** Decoder-only Dual Attention Transformer
- **Framework:** PyTorch
- **Optimizer:** AdamW
- **Learning Rate:** 6e-4 (peak)
- **Weight Decay:** 0.1
- **Batch Size:** 524,288 tokens
- **Sequence Length:** 1024 tokens
- **Total Training Tokens:** 10B tokens

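The sketch below is purely illustrative and is not the authors' training script (see the repository and paper for the real setup); the learning-rate schedule and the stand-in `model` are assumptions.

```python
# Illustrative sketch of the listed optimization hyperparameters; not the actual training code.
import torch

# Stand-in module; in practice this would be the 1.27B-parameter Dual Attention Transformer LM.
model = torch.nn.Linear(2048, 2048)

peak_lr = 6e-4                   # peak learning rate
weight_decay = 0.1
tokens_per_batch = 524_288       # i.e. 512 sequences of 1024 tokens per optimizer step
total_tokens = 10_000_000_000    # 10B training tokens

optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=weight_decay)

# The token budget implies roughly this many optimizer steps.
num_steps = total_tokens // tokens_per_batch  # about 19,000

# Only the peak LR is stated above; a cosine decay schedule is a common choice and is
# assumed here, so it may differ from what the paper actually uses.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
```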

For more detailed training information, please refer to the paper.

## Evaluation

See the paper.

## Model Interpretability Analysis

The [DAT-LM-Visualization app](https://huggingface.co/spaces/awni00/DAT-LM-Visualization/) is built to visualize the representations learned in a Dual Attention Transformer language model. It is hosted on Huggingface Spaces using their free CPU resources. You can select a pre-trained DAT-LM model, enter a prompt, and visualize the internal representations in different parts of the model. You can also run the app locally (e.g., to use your own GPU) via the PyPI package.

See also the paper.

## Citation

```
@misc{altabaa2024disentanglingintegratingrelationalsensory,
  title={Disentangling and Integrating Relational and Sensory Information in Transformer Architectures},
  author={Awni Altabaa and John Lafferty},
  year={2024},
  eprint={2405.16727},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2405.16727},
}
```