Commit bbf155c by Haiyang-W (parent: e25580d)

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED

@@ -4,7 +4,7 @@ license: apache-2.0
 
 The *TokenFormer* is a **fully attention-based architecture**
 that unifies the computations of token-token and token-parameter interactions
-by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://github.com/Haiyang-W/TokenFormer).
+by entirely employing the attention mechanism, **maximizes the flexibility of neural network**.[(see paper)](https://arxiv.org/pdf/2410.23168).
 It contains four models of sizes
 150M, 450M, 900M, 1.5B. For each size, it's trained based on [gpt-neox](https://github.com/EleutherAI/gpt-neox) code base and uses [Pile](https://huggingface.co/datasets/EleutherAI/pile) with 300B tokens.
 All 4 model sizes are trained on the exact
@@ -19,7 +19,7 @@ same data, in the exact same order.
 - Language: English
 - Learn more: [TokenFormer's GitHub repository](https://github.com/Haiyang-W/TokenFormer)
   for training procedure, config files, and details on how to use.
-  [See paper](https://github.com/Haiyang-W/TokenFormer) for more evals and implementation
+  [See paper](https://arxiv.org/pdf/2410.23168) for more evals and implementation
   details.
 - Library: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
 - License: Apache 2.0
@@ -68,7 +68,7 @@ TokenFormer uses the same tokenizer as [GPT-NeoX-
 
 ## Evaluations
 
-All 16 *TokenFormer* models were evaluated using the [LM Evaluation
+All *TokenFormer* models were evaluated using the [LM Evaluation
 Harness](https://github.com/EleutherAI/lm-evaluation-harness).
 You can run the evaluation with our [instruction](https://github.com/Haiyang-W/TokenFormer?tab=readme-ov-file#evaluations).<br>
 Expand the sections below to see plots of evaluation results for all
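
To make the "token-parameter interactions via attention" idea in the README text above more concrete, here is a minimal, hypothetical PyTorch sketch: input tokens act as queries over a set of learnable key/value parameter tokens, standing in for a fixed weight matrix. This is not the repository's implementation; the class and argument names (`TokenParameterAttention`, `num_param_tokens`) are invented for illustration, and a plain scaled softmax is used where the paper's actual normalization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenParameterAttention(nn.Module):
    """Illustrative token-parameter attention: input tokens attend over
    learnable parameter tokens (keys/values) instead of being multiplied
    by a fixed projection matrix. Hypothetical sketch, not TokenFormer source."""

    def __init__(self, d_in: int, d_out: int, num_param_tokens: int):
        super().__init__()
        # Learnable "parameter token" keys and values; increasing
        # num_param_tokens grows capacity without changing d_in or d_out.
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, d_in) * 0.02)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, d_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        scores = x @ self.param_keys.t() / (x.size(-1) ** 0.5)  # (batch, seq_len, num_param_tokens)
        weights = F.softmax(scores, dim=-1)
        return weights @ self.param_values                      # (batch, seq_len, d_out)


if __name__ == "__main__":
    layer = TokenParameterAttention(d_in=64, d_out=64, num_param_tokens=128)
    out = layer(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```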
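
The Evaluations paragraph points to the repository's own instructions for running the LM Evaluation Harness; as a rough sketch only, a programmatic run with the harness's standard API looks like the snippet below. The model id is hypothetical, and the TokenFormer checkpoints may not load through the generic `hf` backend, so the linked instructions should be treated as authoritative.

```python
import lm_eval

# Hypothetical model id; follow the TokenFormer repository's evaluation
# instructions for the actual released checkpoints.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Haiyang-W/TokenFormer-150M",
    tasks=["lambada_openai", "piqa"],
    batch_size=8,
)
print(results["results"])
```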