---
license: mit
---

# TinyLlama-NoPE-1.1B

TinyLlama-NoPE-1.1B is a causal transformer language model trained without positional encoding (NoPE).

The model was trained following the TinyLlama code base (https://github.com/jzhang38/TinyLlama).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.models.llama import modeling_llama


def nope_monkey_patch(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    # NoPE: return the query/key states unchanged, so no rotary embedding is applied.
    return q, k


# Route Llama's rotary position embedding through the no-op above,
# disabling positional encoding in every attention layer.
modeling_llama.apply_rotary_pos_emb = nope_monkey_patch

model_path = "AntNLP/TinyLlama-NoPE-1.1B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).cuda()

input_ids = tokenizer("Hello, TinyLlama-NoPE", return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, do_sample=True, max_length=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
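
In the Llama architecture the only positional signal reaching attention is the rotary embedding applied inside `apply_rotary_pos_emb`, so replacing it with a no-op runs the model with no position encoding at all, matching how this checkpoint was trained. The short sketch below is an illustrative follow-up rather than part of the original recipe: it confirms the patch is active and runs a longer greedy continuation, reusing `model` and `tokenizer` from the snippet above (the prompt and `max_new_tokens` value are arbitrary choices).

```python
# Illustrative follow-up; reuses `model`, `tokenizer`, and the patch from above.
# The prompt and max_new_tokens value are arbitrary demonstration settings.
assert modeling_llama.apply_rotary_pos_emb is nope_monkey_patch  # patch is active

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```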

## Citation

```bibtex
@misc{wang2024length,
      title={Length Generalization of Causal Transformers without Position Encoding},
      author={Jie Wang and Tao Ji and Yuanbin Wu and Hang Yan and Tao Gui and Qi Zhang and Xuanjing Huang and Xiaoling Wang},
      year={2024},
      eprint={2404.12224},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```