Sentdex committed on
Commit
35b3547
1 Parent(s): a94558c

Update README.md

Files changed (1)
  1. README.md +24 -10
README.md CHANGED
@@ -12,27 +12,41 @@ GPyT is a GPT2 model trained from scratch (not fine tuned) on Python code from G
 
 Newlines are replaced by `<N>`
 
-
 Input to the model is code, up to the context length of 1024, with newlines replaced by `<N>`
 
-Here's an example of a quick converter to take your multi-line code and replace the newlines:
+Here's a quick example of using this model:
 
 ```py
-inp = """def do_something():
-    print("Hello")
-"""
+from transformers import AutoTokenizer, AutoModelWithLMHead
+
+tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
+model = AutoModelWithLMHead.from_pretrained("Sentdex/GPyT")
+
+# copy and paste some code in here
+inp = """import"""
 
 newlinechar = "<N>"
 converted = inp.replace("\n", newlinechar)
-print("length:", len(converted))
-print(converted)
+tokenized = tokenizer.encode(converted, return_tensors='pt').to("cuda")
+resp = model.generate(tokenized).to("cuda")
+
+decoded = tokenizer.decode(resp[0])
+reformatted = decoded.replace("<N>","\n")
+
+print(reformatted)
+```
+
+Should produce:
+
+```
+import numpy as np
+import pytest
+
+import pandas as pd<N
 ```
 
-This should give you something like:
 
-`def do_something():<N> print("Hello")<N>`
 
-...which is what the model is expecting as input.
 
 Considerations:
 
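
The README text in this diff says the model expects newlines replaced by `<N>`, and the added snippet reverses that replacement after decoding. As a minimal sketch of that round trip on its own, with no model or tokenizer involved: the helper names `to_model_input` and `from_model_output` are hypothetical and do not appear in the README.

```python
NEWLINE_TOKEN = "<N>"  # the marker named in the README


def to_model_input(code: str) -> str:
    """Replace real newlines with the <N> marker (hypothetical helper)."""
    return code.replace("\n", NEWLINE_TOKEN)


def from_model_output(text: str) -> str:
    """Turn <N> markers back into real newlines (hypothetical helper)."""
    return text.replace(NEWLINE_TOKEN, "\n")


source = 'def do_something():\n    print("Hello")\n'
converted = to_model_input(source)
restored = from_model_output(converted)

print(converted)
print(restored == source)
```

The round trip only restores complete `<N>` markers, so a marker cut off at the end of generation (such as the trailing `<N` in the sample output above) is left in the text as-is. When the converted string is passed on to a model, the model and the input tensors need to be on the same device.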