dbernsohn committed on
Commit aea374d
1 parent: a093673

Update README.md

Files changed (1)
  1. README.md +35 -1
README.md CHANGED
@@ -1,4 +1,29 @@
+---
+language: python
+datasets:
+- CodeSearchNet
+---
+# roberta-python
+
+This is a [RoBERTa](https://arxiv.org/pdf/1907.11692.pdf) model pre-trained on the [CodeSearchNet dataset](https://github.com/github/CodeSearchNet) for the **Python** masked language modeling task.
+
+To load the model:
+(required packages: `pip install transformers sentencepiece`)
+```python
+from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline
+tokenizer = AutoTokenizer.from_pretrained("dbernsohn/roberta-python")
+model = AutoModelWithLMHead.from_pretrained("dbernsohn/roberta-python")
+
+fill_mask = pipeline(
+    "fill-mask",
+    model=model,
+    tokenizer=tokenizer
+)
 ```
+
+You can then use the model to fill in masked words in Python code.
+
+```python
 code = """
 new_dict = {}
 for k, v in my_dict.<mask>():
@@ -7,4 +32,13 @@ for k, v in my_dict.<mask>():
 
 pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in fill_mask(code)}
 sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
-```
+# [('items', 0.7376779913902283),
+#  ('keys', 0.16238391399383545),
+#  ('values', 0.03965481370687485),
+#  ('iteritems', 0.03346433863043785),
+#  ('splitlines', 0.0032723243348300457)]
+```
+
+The whole training process and hyperparameters are in my [GitHub repo](https://github.com/DorBernsohn/CodeLM/tree/main/CodeMLM).
+
+> Created by [Dor Bernsohn](https://www.linkedin.com/in/dor-bernsohn-70b2b1146/)
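The ranking step in the README's example can be sketched without downloading the model, by mocking the list of dicts that the `transformers` fill-mask pipeline returns (each entry carries, among other keys, `token_str` and `score`). `mock_output` below is a stand-in for `fill_mask(code)`, populated with the illustrative scores from the example output above; it is not part of the model card itself:

```python
# Stand-in for fill_mask(code): the fill-mask pipeline returns a list of dicts
# with (among other keys) "token_str" and "score". RoBERTa's byte-level BPE
# marks a leading space with "Ġ", which is why the README strips it.
mock_output = [
    {"token_str": "Ġitems", "score": 0.7376779913902283},
    {"token_str": "Ġkeys", "score": 0.16238391399383545},
    {"token_str": "Ġvalues", "score": 0.03965481370687485},
    {"token_str": "Ġiteritems", "score": 0.03346433863043785},
    {"token_str": "Ġsplitlines", "score": 0.0032723243348300457},
]

# Same post-processing as the README: strip the "Ġ" marker, map token -> score.
pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in mock_output}

# Rank candidate tokens by score, highest first.
ranked = sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0])  # ('items', 0.7376779913902283)
```

Stripping `"Ġ"` matters because the byte-level BPE tokenizer encodes the space before a token into the token string itself; without the `replace`, the dictionary keys would be `"Ġitems"`, `"Ġkeys"`, and so on rather than plain method names.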