neltds, nazneen committed on
Commit 3b2ebe8
1 Parent(s): 5028efb

model documentation (#2)


- model documentation (4e2edd99993fccdf5928d170acd3ff3d7eb44147)


Co-authored-by: Nazneen Rajani <nazneen@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +165 -8
README.md CHANGED
@@ -1,21 +1,178 @@
 
  ---
  language:
- - code
  license: apache-2.0
  widget:
- - text: 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
  ---
- ## JavaBERT
  A BERT-like model pretrained on Java software code.
- ### Training Data
  The model was trained on 2,998,345 Java files retrieved from open-source projects on GitHub. This model uses a ```bert-base-cased``` tokenizer.
  ### Training Objective
  An MLM (masked language modeling) objective was used to train this model.
- ### Usage
- ```python
  from transformers import pipeline
  pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
  output = pipe(CODE)  # Replace CODE with Java code; use '[MASK]' to mask tokens/words in the code.
  ```
- #### Related Model
- A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
 
+
  ---
  language:
+ - code
  license: apache-2.0
  widget:
+ - text: public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else
+ {return "odd";}}
  ---
+
+ # Model Card for JavaBERT
+
  A BERT-like model pretrained on Java software code.
+
+ # Model Details
+
+ ## Model Description
+
+ A BERT-like model pretrained on Java software code.
+
+ - **Developed by:** Christian-Albrechts-University of Kiel (CAUKiel)
+ - **Shared by [Optional]:** Hugging Face
+ - **Model type:** Fill-Mask
+ - **Language(s) (NLP):** en
+ - **License:** Apache-2.0
+ - **Related Models:** A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
+ - **Parent Model:** BERT
+ - **Resources for more information:**
+   - [Associated Paper](https://arxiv.org/pdf/2110.10404.pdf)
+
+ # Uses
+
+ ## Direct Use
+
+ Fill-mask prediction on Java source code.
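
As a usage sketch (assuming the standard `transformers` fill-mask pipeline API; the input string is the widget example from this card's metadata):

```python
from transformers import pipeline

# Build a fill-mask pipeline for the model; each '[MASK]' in the input
# is predicted from the surrounding Java context.
pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
code = 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
for candidate in pipe(code):
    # Each candidate is a dict with 'score', 'token_str', and 'sequence'.
    print(candidate['token_str'], candidate['score'])
```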
+
+ ## Downstream Use [Optional]
+
+ More information needed.
+
+ ## Out-of-Scope Use
+
+ The model should not be used to intentionally create hostile or alienating environments for people.
+
+ # Bias, Risks, and Limitations
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
+
+ ## Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. See the [associated paper](https://arxiv.org/pdf/2110.10404.pdf) for further discussion; more information is needed for further recommendations.
+
+ # Training Details
+
+ ## Training Data
  The model was trained on 2,998,345 Java files retrieved from open-source projects on GitHub. This model uses a ```bert-base-cased``` tokenizer.
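
To make the tokenizer choice concrete, a minimal sketch of inspecting how the cased WordPiece vocabulary splits Java source (the example string and the indicated output are illustrative assumptions, not from the original card):

```python
from transformers import AutoTokenizer

# The card states a bert-base-cased tokenizer is used, so Java identifiers
# are split into WordPiece subwords drawn from an English vocabulary.
tokenizer = AutoTokenizer.from_pretrained('CAUKiel/JavaBERT')
print(tokenizer.tokenize('public boolean isOdd(Integer num) { return num % 2 != 0; }'))
# Identifiers such as 'isOdd' typically fragment, e.g. ['is', '##O', '##dd'] (exact splits may vary).
```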
+
+ ## Training Procedure
+
  ### Training Objective
  An MLM (masked language modeling) objective was used to train this model.
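
To illustrate what this objective means at inference time, a minimal sketch that queries the masked-LM head directly (assuming the standard `transformers` AutoModelForMaskedLM API; the fill-mask pipeline shown elsewhere in this card wraps the same steps):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('CAUKiel/JavaBERT')
model = AutoModelForMaskedLM.from_pretrained('CAUKiel/JavaBERT')

# Widget example from this card's metadata; '[MASK]' marks the token to predict.
code = 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
inputs = tokenizer(code, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# Report the top-5 candidate tokens at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens([int(i) for i in top_ids]))
```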
+
+ ### Preprocessing
+
+ More information needed.
+
+ ### Speeds, Sizes, Times
+
+ More information needed.
+
+ # Evaluation
+
+ ## Testing Data, Factors & Metrics
+
+ ### Testing Data
+
+ More information needed.
+
+ ### Factors
+
+ More information needed.
+
+ ### Metrics
+
+ More information needed.
+
+ ## Results
+
+ More information needed.
+
+ # Model Examination
+
+ More information needed.
+
+ # Environmental Impact
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** More information needed.
+ - **Hours used:** More information needed.
+ - **Cloud Provider:** More information needed.
+ - **Compute Region:** More information needed.
+ - **Carbon Emitted:** More information needed.
+
+ # Technical Specifications [optional]
+
+ ## Model Architecture and Objective
+
+ More information needed.
+
+ ## Compute Infrastructure
+
+ More information needed.
+
+ ### Hardware
+
+ More information needed.
+
+ ### Software
+
+ More information needed.
+
+ # Citation
+
+ **BibTeX:**
+
+ More information needed.
+
+ **APA:**
+
+ More information needed.
+
+ # Glossary [optional]
+
+ More information needed.
+
+ # More Information [optional]
+
+ More information needed.
+
+ # Model Card Authors [optional]
+
+ Christian-Albrechts-University of Kiel (CAUKiel), in collaboration with Ezi Ozoani and the team at Hugging Face.
+
+ # Model Card Contact
+
+ More information needed.
+
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
  from transformers import pipeline
  pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
  output = pipe(CODE)  # Replace CODE with Java code; use '[MASK]' to mask tokens/words in the code.
  ```
+
+ </details>
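
For example, `CODE` in the snippet above could be set to the widget string from this card's metadata:

```python
CODE = 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
```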
+