nazneen committed on
Commit
f049c36
1 Parent(s): 5028efb

model documentation

Files changed (1)
  1. README.md +158 -13
README.md CHANGED
@@ -1,21 +1,166 @@
1
- ---
2
- language:
3
- - code
4
- license: apache-2.0
5
- widget:
6
- - text: 'public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else {return "odd";}}'
7
- ---
8
- ## JavaBERT
9
  A BERT-like model pretrained on Java software code.
10
- ### Training Data
11
The model was trained on 2,998,345 Java files retrieved from open source projects on GitHub. A `bert-base-cased` tokenizer is used by this model.
12
  ### Training Objective
13
An MLM (Masked Language Model) objective was used to train this model.
14
- ### Usage
15
- ```python
16
  from transformers import pipeline
17
  pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
18
output = pipe(CODE)  # Replace CODE with a Java code string; use '[MASK]' to mask the tokens/words to predict.
19
  ```
20
- #### Related Model
21
- A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
 
1
+ # Model Card for JavaBERT
2
+
3
  A BERT-like model pretrained on Java software code.
4
+
5
+
6
+
7
+
8
+
9
+
10
+ # Model Details
11
+
12
+ ## Model Description
13
+
14
+ A BERT-like model pretrained on Java software code.
15
+
16
+ - **Developed by:** Christian-Albrechts-University of Kiel (CAUKiel)
17
+ - **Shared by [Optional]:** Hugging Face
18
+ - **Model type:** Fill-Mask
19
+ - **Language(s) (NLP):** en
20
+ - **License:** Apache-2.0
21
+ - **Related Models:** A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
22
+ - **Parent Model:** BERT
23
+ - **Resources for more information:**
24
+ - [Associated Paper](https://arxiv.org/pdf/2110.10404.pdf)
25
+
26
+
27
+ # Uses
28
+
29
+ ## Direct Use
30
+
31
+ The model can be used directly for fill-mask prediction on Java code.
32
+
33
+ ## Downstream Use [Optional]
34
+
35
+ More information needed.
36
+
37
+ ## Out-of-Scope Use
38
+
39
+ The model should not be used to intentionally create hostile or alienating environments for people.
40
+
41
+ # Bias, Risks, and Limitations
42
+
43
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
44
+
45
+
46
+ ## Recommendations
47
+
48
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
49
+
50
+
51
+ # Training Details
52
+
53
+ ## Training Data
54
The model was trained on 2,998,345 Java files retrieved from open source projects on GitHub. A `bert-base-cased` tokenizer is used by this model.
55
+
56
+ ## Training Procedure
57
+
58
+
59
  ### Training Objective
60
An MLM (Masked Language Model) objective was used to train this model.
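The masking step of the MLM objective can be sketched as follows. This is an illustrative simplification, not the actual training code: it assumes the standard BERT default of masking roughly 15% of tokens, and it omits BERT's refinement of sometimes substituting a random token or keeping the original token instead of `[MASK]`.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mlm_probability=0.15, seed=0):
    """BERT-style MLM masking: randomly hide tokens for the model to predict.

    Returns the masked sequence and per-position labels: the original token
    where masked, None elsewhere (unmasked positions contribute no loss).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:
            masked.append(mask_token)
            labels.append(tok)   # the model is trained to predict this token
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# A pre-tokenized Java snippet (whitespace split for illustration only).
tokens = 'public boolean isOdd ( Integer num )'.split()
masked, labels = mask_tokens(tokens)
```

During pretraining, the model sees the masked sequence and is optimized to recover the original tokens at the masked positions.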
61
+
62
+ ### Preprocessing
63
+
64
+ More information needed.
65
+
66
+
67
+ ### Speeds, Sizes, Times
68
+
69
+ More information needed.
70
+
71
+ # Evaluation
72
+
73
+
74
+
75
+ ## Testing Data, Factors & Metrics
76
+
77
+ ### Testing Data
78
+ More information needed.
79
+
80
+
81
+ ### Factors
82
+
83
+ More information needed.
84
+
85
+ ### Metrics
86
+
87
+ More information needed.
88
+
89
+
90
+ ## Results
91
+ More information needed.
92
+
93
+
94
+ # Model Examination
95
+
96
+ More information needed.
97
+
98
+ # Environmental Impact
99
+
100
+
101
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
102
+
103
+ - **Hardware Type:** More information needed.
104
+ - **Hours used:** More information needed.
105
+ - **Cloud Provider:** More information needed.
106
+ - **Compute Region:** More information needed.
107
+ - **Carbon Emitted:** More information needed.
108
+
109
+ # Technical Specifications [optional]
110
+
111
+ ## Model Architecture and Objective
112
+
113
+ More information needed.
114
+
115
+ ## Compute Infrastructure
116
+
117
+ More information needed.
118
+
119
+ ### Hardware
120
+
121
+ More information needed.
122
+
123
+ ### Software
124
+
125
+ More information needed.
126
+
127
+ # Citation
128
+
129
+
130
+
131
+ **BibTeX:**
132
+
133
+ More information needed.
134
+
135
+ **APA:**
136
+
137
+ More information needed.
138
+
139
+ # Glossary [optional]
140
+ More information needed.
141
+
142
+ # More Information [optional]
143
+
144
+ More information needed.
145
+
146
+ # Model Card Authors [optional]
147
+
148
+ Christian-Albrechts-University of Kiel (CAUKiel) in collaboration with Ezi Ozoani and the team at Hugging Face.
149
+
150
+ # Model Card Contact
151
+
152
+ More information needed.
153
+
154
+ # How to Get Started with the Model
155
+
156
+ Use the code below to get started with the model.
157
+
158
+ <details>
159
+ <summary> Click to expand </summary>
+
160
+ ```python
161
  from transformers import pipeline
162
  pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
163
output = pipe(CODE)  # Replace CODE with a Java code string; use '[MASK]' to mask the tokens/words to predict.
164
  ```
165
+
166
+ </details>