JavaBERT / README.md
neltds's picture
model documentation (#2)
3b2ebe8
---
language:
- code
license: apache-2.0
widget:
- text: public [MASK] isOdd(Integer num) {if (num % 2 == 0) {return "even";} else
{return "odd";}}
---
# Model Card for JavaBERT
A BERT-like model pretrained on Java software code.
# Model Details
## Model Description
A BERT-like model pretrained on Java software code.
- **Developed by:** Christian-Albrechts-University of Kiel (CAUKiel)
- **Shared by [Optional]:** Hugging Face
- **Model type:** Fill-Mask
- **Language(s) (NLP):** en
- **License:** Apache-2.0
- **Related Models:** A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
- **Parent Model:** BERT
- **Resources for more information:**
- [Associated Paper](https://arxiv.org/pdf/2110.10404.pdf)
# Uses
## Direct Use
Fill-Mask
## Downstream Use [Optional]
More information needed.
## Out-of-Scope Use
The model should not be used to intentionally create hostile or alienating environments for people.
# Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
## Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
{ see paper= word something)
# Training Details
## Training Data
The model was trained on 2,998,345 Java files retrieved from open source projects on GitHub. A ```bert-base-cased``` tokenizer is used by this model.
## Training Procedure
### Training Objective
A MLM (Masked Language Model) objective was used to train this model.
### Preprocessing
More information needed.
### Speeds, Sizes, Times
More information needed.
# Evaluation
## Testing Data, Factors & Metrics
### Testing Data
More information needed.
### Factors
### Metrics
More information needed.
## Results
More information needed.
# Model Examination
More information needed.
# Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** More information needed.
- **Hours used:** More information needed.
- **Cloud Provider:** More information needed.
- **Compute Region:** More information needed.
- **Carbon Emitted:** More information needed.
# Technical Specifications [optional]
## Model Architecture and Objective
More information needed.
## Compute Infrastructure
More information needed.
### Hardware
More information needed.
### Software
More information needed.
# Citation
**BibTeX:**
More information needed.
**APA:**
More information needed.
# Glossary [optional]
More information needed.
# More Information [optional]
More information needed.
# Model Card Authors [optional]
Christian-Albrechts-University of Kiel (CAUKiel) in collaboration with Ezi Ozoani and the team at Hugging Face
# Model Card Contact
More information needed.
# How to Get Started with the Model
Use the code below to get started with the model.
<details>
<summary> Click to expand </summary>
```python
from transformers import pipeline
pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
output = pipe(CODE) # Replace with Java code; Use '[MASK]' to mask tokens/words in the code.
```
</details>