# Model Card for JavaBERT
 
A BERT-like model pretrained on Java software code.
 
 
 
 
 
 
# Model Details
 
## Model Description
 
A BERT-like model pretrained on Java software code.
 
- **Developed by:** Christian-Albrechts-University of Kiel (CAUKiel)
- **Shared by [Optional]:** Hugging Face
- **Model type:** Fill-Mask
- **Language(s) (NLP):** en
- **License:** Apache-2.0
- **Related Models:** A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
  - **Parent Model:** BERT
- **Resources for more information:** 
  - [Associated Paper](https://arxiv.org/pdf/2110.10404.pdf)
 
 
# Uses
 
## Direct Use
 
Fill-Mask
 
## Downstream Use [Optional]
 
More information needed.
 
## Out-of-Scope Use
 
The model should not be used to intentionally create hostile or alienating environments for people. 
 
# Bias, Risks, and Limitations
 
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
 
 
## Recommendations
 
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
# Training Details
 
## Training Data
The model was trained on 2,998,345 Java files retrieved from open-source projects on GitHub. It uses a `bert-base-cased` tokenizer.
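
To give a rough idea of how a WordPiece tokenizer such as `bert-base-cased` breaks up Java identifiers, the sketch below implements greedy longest-match-first subword splitting. The function name `wordpiece_split` and the toy vocabulary are assumptions made for illustration; the real tokenizer uses a much larger vocabulary and also pre-splits text on whitespace and punctuation.

```python
def wordpiece_split(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a '##' prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (hypothetical, for illustration only).
TOY_VOCAB = {"get", "##User", "##Name", "is", "##Odd", "public"}

print(wordpiece_split("getUserName", TOY_VOCAB))
print(wordpiece_split("isOdd", TOY_VOCAB))
```

Because the tokenizer is cased, the capitalization in camelCase identifiers is preserved in the resulting pieces.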
 
## Training Procedure
 
 
### Training Objective
The model was trained with a masked language modeling (MLM) objective.
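
As a minimal sketch of what an MLM objective involves, the code below corrupts a token sequence and records which positions the model would have to reconstruct. It assumes the standard BERT recipe (15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). This card does not state the rates used for JavaBERT, and the helper name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"
IGNORE = None  # marks positions that do not contribute to the loss


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for one MLM training example.

    Assumption: standard BERT masking rates (80% [MASK], 10% random, 10% unchanged).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(IGNORE)
            inputs.append(tok)
    return inputs, labels


tokens = "public boolean isOdd ( Integer num ) { return num % 2 != 0 ; }".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), seed=1)
print(inputs)
print(labels)
```

During training, the loss is computed only at positions whose label is not `IGNORE`.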
 
### Preprocessing
 
More information needed.
 
 
### Speeds, Sizes, Times
 
More information needed.
 
# Evaluation
 
 
 
## Testing Data, Factors & Metrics
 
### Testing Data
More information needed.
 
 
### Factors

More information needed.
 
### Metrics
 
More information needed.
 
 
## Results 
More information needed.
 
 
# Model Examination
 
More information needed.
 
# Environmental Impact
 
 
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
- **Hardware Type:** More information needed.
- **Hours used:** More information needed.
- **Cloud Provider:** More information needed.
- **Compute Region:** More information needed.
- **Carbon Emitted:** More information needed.
 
# Technical Specifications [optional]
 
## Model Architecture and Objective
 
More information needed.
 
## Compute Infrastructure
 
More information needed.
 
### Hardware
 
More information needed.
 
### Software
 
More information needed.
 
# Citation
 
 
 
**BibTeX:**
 
More information needed.
 
**APA:**
 
More information needed.
 
# Glossary [optional]
More information needed.
 
# More Information [optional]
 
More information needed.
 
# Model Card Authors [optional]
 
Christian-Albrechts-University of Kiel (CAUKiel), in collaboration with Ezi Ozoani and the team at Hugging Face.
 
# Model Card Contact
 
More information needed.
 
# How to Get Started with the Model
 
Use the code below to get started with the model.
 
<details>
<summary> Click to expand </summary>

```python
from transformers import pipeline

pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
# Example input; replace with your own Java code and use '[MASK]' to mask tokens/words in the code.
CODE = 'public [MASK] isOdd(Integer num) { return num % 2 != 0; }'
output = pipe(CODE)
```
 
</details>
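
For an input with a single `[MASK]`, the `fill-mask` pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one way to rank such results. The helper `top_predictions` is hypothetical, and the sample data are placeholder values shaped like pipeline output, not real JavaBERT predictions.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask results."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 3)) for r in ranked[:k]]


# Placeholder data shaped like fill-mask output; NOT real model predictions.
example_results = [
    {"score": 0.10, "token": 2, "token_str": "int", "sequence": "public int isOdd(...)"},
    {"score": 0.70, "token": 1, "token_str": "boolean", "sequence": "public boolean isOdd(...)"},
    {"score": 0.05, "token": 3, "token_str": "void", "sequence": "public void isOdd(...)"},
]

for token_str, score in top_predictions(example_results, k=2):
    print(f"{token_str}\t{score}")
```

With real pipeline output, pass the value returned by `pipe(CODE)` in place of `example_results`.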