JavaBERT

A BERT-like model pretrained on Java software code.

Training Data

The model was trained on 2,998,345 Java files retrieved from open-source projects on GitHub. The model uses a bert-base-cased tokenizer.
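
For reference, here is a minimal sketch of tokenizing a Java snippet with the bert-base-cased tokenizer named above; the Java line itself is only an illustration:

from transformers import AutoTokenizer

# The card states a bert-base-cased tokenizer is used; load it directly.
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

# Illustrative Java line; identifiers are split into WordPiece sub-tokens.
tokens = tokenizer.tokenize('System.out.println("Hello, World!");')
print(tokens)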

Training Objective

An MLM (masked language modeling) objective was used to train this model.
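
A masked-language-model head can also be queried directly instead of through the pipeline shown below. The following is a minimal sketch, assuming the checkpoint ships its tokenizer files; the Java snippet and the top-5 cutoff are illustrative:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('CAUKiel/JavaBERT')
model = AutoModelForMaskedLM.from_pretrained('CAUKiel/JavaBERT')

# Illustrative Java snippet with one masked token.
code = 'public static [MASK] main(String[] args) { }'
inputs = tokenizer(code, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read off the five highest-scoring tokens.
mask_pos = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))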

Usage

from transformers import pipeline

pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
# Replace CODE with your own Java code; use '[MASK]' to mask tokens/words in it.
CODE = 'public [MASK] void main(String[] args) { }'  # illustrative snippet
output = pipe(CODE)
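
The fill-mask pipeline returns a list of candidate fills for the masked token. A short sketch of inspecting them (field names follow the standard transformers pipeline output):

# Each candidate is a dict with 'score', 'token_str', and the completed 'sequence'.
for candidate in output:
    print(candidate['token_str'], round(candidate['score'], 3))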

Related Model

A version of this model using an uncased tokenizer is available at CAUKiel/JavaBERT-uncased.
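
The uncased checkpoint is a drop-in replacement in the usage snippet above; only the model identifier changes:

from transformers import pipeline

# Same fill-mask usage, pointing at the uncased checkpoint instead.
pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT-uncased')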
