roberta-php


language: php
datasets:
- code_search_net

This is a RoBERTa model pre-trained on the CodeSearchNet dataset for the PHP masked language modeling task.

To load the model (required packages: !pip install transformers sentencepiece):

from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the tokenizer and the masked-LM model (AutoModelWithLMHead is deprecated)
tokenizer = AutoTokenizer.from_pretrained("dbernsohn/roberta-php")
model = AutoModelForMaskedLM.from_pretrained("dbernsohn/roberta-php")

fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer
)

You can then use this model to fill in masked tokens in PHP code.

code = """
$people = array(
    array('name' => 'Kalle', 'salt' => 856412),
    array('name' => 'Pierre', 'salt' => 215863)
);

for($i = 0; $i < count($<mask>); ++$i) {
    $people[$i]['salt'] = mt_rand(000000, 999999);
}
""".lstrip()

pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in fill_mask(code)}
sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
# [('people', 0.785636842250824),
#  ('parts', 0.006270722020417452),
#  ('id', 0.0035842324141412973),
#  ('data', 0.0025512021966278553),
#  ('config', 0.002258970635011792)]
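
The post-processing step above can be wrapped in a small helper. The following is a minimal sketch, assuming the pipeline returns a list of dicts with "token_str" and "score" keys as in the example above. The name rank_predictions is hypothetical, and the sample input is hand-written in that shape (reusing two rounded scores from the output above) rather than produced by a live model call.

```python
from typing import Dict, List, Tuple


def rank_predictions(predictions: List[Dict], marker: str = "Ġ") -> List[Tuple[str, float]]:
    """Return (token, score) pairs sorted by descending score.

    `marker` is assumed to be the byte-level BPE symbol for a leading space;
    it is removed along with any surrounding whitespace.
    """
    cleaned = {
        p["token_str"].replace(marker, "").strip(): p["score"]
        for p in predictions
    }
    return sorted(cleaned.items(), key=lambda kv: kv[1], reverse=True)


# Hand-written sample in the shape the fill-mask pipeline returns (not live model output).
sample = [
    {"token_str": "Ġparts", "score": 0.0063},
    {"token_str": "people", "score": 0.7856},
]
ranked = rank_predictions(sample)
print(ranked)
```

With the pipeline loaded as shown earlier, the same helper could be called as rank_predictions(fill_mask(code)). Calling .strip() in addition to removing the marker is a design choice that also covers library versions that return the token already decoded with a leading space.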

The full training process and hyperparameters are in my GitHub repo.

Created by Dor Bernsohn
