
CoText (1-CCG)

Introduction

Paper: CoTexT: Multi-task Learning with Code-Text Transformer

Authors: Long Phan, Hieu Tran, Daniel Le, Hieu Nguyen, James Anibal, Alec Peltekian, Yanfang Ye

How to use

For more details, see our GitHub repository.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Use a GPU when one is available; otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("razent/cotext-1-ccg")
model = AutoModelForSeq2SeqLM.from_pretrained("razent/cotext-1-ccg").to(device)

sentence = "def add(a, b): return a + b"
text = "python: " + sentence + " </s>"

# Tokenize and move the tensors to the same device as the model
encoding = tokenizer(text, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    early_stopping=True
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
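
To process several snippets at once, the example above can be wrapped in a small helper. The following is a minimal sketch, not part of the original release: the function names `format_prompt` and `generate_batch` are hypothetical, and it assumes that the "python: " prefix and trailing " </s>" shown above are the expected input format. Whether other language prefixes work with this checkpoint is not stated here, so the `language` parameter is an assumption.

```python
def format_prompt(code, language="python"):
    """Build the input string in the same format as the example above.

    `language` is a hypothetical parameter; only "python" appears in
    this README.
    """
    return f"{language}: {code} </s>"


def generate_batch(snippets, tokenizer, model, language="python", max_length=256):
    """Generate output for several code snippets in one padded batch."""
    prompts = [format_prompt(snippet, language) for snippet in snippets]
    # padding=True pads every prompt to the longest one in the batch;
    # .to(model.device) keeps the tensors on the same device as the model
    encoding = tokenizer(prompts, padding=True, return_tensors="pt").to(model.device)
    outputs = model.generate(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        max_length=max_length,
    )
    # Returns one decoded string per input snippet
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the `tokenizer` and `model` loaded as shown earlier, `generate_batch(["def add(a, b): return a + b"], tokenizer, model)` should return a list containing one generated string for each snippet.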

Citation

@inproceedings{phan-etal-2021-cotext,
    title = "{C}o{T}ex{T}: Multi-task Learning with Code-Text Transformer",
    author = "Phan, Long  and
      Tran, Hieu  and
      Le, Daniel  and
      Nguyen, Hieu  and
      Anibal, James  and
      Peltekian, Alec  and
      Ye, Yanfang",
    booktitle = "Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.nlp4prog-1.5",
    doi = "10.18653/v1/2021.nlp4prog-1.5",
    pages = "40--47"
}