File size: 2,505 Bytes
d64e7af
 
 
3f80546
d64e7af
 
 
e0d09e5
d56fa4f
 
d64e7af
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
dc6a741
d64e7af
 
 
 
 
 
 
 
 
 
 
 
ab58167
d64e7af
ab58167
 
d64e7af
ab58167
 
 
 
 
 
 
 
 
 
 
d64e7af
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
---
license: mit
datasets:
- ammarnasr/the-stack-java-clean
library_name: adapter-transformers
tags:
- code
pipeline_tag: text-generation
language:
- code
---


# CodeGen (CodeGen-Mono 350M LoRa Java)

## Model description
CodeGen LoRa Java is a family of autoregressive language models fine-tuned using LoRa on Different Programming Langauges.
## Training data
<!-- https://huggingface.co/datasets/ammarnasr/the-stack-java-clean -->
This model was fine-tuned on the cleaned Java subset from TheStack Avilable [here](https://huggingface.co/datasets/ammarnasr/the-stack-java-clean). The data consists of 1 Million Java code files.

## Training procedure

This model was fine-tuned using LoRa on 1 T4 GPU. The model was trained for 10,000 steps with batch size of 4. The model was trained using causal language modeling loss.

## Evaluation results

We evaluate our models on the MultiPle-E bencchmark. The model achieves 8.9 Pass@10 Rate.
![final_pass_at_10](https://raw.githubusercontent.com/ammarnasr/LLM-for-code-intelligence/main/figs/final_pass_at_10.png


## Intended Use and Limitations

However, the model is intended for and best at **program synthesis**, that is, generating executable code given English prompts, where the prompts should be in the form of a comment string. The model can complete partially-generated code in Java and Python.

## How to use

This model can be easily loaded using the `AutoModelForCausalLM` functionality:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftConfig, PeftModel

model_name = "ammarnasr/codegen-350M-mono-java"
peft_config = PeftConfig.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, model_name)

model.print_trainable_parameters()

text = "public static void main(String[] args) {"
input_ids = tokenizer.encode(text, return_tensors="pt")
generated_ids = model.generate(input_ids=input_ids, max_length=100)
print('Generated: \n')
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

## BibTeX entry and citation info

```bibtex
@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}
```