iamtarun committed on
Commit
ea08385
1 Parent(s): ce5cd52

Create README.md

Files changed (1)
  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
---
datasets:
- iamtarun/python_code_instructions_18k_alpaca
language:
- en
metrics:
- code_eval
library_name: transformers
pipeline_tag: text-generation
tags:
- code
widget:
- text: 'def isprime(num):'
  example_title: Code Example 1
- text: 'def factorial(num):'
  example_title: Code Example 2
- text: 'def square(num):'
  example_title: Code Example 3
---

# Competitive Programming LLM for Python Language

This model is a fine-tuned version of [codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on the [python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) dataset, which uses Alpaca-style prompts for training.

## Prompt function

```python
def generate_prompt(instruction, inputs=""):
    '''
    This function generates a prompt from the problem description and input.

    @param1 instruction: str - text problem description
    @param2 inputs: str - input to the program
    '''
    text = ("Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Input:\n"
            f"{inputs}\n\n"
            "### Output:\n")
    return text
```
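
Calling the function with the example task from the Usage section below produces a prompt of the following shape (the comments simply show the string the function returns):

```python
prompt = generate_prompt(
    instruction="Write a function to calculate square of a number in python",
    inputs="number = 5",
)
print(prompt)
# Below is an instruction that describes a task. Write a response that appropriately completes the request.
#
# ### Instruction:
# Write a function to calculate square of a number in python
#
# ### Input:
# number = 5
#
# ### Output:
```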

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("iamtarun/codegen-350M-mono-4bit-qlora", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("iamtarun/codegen-350M-mono-4bit-qlora")

# switch the model to inference (eval) mode
model.eval()

# inference function
def pipe(prompt):
    '''
    This function takes a text prompt generated by the generate_prompt function
    and returns the generated response.

    @param1 prompt: str - text prompt generated using the generate_prompt function
    '''
    device = "cuda"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(**inputs,
                                max_length=512,
                                do_sample=True,
                                temperature=0.5,
                                top_p=0.95,
                                repetition_penalty=1.15)
    return tokenizer.decode(output[0].tolist(),
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=False)

# generate code for a problem description
instruction = "Write a function to calculate square of a number in python"
inputs = "number = 5"
prompt = generate_prompt(instruction, inputs)
print(pipe(prompt))
print("\n", "="*100)
```
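
The checkpoint name suggests the model was fine-tuned with 4-bit QLoRA, so you may also want to load it with bitsandbytes 4-bit quantization to keep GPU memory low. The snippet below is a minimal sketch under that assumption (this card does not require it); the plain `from_pretrained` call above works on its own.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# assumption: optionally load the same checkpoint with on-the-fly 4-bit
# quantization (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear layers to 4-bit on load
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "iamtarun/codegen-350M-mono-4bit-qlora",
    quantization_config=bnb_config,
    device_map="auto",
)
```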