WANGNingroci committed on
Commit
e8c654a
1 Parent(s): 615864f

Update README.md

  1. README.md +93 -3
README.md CHANGED
@@ -1,3 +1,93 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - WANGNingroci/opencores
+ - WANGNingroci/vgen_cpp
+ language:
+ - en
+ pipeline_tag: reinforcement-learning
+ tags:
+ - code
+ ---
+
+
+ # VeriSeek
+
+ Official implementation of the paper "Large Language Model for Verilog Generation with Golden Code Feedback".
+
+
+ ## Project Description
+ This study introduces a novel approach that uses reinforcement learning with golden code feedback to enhance the performance of pre-trained models. Leveraging open-source data and base models, we achieve state-of-the-art (SOTA) results by a substantial margin. Notably, our 6.7B-parameter model VeriSeek outperforms the current best-in-class 13B and 16B models. Furthermore, through a comprehensive analysis of the limitations of direct fine-tuning and of the training dynamics of reinforcement learning, we posit that developing comprehensive supervisory signals that are aligned with the inherent parallel semantics of Verilog code is critical to effective generation.
+
+
+ ## Usage
+
+ Before training, configure `src/scripts/single_node.sh` for the training run, `src/configs/accelerate/accel.yaml` for Accelerate, and `src/configs/deepspeed/ds_z3_config.json` for DeepSpeed. The configuration file contains the following options:
+
+ ### Generation
+
+ To generate a response, run the following demo:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ def stop_at_stop_token(decoded_string, stop_tokens):
+     """
+     Produces the prefix of decoded_string that ends at the first occurrence of
+     a stop_token.
+     WARNING: the decoded_string *must not* include the prompt, which may have stop tokens
+     itself.
+     """
+     min_stop_index = len(decoded_string)
+     for stop_token in stop_tokens:
+         stop_index = decoded_string.find(stop_token)
+         if stop_index != -1 and stop_index < min_stop_index:
+             min_stop_index = stop_index
+     return decoded_string[:min_stop_index]
+
+
+ prompt = """
+ Please act as a professional verilog designer.
+
+ Implement a data width conversion circuit that converts 8-bit data input to 16-bit data output. The module provides two output ports: valid_out, which indicates the validity of the output data, and data_out, which represents the converted 16-bit output data. The first arriving 8-bit data should be placed in the higher 8 bits of the 16-bit data output. The valid_out and data_out signals are generated in the next clock cycle after the two data inputs. When there is only one data input, valid_out and data_out are not generated immediately. Instead, they wait for the arrival of the next data input to complete the concatenation of the two data inputs before generating valid_out and data_out.
+
+ Module name:
+ width_8to16
+
+ Input ports:
+ clk: Clock signal used for synchronous operation.
+ rst_n: Active-low reset signal. Defined as 0 for reset and 1 for reset signal inactive.
+ valid_in: Input signal indicating the validity of the input data.
+ data_in: 8-bit input data to be converted.
+
+ Output ports:
+ valid_out: Output signal indicating the validity of the output data.
+ data_out: 16-bit output data resulting from the width conversion.
+ Implementation:
+ The data_out register is triggered on the positive edge of the clock signal (posedge clk) or the negative edge of the reset signal (negedge rst_n).
+ On reset, indicated by !rst_n, the data_out register is cleared to 0.
+ If the input data is valid (valid_in) and the flag signal is active, the data_out register is updated by concatenating the contents of the data_lock register (8 bits) and the data_in register (8 bits) to form a 16-bit output. The first valid data is temporarily stored, and when the second valid data is inputted, they are concatenated to produce the output valid_out and data_out.
+
+ Give me the complete code.
+
+ """
+
+ gpu_id = 0
+ model_name = "WANGNingroci/VeriSeek"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map=gpu_id)
+ model.eval()
+ # Sample
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(gpu_id)
+ sample = model.generate(input_ids, max_length=1024, temperature=0.2, top_p=0.95, do_sample=True)
+ output = tokenizer.decode(sample[0])
+
+ stop_word = "endmodule"
+ output_trunc = stop_at_stop_token(output, [stop_word])
+ output_trunc += "endmodule"
+ print(output_trunc)
+ ```
+
+ ## Acknowledgement
+ This repo benefits from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). Thanks to its authors for their wonderful work.
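
A note on the generation demo in this diff: the docstring of `stop_at_stop_token` warns that the decoded string must not include the prompt, yet the demo decodes `sample[0]`, which for a causal language model contains the prompt tokens followed by the generated ones. That works for the demo prompt because it never mentions `endmodule`, but a prompt that does would be cut off early. The sketch below is not part of the commit; it shows one way to handle this on plain strings, and the helper names `strip_prompt` and `truncate_at_stop` are hypothetical.

```python
from typing import Sequence


def strip_prompt(decoded: str, prompt: str) -> str:
    """Return only the completion, dropping the prompt when it is a prefix.

    Hypothetical helper: if the decoded text does not start with the prompt
    (for example because a special token precedes it), the text is returned
    unchanged.
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):]
    return decoded


def truncate_at_stop(completion: str, stop_tokens: Sequence[str]) -> str:
    """Cut at the earliest stop token and keep that token in the result.

    Unlike the demo, nothing is appended when no stop token is found.
    """
    hits = [(completion.find(tok), tok) for tok in stop_tokens]
    hits = [(idx, tok) for idx, tok in hits if idx != -1]
    if not hits:
        return completion
    idx, tok = min(hits)  # earliest occurrence wins
    return completion[:idx] + tok


# Toy strings standing in for the tokenizer output (assumed, not real model output).
prompt = "Write a module that ends with endmodule.\n"
decoded = prompt + "module m; endmodule\n// trailing text"
result = truncate_at_stop(strip_prompt(decoded, prompt), ["endmodule"])
print(result)
```

When working with the real model, slicing the generated token ids before decoding (for example `sample[0][input_ids.shape[1]:]`) is likely more robust than string prefix matching, since decoding can add special tokens in front of the prompt.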