KBlueLeaf committed on
Commit
16f6966
1 Parent(s): bbafa38

Update README.md

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -101,4 +101,19 @@ User: 「今天天氣真好」は日本語で何ですか
 
 Response:
 「今天天氣真好」は、日本語で「今日の天気が良好だ」と言われています。
-```
+```
+
+
+## Some more information
+
+### Why use lora+embed+head
+First, I think it is obvious that when an LLM isn't good at some language and you want to fine-tune it for that language, you should train the embed and head parts (a configuration sketch is given at the end of this section).<br>
+But the question is: "Why not just do a native finetune?"<br>
+If you have looked at alpaca-style models or their training runs, you may notice that a lot of them share one problem: memorization.<br>
+The loss drops sharply at the beginning of every epoch, like some kind of overfitting.<br>
+In my opinion, this is because the parameter count of LLaMA is so large that the model simply memorizes all of the training data.
+
+But if I use LoRA on the attention part only (ignoring the MLP part), the trainable parameter count is not large enough to memorize the training data, so the model is much less likely to memorize everything.
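+
+As a rough back-of-envelope check of that claim (assuming the LLaMA-7B shape of 32 layers, hidden size 4096, and a 32000-token vocab, with an illustrative LoRA rank of 8 that may not match the exact training config):
+
+```python
+# Rough trainable-parameter counts; the shapes and rank here are assumptions.
+layers, hidden, vocab, rank = 32, 4096, 32000, 8
+
+# LoRA on the q/k/v/o attention projections: two rank-r matrices per module.
+lora_params = layers * 4 * 2 * hidden * rank   # ~8.4M
+# Fully trained token embedding and LM head.
+embed_head_params = 2 * vocab * hidden         # ~262M
+
+print(f"LoRA only: {lora_params / 1e6:.1f}M params")
+print(f"LoRA + embed + head: {(lora_params + embed_head_params) / 1e6:.1f}M params")
+print("vs ~6.7B params for a full finetune of LLaMA-7B")
+```
+So the LoRA part stays around 8M parameters, far below the 6.7B touched by a full finetune, which is the intuition behind the reduced memorization.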
+
+And here is the loss graph of this 2-epoch model:
+![Image](https://i.imgur.com/Z1ilyCm.png)
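+
+For reference, here is a minimal sketch of how this lora+embed+head setup can be expressed with the PEFT library. The module names follow the Hugging Face LLaMA implementation, and the model path, rank, and alpha are placeholders rather than the exact values used for this checkpoint:
+
+```python
+from transformers import LlamaForCausalLM
+from peft import LoraConfig, get_peft_model
+
+model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")  # placeholder path
+
+config = LoraConfig(
+    r=8,            # illustrative rank
+    lora_alpha=16,  # illustrative scaling
+    lora_dropout=0.05,
+    # LoRA only on the attention projections; the MLP part stays frozen.
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+    # Train the token embedding and LM head fully, alongside the LoRA weights.
+    modules_to_save=["embed_tokens", "lm_head"],
+    task_type="CAUSAL_LM",
+)
+
+model = get_peft_model(model, config)
+model.print_trainable_parameters()
+```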