User: 「今天天氣真好」は日本語で何ですか

Response:
「今天天氣真好」は、日本語で「今日の天気が良好だ」と言われています。
```

## Some more information

### Why use lora+embed+head

First, I think it is obvious that when an LLM isn't good at some language and you want to finetune it for that language, you should train the embed and head parts.<br>
But the question is: "Why not just do a native (full) finetune?"<br>
If you have looked at some alpaca models or other training projects, you may notice that a lot of them share one problem: "memorization".<br>
The loss drops sharply at the beginning of every epoch, which looks like some kind of overfitting.<br>
In my opinion, this happens because LLaMA has so many parameters that it simply memorizes all the training data.
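For illustration, here is a minimal PyTorch sketch of what "train the embed and head part" means on a Hugging Face LLaMA model; this is not this repo's training script, and the checkpoint path is a placeholder.

```python
# Minimal sketch (not this repo's training code): freeze everything except the
# input embeddings ("embed") and the LM head ("head") of a LLaMA model.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")  # placeholder path

for p in model.parameters():
    p.requires_grad = False    # freeze the whole model first

for p in model.get_input_embeddings().parameters():
    p.requires_grad = True     # embed_tokens: learn representations for the new language's tokens

for p in model.get_output_embeddings().parameters():
    p.requires_grad = True     # lm_head: learn to predict the new language's tokens
```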
But if I use LoRA only on the attention part (ignoring the MLP part), the number of trainable parameters is not large enough to memorize the training data, so the model is much less likely to just memorize everything.
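The exact hyperparameters are not listed here, but the "lora+embed+head" combination can be expressed with the peft library roughly as below; r, alpha, and dropout are placeholder values, and `modules_to_save` tells peft to train (and save) those modules in full instead of adding adapters to them.

```python
# A sketch of "lora + embed + head" with the peft library; r/alpha/dropout are
# placeholder values, not necessarily what was used to train this model.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # LoRA adapters on the attention projections only -- the MLP is left untouched.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # Keep the input embeddings and the LM head fully trainable alongside the adapters.
    modules_to_save=["embed_tokens", "lm_head"],
)

model = get_peft_model(model, lora_config)   # `model` is the LlamaForCausalLM from above
model.print_trainable_parameters()           # prints how small the trainable share is
```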
And here is the loss graph of this 2-epoch model:
![Image](https://i.imgur.com/Z1ilyCm.png)