leonardlin committed on
Commit
533f412
1 Parent(s): 0064154

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -7,6 +7,20 @@ model-index:
  - name: outputs/lr-8e6
  results: []
  ---
+ I ran each test twice to try to lower variance. All runs use temperature 0.2, min_p 0.1, and frequency penalty 0.5.
+
+ | Model                        | AVG Score | ELYZA100 | JA MT-Bench | Rakuda | Tengu-Bench | JA Char % |
+ |------------------------------|-----------|----------|-------------|--------|-------------|-----------|
+ | shisa-v1-llama3-8b.lr-2e4    | 3.97      | 4.60     | 4.54        | 3.33   | 3.42        | 92.42%    |
+ | shisa-v1-llama3-8b.lr-5e5    | 5.73      | 6.28     | 6.45        | 5.37   | 4.81        | 90.93%    |
+ | shisa-v1-llama3-8b (2e5 avg) | 6.33      | 6.51     | 6.66        | 6.68   | 5.48        | 91.51%    |
+ | shisa-v1-llama3-8b.8e6       | 6.59      | 6.67     | 6.95        | 7.05   | 5.68        | 91.30%    |
+ | shisa-v1-llama3-8b.5e6       | 6.42      | 6.33     | 6.76        | 7.15   | 5.45        | 91.56%    |
+ | shisa-v1-llama3-8b.2e6       | 6.31      | 6.26     | 6.88        | 6.73   | 5.38        | 92.00%    |
+ * The 2e-4 and 5e-5 runs are clearly overtrained and perform significantly worse.
+ * 2e-5 is borderline: WeightWatcher shows the embedding layer as slightly overtrained for 2e-5, but not for the NEFTune version.
+ * 8e-6 performs the best, and 5e-6 also performed slightly better than 2e-5.
+
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
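
As a sanity check on the results table in this diff, the AVG Score column looks like the unweighted mean of the four benchmark columns (ELYZA100, JA MT-Bench, Rakuda, Tengu-Bench). That reading is inferred from the numbers and is not stated in the README. The sketch below recomputes the mean from the reported scores and ranks the runs; the names `SCORES`, `REPORTED_AVG`, `recompute_averages`, and `rank_models` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Scores copied from the table: (ELYZA100, JA MT-Bench, Rakuda, Tengu-Bench).
SCORES = {
    "shisa-v1-llama3-8b.lr-2e4": (4.60, 4.54, 3.33, 3.42),
    "shisa-v1-llama3-8b.lr-5e5": (6.28, 6.45, 5.37, 4.81),
    "shisa-v1-llama3-8b (2e5 avg)": (6.51, 6.66, 6.68, 5.48),
    "shisa-v1-llama3-8b.8e6": (6.67, 6.95, 7.05, 5.68),
    "shisa-v1-llama3-8b.5e6": (6.33, 6.76, 7.15, 5.45),
    "shisa-v1-llama3-8b.2e6": (6.26, 6.88, 6.73, 5.38),
}

# AVG Score column exactly as reported in the table.
REPORTED_AVG = {
    "shisa-v1-llama3-8b.lr-2e4": 3.97,
    "shisa-v1-llama3-8b.lr-5e5": 5.73,
    "shisa-v1-llama3-8b (2e5 avg)": 6.33,
    "shisa-v1-llama3-8b.8e6": 6.59,
    "shisa-v1-llama3-8b.5e6": 6.42,
    "shisa-v1-llama3-8b.2e6": 6.31,
}


def recompute_averages(scores):
    """Return the unweighted mean of the four benchmark scores per model."""
    return {name: mean(values) for name, values in scores.items()}


def rank_models(averages):
    """Return model names sorted from highest to lowest average score."""
    return sorted(averages, key=averages.get, reverse=True)


if __name__ == "__main__":
    averages = recompute_averages(SCORES)
    for name in rank_models(averages):
        print(f"{name:30s} recomputed={averages[name]:.4f} "
              f"reported={REPORTED_AVG[name]:.2f}")
```

Under that assumption, every recomputed mean agrees with the reported AVG Score to within rounding, and the ordering matches the notes in the diff: 8e-6 first, then 5e-6, then 2e-5.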