neph1 committed on
Commit
86963cf
·
verified ·
1 Parent(s): c8345db

Update README.md

Files changed (1)
  1. README.md +50 -1
README.md CHANGED
@@ -7,10 +7,59 @@ tags:
7
  - mistral
8
  - trl
9
  license: apache-2.0
 
10
  language:
11
- - en
12
  ---
13
 
14
  # Uploaded model
15
 
16
  - **Developed by:** neph1
 
7
  - mistral
8
  - trl
9
  license: apache-2.0
10
+ datasets:
11
+ - neph1/bellman-7b-finetune
12
+ - neph1/bellman-multiturn
13
  language:
14
+ - sv
15
  ---
16
 
17
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/653cd3049107029eb004f968/pLcriXAfp3Y9Z0RGwwVUB.png)
18
+
19
+ The model is fine-tuned for prompt-based question answering on a dataset built from Swedish Wikipedia, with many Sweden-centric questions. New in this version are a multi-turn dataset of about 250 conversations and a number of stories.
20
+
21
+ The name comes from the Swedish bard and poet Carl Michael Bellman, who lived in the 1700s.
22
+ As with any bard, what this model says should be taken with a grain of salt, even though it has the best of intentions.
23
+
24
+ [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/T6T3S8VXY)
25
+
26
+ Configuration:
27
+
28
+ Rank: 256
29
+
30
+ Alpha: 512
31
+
32
+ Learning rate (at start): 2e-5
33
+
34
+ Context length: 4096
35
+
36
+ Training length: approx. 2 epochs
37
+
38
+ Important: use the correct prompt format for best results: ```[INST]Hur bakar jag en sockerkaka?[/INST]``` (Swedish for "How do I bake a sponge cake?")
39
+
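As a minimal sketch of the prompt format noted above, the helper below wraps a single question in `[INST]`...`[/INST]` tags. The function name `format_prompt` is hypothetical and not part of the README; only the single-turn shape shown in the example is covered, since the README does not spell out a multi-turn layout.

```python
def format_prompt(question: str) -> str:
    """Wrap one user question in the [INST]...[/INST] tags from the README example.

    Hypothetical helper: the README only shows the single-turn string itself.
    """
    # Strip surrounding whitespace so the tags sit directly against the text,
    # matching the README example.
    return f"[INST]{question.strip()}[/INST]"


if __name__ == "__main__":
    print(format_prompt("Hur bakar jag en sockerkaka?"))
```

The resulting string can be passed to whichever tokenizer or inference tool is used to load the model.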
40
+ TrainingArguments(
41
+ per_device_train_batch_size = 6,
42
+ gradient_accumulation_steps = 20,
43
+ num_train_epochs=4,
44
+ warmup_steps = 10,
45
+ learning_rate = 2e-5,
46
+ bf16 = True,
47
+ logging_steps = 5,
48
+ optim = "adamw_8bit",
49
+ weight_decay = 0.01,
50
+ lr_scheduler_type = "linear",
51
+ seed = 3407,
52
+ per_device_eval_batch_size = 6,
53
+ eval_strategy="steps",
54
+ eval_accumulation_steps = 20,
55
+ eval_steps = 5,
56
+ eval_delay = 0,
57
+ save_strategy="steps",
58
+ save_steps=5,
59
+ report_to="none",
60
+ output_dir="",
61
+ )
62
+
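To make the listed settings easier to interpret, here is a small sketch of two derived quantities: the effective training batch size and the LoRA scaling factor. It assumes a single training device (the README does not state the device count) and the standard LoRA scaling of `alpha / rank` (the README does not say whether a variant such as rank-stabilized scaling was used). The function names are hypothetical.

```python
# Values copied from the configuration and TrainingArguments listed above.
PER_DEVICE_TRAIN_BATCH_SIZE = 6
GRADIENT_ACCUMULATION_STEPS = 20
LORA_RANK = 256
LORA_ALPHA = 512


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step.

    num_devices=1 is an assumption; the README does not state the device count.
    """
    return per_device * accumulation * num_devices


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor alpha / rank (assumed, not confirmed by the README)."""
    return alpha / rank


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(
        PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
```

Under these assumptions each optimizer step sees 120 sequences and the adapter update is scaled by a factor of 2.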
63
  # Uploaded model
64
 
65
  - **Developed by:** neph1