Update README.md
README.md
library_name: transformers
tags:
- mergekit
- merge
pipeline_tag: text-generation
---
# NemoDori-v0.5-12B-MN-BT

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The first child of [NemoDori-v0.2-12B-MN-BT](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT).
**The purpose** is to find a way to improve v0.2's ability to stay **aware of past conversations** and **follow instructions better**, especially the most recent one (depth-0),
while keeping its **creativity and capability to (E)RP**.
This model is one of the few children trying to fulfill that.

In my very short testing so far, I haven't found anything that differs from the parent enough to be worth mentioning. But I think this version is **slightly degraded** somehow
(I can't quite explain it; it just felt that way). Anyway, try it if you like, but I think **v0.2** is **better** than this one.
The other child (**v0.6**) will come soon.
I have tested it more than this model, and it seems to improve the instruction-following part, but its response format is somewhat inconsistent.
So yeah, just a sneak peek of it... maybe.
You may give me feedback on how I can fulfill my-*ahem* its purpose while keeping it smaller than 70B.
<br>
Fine-tuning is... pretty expensive for me, and I'm not ready for that (yet, though I'm interested).
<p style="font-size: 11px; margin-top: 11px" id="heya-im-a-bit-of-a-programmer">
(listen, between you and me, i still don't get it. still learning this new hobby of mine, and it's kind of refreshing in a way.
i'll be exploring more architectures in the future. yet, this is about how randomly i pick my straw, just to see how lucky i am.)
<br>
(although, i am interested in learning how to make a new merge method,
similar to when i'm making a solution for a specific problem, just like the good ol' days.
<span style="color: darkred">but hell, this llm stuff is really expensive.</span>)
</p>

## Merge Details

### Merge Method

This model was merged using the `breadcrumbs_ties` merge method, with [RozGrov/NemoDori-v0.2-12B-MN-BT](https://huggingface.co/RozGrov/NemoDori-v0.2-12B-MN-BT) as the base.
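For reference, a `breadcrumbs_ties` merge is driven by a mergekit YAML config. The sketch below is hypothetical: the second model name and the `weight`/`density` values are placeholders I made up for illustration; only the base model, the merge method, and the `gamma`/`dtype` values come from this card.

```yaml
# Hypothetical sketch of a breadcrumbs_ties mergekit config.
# "some-org/another-12b-model", weight, and density are placeholders;
# only base_model, merge_method, gamma, and dtype are stated in this card.
models:
  - model: some-org/another-12b-model
    parameters:
      weight: 1.0
      density: 0.9
merge_method: breadcrumbs_ties
base_model: RozGrov/NemoDori-v0.2-12B-MN-BT
parameters:
  gamma: 0.015
dtype: bfloat16
```

A config like this would be run with mergekit's CLI, e.g. `mergekit-yaml config.yml ./output-model-directory`.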

### Models Merged

```yaml
parameters:
  gamma: 0.015
dtype: bfloat16
```