draganjovanovich committed on
Commit
9b63a5f
•
1 Parent(s): 1887ca1

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ prodigy-sm-base-v0.1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
37
+ prodigy-sm-base-v0.1-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
38
+ prodigy-sm-base-v0.1-Q8_K_M.gguf filter=lfs diff=lfs merge=lfs -text
39
+ prodigy-sm-base-v0.1.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,66 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - sr
6
+ - hr
7
+ - bs
8
+ ---
9
+ # Prodigy SM Base v0.1
10
+
11
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/617bbeec14572ebe9e6ea83f/4p2zaOWu6kTS3fcbevHef.png" width="70%" height="70%">
12
+
13
+ In our latest endeavour, we performed continued pre-training of a large language model (Mistral-7B-v0.1) so that it can understand and generate text in new languages, including **Serbian**, **Bosnian** and **Croatian**, using an innovative approach.
14
+
15
+ Rather than depending only on extensive datasets in the target language, our method uses a more compact set of synthetic and human-curated data, mixed with some CC web data, and is implemented in two strategic phases:
16
+
17
+ 1. Establishing a comprehensive demonstration of all grammatical and orthographic rules pertinent to the language.
18
+ 2. Supplying a diverse array of examples that not only reinforce these rules but also integrate a wide range of linguistic nuances.
19
+
20
+ While our approach is tailored to our own objectives, we drew some inspiration from recent advances in language model training. Specifically, the conceptual strategies discussed in the paper [ADAPTING LARGE LANGUAGE MODELS VIA READING COMPREHENSION](https://arxiv.org/pdf/2309.09530.pdf) provided valuable insights, though our methods diverge significantly in practice. With this approach, we aim to teach the model new languages efficiently, balancing accuracy and linguistic diversity.
21
+
22
+ So... Did it work?!
23
+
24
+ # **Yes!**
25
+ See the benchmark results, or even better, download the model and try it yourself. As you know by now, there's no better benchmark than a quick 'try it yourself' vibe check. :)
26
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/617bbeec14572ebe9e6ea83f/C9m_OjnYEpQo43VCrwz4A.png" width="100%" height="100%">
27
+
28
+ Here we show the results of a benchmark that is rarely run, yet equally important: how adapting the model to a new language affected its original English performance.
29
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/617bbeec14572ebe9e6ea83f/IPY0myfQI-Ne5x6b11glz.png" width="100%" height="100%">
30
+
31
+ *All evals are performed in a zero-shot manner.
32
+ *Also bear in mind that the llama-2-7b, llama-3-8b and mistral-7b models that Prodigy SM Base is compared against aren't trained on extensive Serbian-language datasets; these benchmarks demonstrate that primarily English models can be adapted to other languages.
33
+
34
+ So, as you can see, we successfully improved the original model's performance for Serbian-language use cases while retaining, or even slightly improving, its performance in English.
35
+
36
+ ### Training results
37
+ Training results of continued pre-training of [mistral-7b-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
38
+
39
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/617bbeec14572ebe9e6ea83f/5xeJ-vfWk4RhJNC7t5I0g.png" width="70%" height="70%">
40
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/617bbeec14572ebe9e6ea83f/R4R8ai8LaN3WlYCOenUyb.png" width="70%" height="70%">
41
+
42
+ As a last experimental step, we merged the resulting model with **Mistral-7B-v0.1** and two earlier checkpoints from **prodigy-sm-base** using the [Model Stock](https://arxiv.org/abs/2403.19522) method.
43
+
44
+ # Notes
45
+ As this is a base model, it has no chat template and no strict chat-following capabilities. It is best suited either to further pre-training on Serbian, where there is still a lot of room for improvement (you can hit the sweet spot), or to the next step in the pipeline, such as some form of chat or instruct tuning.
46
+
47
+ If you want a model that is already instruction-tuned, we did that too: check **Prodigy SM Instruct v0.1**.
48
+ # Prodigy SM Instruct v0.1
49
+ 🚀[prodigy-sm-instruct]() **COMING SOON**
50
+
51
+ And stay tuned for:
52
+ [prodigy-sm-base (llama-3)]() **COMING SOON**
53
+ [prodigy-sm-instruct (llama-3)]() **COMING SOON**
54
+
55
+ 📢 We are also excited to announce that [iskon.ai](https://Iskon.ai) will soon launch an API platform featuring the advanced **Prodigy** series of models, advanced AI tools and much more! 🚀
56
+
57
+
58
+ # Thanks
59
+ - [gordicaleksa/serbian-llm-eval](https://github.com/gordicaleksa/serbian-llm-eval) and his community for curating translations and adaptation of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
60
+ that we used to perform benchmarks.
61
+ - [jondurbin](https://huggingface.co/jondurbin) for the amazing airoboros framework
62
+ - [teknium](https://huggingface.co/teknium) for various insights shared on discord and twitter aka x.com
63
+ - [Eric](https://twitter.com/erhartford) for various insights shared on discord and twitter aka x.com
64
+ - [mergekit](https://github.com/arcee-ai/mergekit) for model merging tools
65
+
66
+ *Huge thanks to Redmond.ai for the generous DGX cloud credits* [redmond.ai](https://redmond.ai)
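The README above says the final step merged the model with Mistral-7B-v0.1 and two earlier checkpoints using Model Stock. As a rough illustration only, the per-layer interpolation described in the Model Stock paper can be sketched in plain Python on flattened weight lists. The README does not state which checkpoint acts as the pre-trained anchor, so treating `base` as that anchor is an assumption, and the names `cosine` and `model_stock_layer` are hypothetical. This is not the authors' merge script.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Note: raises ZeroDivisionError if either vector is all zeros.
    return dot / (norm_u * norm_v)


def model_stock_layer(base, finetuned):
    """Merge one flattened layer.

    base      -- weights of the assumed pre-trained anchor (w_0)
    finetuned -- list of k >= 2 weight vectors for the same layer
    Returns (merged_weights, t), where t is the interpolation ratio.
    """
    k = len(finetuned)
    # Offsets of each fine-tuned model from the anchor.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]
    # Average pairwise cosine between the offsets.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    cos_theta = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)
    # Interpolation ratio: t = k*cos / (1 + (k-1)*cos).
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    # Element-wise average of the fine-tuned weights.
    avg = [sum(col) / k for col in zip(*finetuned)]
    # Pull the average back toward the anchor by (1 - t).
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t
```

When the fine-tuned offsets point in nearly the same direction, `t` approaches 1 and the result is close to the plain average; when they are nearly orthogonal, `t` approaches 0 and the result stays close to the anchor.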
prodigy-sm-base-v0.1-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1986511afe78b7087fb7a185bc947926b5a0dee84019d1e9f8c513c07439346
3
+ size 4368439008
prodigy-sm-base-v0.1-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8340e398b03c37f5822c014d006ce0501e20b4d96fbefffe3a4a6aec46dce75e
3
+ size 5131409120
prodigy-sm-base-v0.1-Q8_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f986dae1498f242b7b569d5cf1327be826d1217a681e8f7dd2d9c64c005d614f
3
+ size 7695857376
prodigy-sm-base-v0.1.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eddb08d54eb92cc91cf075d0edf837b167dd8afa9d1d86c5f8456bd466f592c7
3
+ size 14484731584
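
Each `.gguf` entry above is a Git LFS pointer (a `version` line, a `sha256` oid and a byte `size`), not the model file itself. A minimal sketch, assuming you have already downloaded one of the files, of checking it against its pointer with the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any tool used in this commit.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the pointer's size and digest."""
    # Cheap size check first, so a truncated download fails fast.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so multi-gigabyte files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for the Q4_K_M file, copied from the diff above.
Q4_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b1986511afe78b7087fb7a185bc947926b5a0dee84019d1e9f8c513c07439346\n"
    "size 4368439008\n"
)
```

Calling `matches_pointer("prodigy-sm-base-v0.1-Q4_K_M.gguf", Q4_POINTER)` on a local copy would then confirm that the download is complete and uncorrupted.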