johannhartmann committed on
Commit
ef5b2d3
1 Parent(s): 15240b6

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,18 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ llama3-discoleo-instruct-8b-v0.1.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ language:
+ - de
+ license: llama3
+ library_name: transformers
+ tags:
+ - gguf
+ ---
+ # Llama3-DiscoLeo-Instruct 8B (version 0.1)
+
+ ## Thanks and Accreditation
+
+ [DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
+ is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot),
+ with support from [DFKI](https://www.dfki.de/web/) (the German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
+ Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), and shared their compute allocation on hessian.Ai's 42 supercomputer.
+
+ ## Model Overview
+
+ Llama3_DiscoLeo_Instruct_8B_v0.1 is an instruction-tuned version of our [Llama3-German-8B](https://huggingface.co/DiscoResearch/Llama3_German_8B).
+ The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continued pretraining on 65 billion high-quality German tokens, similar to the previous [LeoLM](https://huggingface.co/LeoLM) and [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
+ We fine-tuned this checkpoint on the German instruction dataset from DiscoResearch, created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).
+
+
+ ## How to use
+ Llama3_DiscoLeo_Instruct_8B_v0.1 uses the [Llama-3 chat template](https://github.com/meta-llama/llama3?tab=readme-ov-file#instruction-tuned-models), which can be used easily with [transformers' chat templating](https://huggingface.co/docs/transformers/main/en/chat_templating).
+ See [below](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1#usage-example) for a usage example.
+
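+ For reference, this is roughly the prompt string the chat template expands to for a system and a user turn (illustrative only; the exact string, including whitespace, is produced by the tokenizer's template):
+
+ ```python
+ # Illustrative sketch of the Llama-3 instruct prompt format, as produced by
+ # tokenizer.apply_chat_template(..., add_generation_prompt=True).
+ formatted_prompt = (
+     "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
+     "Du bist ein hilfreicher Assistent.<|eot_id|>"
+     "<|start_header_id|>user<|end_header_id|>\n\n"
+     "Wer bist du?<|eot_id|>"
+     "<|start_header_id|>assistant<|end_header_id|>\n\n"
+ )
+ ```
+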
+ ## Model Training and Hyperparameters
+ The model was fully fine-tuned with axolotl on the [hessian.Ai 42](https://hessian.ai) supercomputer with a context length of 8192 tokens, a learning rate of 2e-5, and a batch size of 16.
+
+
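+ For orientation, the stated hyperparameters correspond roughly to the following `transformers` training arguments (a sketch only, not the authors' actual axolotl setup; per-device batch size and precision are assumptions):
+
+ ```python
+ # Sketch only: the authors trained with axolotl, not with this code.
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="llama3-discoleo-instruct-sft",  # hypothetical output directory
+     learning_rate=2e-5,                         # as stated in the card
+     per_device_train_batch_size=2,              # assumption: 8 GPUs -> global batch size 16
+     gradient_accumulation_steps=1,
+     bf16=True,                                  # assumption: precision not stated in the card
+ )
+ # The 8192-token context length applies to data packing / sequence preparation,
+ # not to TrainingArguments.
+ ```
+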
+ ## Evaluation and Results
+
+ We evaluated the model on a suite of common English benchmarks and their German counterparts using [GermanBench](https://github.com/bjoernpl/GermanBenchmark).
+
+ The image and table below show the benchmark scores of the different instruct models compared to Meta's instruct version. All checkpoints are available in this [collection](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729).
+
+ ![instruct scores](instruct_model_benchmarks.png)
+
+ | Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
+ |----------------------------------------------------|----------------|---------------|---------------|------------------|-------------|--------------|-------------|-------------|-------------|
+ | meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | **0.59642** | 0.47952 | **0.82025** | 0.60008 | **0.66658** | 0.53541 | 0.57656 |
+ | DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
+ | DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
+ | **DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1** | **0.53042** | 0.52867 | 0.59556 | **0.53839** | 0.80721 | 0.66440 | 0.61898 | 0.56053 | **0.60552** |
+ | DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 | 0.52749 | **0.53245** | 0.58788 | 0.53754 | 0.80770 | **0.66709** | 0.62123 | **0.56238** | 0.60547 |
+
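+ The `mean` column is the unweighted average of the eight benchmark scores, e.g. for this model:
+
+ ```python
+ # Unweighted average of the eight scores for Llama3-DiscoLeo-Instruct-8B-v0.1
+ scores = [0.53042, 0.52867, 0.59556, 0.53839, 0.80721, 0.66440, 0.61898, 0.56053]
+ print(round(sum(scores) / len(scores), 5))  # 0.60552
+ ```
+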
+ ## Model Configurations
+
+ We release DiscoLeo-8B in the following configurations:
+ 1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3_German_8B)
+ 2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
+ 3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1) (this model)
+ 4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1)
+ 5. [Experimental `DARE-TIES` merge with Llama3-Instruct](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_8B_DARE_Experimental)
+ 6. [Collection of quantized versions](https://huggingface.co/collections/DiscoResearch/discoleo-8b-quants-6651bcf8f72c9a37ce485d42) (see the GGUF usage sketch after this list)
+
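+ The GGUF quantizations uploaded in this repository can also be run without `transformers`. Below is a minimal sketch using `llama-cpp-python` (assuming the package is installed and one of the GGUF files has been downloaded locally; the file name and generation settings are illustrative):
+
+ ```python
+ # Minimal sketch, assuming `pip install llama-cpp-python` and a locally
+ # downloaded quant from this repository (file name is an example).
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="llama3-discoleo-instruct-8b-v0.1.Q4_K_M.gguf",
+     n_ctx=8192,  # matches the fine-tuning context length
+ )
+ out = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
+         {"role": "user", "content": "Erkläre kurz die Energiewende."},
+     ],
+     max_tokens=256,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```
+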
+ ## Usage Example
+ Here's how to use the model with transformers:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(
+     "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1")
+
+ prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
+ messages = [
+     {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
+     {"role": "user", "content": prompt}
+ ]
+ # Apply the Llama-3 chat template and append the assistant header
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=512
+ )
+ # Strip the prompt tokens from the output before decoding
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
+
+ ## Acknowledgements
+
+ The model was trained and evaluated by [Björn Plüster](https://huggingface.co/bjoernp) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)) with data preparation and project supervision by [Manuel Brack](http://manuel-brack.eu) ([DFKI](https://www.dfki.de/web/), [TU-Darmstadt](https://www.tu-darmstadt.de/)). Initial work on dataset collection and curation was performed by [Malte Ostendorff](https://ostendorff.org) and [Pedro Ortiz Suarez](https://portizs.eu). Instruction tuning was done with the DiscoLM German dataset created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)). We extend our gratitude to [LAION](https://laion.ai/) and friends, especially [Christoph Schuhmann](https://entwickler.de/experten/christoph-schuhmann) and [Jenia Jitsev](https://huggingface.co/JJitsev), for initiating this collaboration.
+
+ The model training was supported by a compute grant at the [42 supercomputer](https://hessian.ai/), which is a central component in the development of [hessian AI](https://hessian.ai/), the [AI Innovation Lab](https://hessian.ai/infrastructure/ai-innovationlab/) (funded by the [Hessian Ministry of Higher Education, Research and the Arts (HMWK)](https://wissenschaft.hessen.de) and the [Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)](https://innen.hessen.de)) and the [AI Service Centers](https://hessian.ai/infrastructure/ai-service-centre/) (funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)).
+ The curation of the training data is partially funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)
+ through the project [OpenGPT-X](https://opengpt-x.de/en/) (project no. 68GX21007D).
llama3-discoleo-instruct-8b-v0.1.Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ceb6593a599ddc0f1391858d4eb9ac1e9c016cfa4738c18f1d76962b668b40e1
+ size 3179131168
llama3-discoleo-instruct-8b-v0.1.Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a47a4ba3fafefe9e267bf6875d07e04f095a5ca424089af7fcf77e96a36db31a
+ size 4321956128
llama3-discoleo-instruct-8b-v0.1.Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94188a0e152fbc60c3b3c2e6479a1e7712e64010f7125d5cdec070144fc83f05
+ size 4018917664
llama3-discoleo-instruct-8b-v0.1.Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a634d4222f65ee1946efe41412989c94a05ca8a956b81ff1ce034e5bac9ddfd
+ size 3664498976
llama3-discoleo-instruct-8b-v0.1.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4688caf9f4c83af3727f8f003f2cf82cd6ee074e7d2f23215e63d2294d2aab83
+ size 4661211424
llama3-discoleo-instruct-8b-v0.1.Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2d9f4945b42ef243cc86d786ee4306c0765f174b16339629f830a73a2898742
+ size 5130252576
llama3-discoleo-instruct-8b-v0.1.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffad4ac51e8342a6f941d1ecc0790f8b3daaa89e1c88026b3c34359502fe506b
+ size 4920733984
llama3-discoleo-instruct-8b-v0.1.Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:666c7c52554d1334b7ebddbde817561f1012f42be0022ec114d8973be48b6116
+ size 4692668704
llama3-discoleo-instruct-8b-v0.1.Q5_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c48f0ee047230027f478cdf6cbe8e996d6c8f118cf7ed18f8351995eb75372ce
+ size 5599293728
llama3-discoleo-instruct-8b-v0.1.Q5_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:038d2b966b15e838a8890c6c33446424da0f92a837eef9dc6d1115bb6fdb8511
+ size 6068334880
llama3-discoleo-instruct-8b-v0.1.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60b1950625c94a01f86fea587bde6e658d50cd4cc39caa99affe8d2b4d8efb4c
+ size 5732987168
llama3-discoleo-instruct-8b-v0.1.Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:050a4a1cca4e2f001b94cc9fbec972d293751bdd3208821b0bf4fac59a85e954
+ size 5599293728
llama3-discoleo-instruct-8b-v0.1.Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d75f723d2d859289e3a33145ffc8bdde5b83dc6e9e68fb8c11f4b829e0f564e5
+ size 6596006176
llama3-discoleo-instruct-8b-v0.1.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9267882b217dd1590ed20f731c9411a35ba6fe58b74b85b08bc61cd0d50416b
+ size 8540770592
llama3-discoleo-instruct-8b-v0.1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afdaa004fc33a29074411888f259adc8e64f3ef4834f91f9d10a7d6c7b29e840
+ size 16068890880