Commit 6cb000c
1 Parent(s): 3b3d192
LoneStriker committed: Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,175 @@
+ ---
+ datasets:
+ - Open-Orca/SlimOrca-Dedup
+ - teknium/openhermes
+ - meta-math/MetaMathQA
+ - migtissera/Synthia-v1.3
+ - THUDM/AgentInstruct
+ - LeoLM/German_Songs
+ - LeoLM/German_Poems
+ - LeoLM/OpenSchnabeltier
+ - bjoernp/ultrachat_de
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ license: llama2
+ model_creator: DiscoResearch
+ model_type: llama
+ tags:
+ - goliath
+ - deutsch
+ - llama2
+ - discoresearch
+ ---
+
+
+ <img src="imgs/disco_goliath.jpeg" width="600">
+
+ # DiscoLM 120b (Alpha)
+
+ **DiscoLM 120b (Alpha)** is an experimental 120b model based on [Alpindale's Goliath 120b](https://huggingface.co/alpindale/goliath-120b), a merge of different Llama2-70b models, further finetuned on a dataset of some of the most popular open-source instruction sets.
+ Disco 120b is a [DiscoResearch](https://huggingface.co/DiscoResearch) project and was trained by [Björn Plüster](https://huggingface.co/bjoernp).
+
+ Many thanks to [LAION](https://laion.ai) and [HessianAI](https://hessian.ai/) for scientific supervision, coordination, and the compute resources provided for this project on HessianAI's supercomputer 42!
+
+ <img src="https://hessian.ai/wp-content/themes/hessianai/img/hessian-ai-logo.svg" width="120">
+ <img src="https://avatars.githubusercontent.com/u/92627801?s=200&v=4" width="120">
+
+ ## Table of Contents
+
+ 1. [Download](#download)
+ 2. [Benchmarks](#benchmarks)
+ 3. [Prompt Format](#prompt-format)
+ 4. [Dataset](#dataset)
+ 5. [Acknowledgements](#acknowledgements)
+ 6. [Contact](#contact)
+ 7. [About DiscoResearch](#about-discoresearch)
+ 8. [Disclaimer](#disclaimer)
+
+ ## Download
+
+ | Huggingface | GPTQ | GGUF | AWQ | *Base Model* |
+ |-------|-------|-------|-------|-------|
+ | [Link](https://huggingface.co/DiscoResearch/DiscoLM-120b) | [Link](https://huggingface.co/TheBloke/DiscoLM-120b-GPTQ) | [Link](https://huggingface.co/TheBloke/DiscoLM-120b-GGUF) | [Link](https://huggingface.co/TheBloke/DiscoLM-120b-AWQ) | [Goliath 120b](https://huggingface.co/alpindale/goliath-120b) |
+
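+ To fetch the full-precision weights from the table above, a minimal sketch with `huggingface_hub` (the `local_dir` path is just an example):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads all shards of the unquantized model (roughly 240 GB in float16).
+ snapshot_download(repo_id="DiscoResearch/DiscoLM-120b", local_dir="DiscoLM-120b")
+ ```
+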
+ ## Benchmarks
+
+ ### Hugging Face Leaderboard
+
+ This model is still an early alpha, and we can't guarantee that there isn't any contamination.
+ However, its average of **73.198** would earn the #2 spot on the HF leaderboard at the time of writing, and would be the highest score yet for a model larger than 70b.
+
+ | Metric | Value |
+ |-----------------------|-------|
+ | ARC (25-shot) | 69.54 |
+ | HellaSwag (10-shot) | 86.49 |
+ | MMLU (5-shot) | 70.32 |
+ | TruthfulQA (0-shot) | 61.42 |
+ | Winogrande (5-shot) | 83.03 |
+ | GSM8k (5-shot) | 68.39 |
+ | **Avg.** | **73.198** |
+
+ We use the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the Hugging Face LLM Leaderboard.
+
75
+ ### FastEval
76
+
77
+ | Metric | Value |
78
+ |-----------------------|-------|
79
+ | GSM8K | 81.2 |
80
+ | Math | 22.3 |
81
+ | BBH | 72.9 |
82
+ | MMLU | 67.9 |
83
+ | **Avg.** | **53.3** |
84
+
85
+ This places DiscoLM 120b firmly ahead of gpt-3.5-turbo-0613 as seen on the screenshot of the current (sadly no longer maintained) FastEval CoT leaderboard:
86
+ ![FastEval Leaderboard](imgs/cot_leaderboard.png)
87
+
+ ### MTBench
+
+ ```json
+ {
+     "first_turn": 8.45,
+     "second_turn": 7.45,
+     "categories": {
+         "writing": 9.4,
+         "roleplay": 8.65,
+         "reasoning": 6.85,
+         "math": 5.55,
+         "coding": 4.95,
+         "extraction": 9.15,
+         "stem": 9.225,
+         "humanities": 9.825
+     },
+     "average": 7.95
+ }
+ ```
+ Screenshot of the current FastEval MT Bench leaderboard:
+ ![FastEval Leaderboard](imgs/mtbench_leaderboard.png)
+
+ ## Prompt Format
+
+ This model follows the ChatML format:
+
+ ```
+ <|im_start|>system
+ You are DiscoLM, a helpful assistant.
+ <|im_end|>
+ <|im_start|>user
+ Please tell me possible reasons to call a research collective "Disco Research"<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ This formatting is also available via a pre-defined Transformers chat template, which means that lists of messages can be formatted for you with the `apply_chat_template()` method:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/DiscoLM-120b")
+
+ chat = [
+   {"role": "system", "content": "You are DiscoLM, a helpful assistant."},
+   {"role": "user", "content": "Please tell me possible reasons to call a research collective Disco Research"}
+ ]
+ # Returns the conversation rendered as a single ChatML-formatted string
+ tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+ ```
+
+ If you use `tokenize=True` and `return_tensors="pt"` instead, you will get a tokenized and formatted conversation ready to pass to `model.generate()`; see the sketch below.
+
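+ A minimal end-to-end generation sketch (the sampling settings are illustrative, and loading assumes enough GPU memory to shard a 120b model across your devices via `device_map="auto"`):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "DiscoResearch/DiscoLM-120b"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.float16, device_map="auto"
+ )
+
+ chat = [
+     {"role": "system", "content": "You are DiscoLM, a helpful assistant."},
+     {"role": "user", "content": "Please tell me possible reasons to call a research collective Disco Research"},
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Sampling settings here are illustrative, not tuned recommendations.
+ output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+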
+ ## Dataset
+
+ The dataset curation for DiscoLM 120b followed a "brute force"/"PoC" approach, as one goal was to see whether a 120b model can "absorb" more instruction data than a 70b model.
+
+ The following datasets were used for training DiscoLM 120b:
+
+ * [SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup)
+ * [OpenSchnabeltier](https://huggingface.co/datasets/LeoLM/OpenSchnabeltier), translated to German from [OpenPlatypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)
+ * [OpenHermes](https://huggingface.co/datasets/teknium/openhermes)
+ * [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
+ * [UltraChat DE](https://huggingface.co/datasets/bjoernp/ultrachat_de), translated to German from [UltraChat](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+ * [Synthia v.1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3)
+ * [German_Songs](https://huggingface.co/datasets/LeoLM/German_Songs)
+ * [German_Poems](https://huggingface.co/datasets/LeoLM/German_Poems)
+ * Capybara Dataset by [Nous Research](https://huggingface.co/NousResearch/)
+ * Vezora/Tested-188k-Python (no longer available? Version changed to [Vezora/Tested-22k-Python-Alpaca](https://huggingface.co/datasets/Vezora/Tested-22k-Python-Alpaca))
+
+ Many thanks to all dataset providers/curators!
+
+ ## Contact
+
+ The best way to reach us is on our [Discord](https://discord.gg/S8W8B5nz3v).
+
+ ## About DiscoResearch
+
+ DiscoResearch is an aspiring open research community. Disco should be a place where researchers from many communities can come together to combine their expertise and create innovative and groundbreaking LLMs. Come join our Discord, share your opinions and ideas, and advance open LLM research with us!
+
+ ## Acknowledgements
+
+ Disco 120b is a [DiscoResearch](https://huggingface.co/DiscoResearch) project and was trained by [Björn Plüster](https://huggingface.co/bjoernp). [Jan Harries](https://huggingface.co/jphme) helped with technical advice, logistics and the model card, and [AutoMeta](https://huggingface.co/Alignment-Lab-AI) also provided helpful technical advice.
+ The model was trained with compute provided by [HessianAI](https://hessian.ai/) in collaboration with [LAION](https://laion.ai); many thanks in particular to [Patrick Schramowski](https://huggingface.co/PSaiml) for his support.
+
+ We are standing on the shoulders of giants; many thanks, in no particular order, to [LAION](https://laion.ai) and especially to [Christoph Schuhmann](https://laion.ai) who got us all connected,
+ [alpindale](https://huggingface.co/alpindale) for Goliath 120b (with important contributions by [Charles Goddard](https://huggingface.co/chargoddard) and [Undi95](https://huggingface.co/Undi95)), [TheBloke](https://huggingface.co/TheBloke) for providing quantized versions, [winglian](https://huggingface.co/winglian) for Axolotl, which was used to train the model, and for the SlimOrca dataset, and [garage-bAInd](https://huggingface.co/garage-bAInd), [Teknium](https://huggingface.co/teknium), [Migel Tissera](https://huggingface.co/migtissera), and [MetaMath](https://huggingface.co/meta-math) for their great datasets (please contact us if we forgot to mention you here!).
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+
+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model.
+ This model should only be used for research purposes. The original Llama2 license and all restrictions of datasets used to train this model apply.
added_tokens.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "</s>": 2,
+   "<s>": 1,
+   "<unk>": 0,
+   "<|im_end|>": 32000,
+   "<|im_start|>": 32001
+ }
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "alpindale/goliath-120b",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 137,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.34.0",
+   "use_cache": true,
+   "vocab_size": 32032
+ }
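
As a rough sanity check on the "120b" name, the architecture described by this config (137 layers, hidden size 8192, grouped-query attention with 8 KV heads) works out to roughly 118B parameters. A back-of-the-envelope sketch, ignoring the small RMSNorm weights:

```python
# Values taken from config.json above
vocab, hidden, inter, layers = 32032, 8192, 28672, 137
heads, kv_heads = 64, 8
head_dim = hidden // heads            # 128
kv_dim = kv_heads * head_dim          # 1024 (grouped-query attention)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o plus k/v projections
mlp = 3 * hidden * inter                          # gate, up and down projections
embed = 2 * vocab * hidden                        # embeddings + untied lm_head

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.1f}B parameters")           # ~117.7B
```
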
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.34.0"
+ }
huggingface-metadata.txt ADDED
@@ -0,0 +1,53 @@
+ url: https://huggingface.co/DiscoResearch/DiscoLM-120b
+ branch: main
+ download date: 2023-12-09 08:59:58
+ sha256sum:
+ 5d7745d5d27b6aab7603eded5ac8d04beb6400a0e8ee9ca37d359bf5f63c7870 model-00001-of-00024.safetensors
+ bf15b9a67c0e27c51f719a77e30db43c6ba3e68be883a5af97d64191cc52c2e5 model-00002-of-00024.safetensors
+ 286cf28355ae3d7671fab2350519d704f99ca1b801a41c9b7076b4ea52a7bee3 model-00003-of-00024.safetensors
+ a57b5561b72d64dba85aeff007137e1aa06c6472e01633c7b5698787c98bb874 model-00004-of-00024.safetensors
+ 736016f98195f0e7f1065d262b933051f17dcd18cef386d98c75eaf8c53290b0 model-00005-of-00024.safetensors
+ 602dd5973bed10c3aa0bcde64ba7434ed8a560e031ec2882edbe18a8b1072576 model-00006-of-00024.safetensors
+ 62cdd7479739c3ff2211e1bd3bc0253e17b03af99933f7eef5bec1c323e6919b model-00007-of-00024.safetensors
+ 3facffec8a7494996e07b3ae494520d29a931c5643b2250a985c27f826fe0dd8 model-00008-of-00024.safetensors
+ f40347240a2571e4ad0ff6148132da65a4bb6b41d0d3496b8c52d9246c35a19d model-00009-of-00024.safetensors
+ 4312012b91255578969d385b556e54a91fe6388a99279b1267490ae5574900f2 model-00010-of-00024.safetensors
+ b62777dabfd6c706f316f252fc2d21ef652a611de22b0ff0ae64d64e79072c51 model-00011-of-00024.safetensors
+ 3666ada228d31ed1761453ae03f49baf548039e0617ffc64ceb9d1a62f8b94a8 model-00012-of-00024.safetensors
+ c71beee6dab377ecaa9786c8c49140203a7b6adfdcaee4899e70385cec6be22b model-00013-of-00024.safetensors
+ 27d983d3fad6739b87ed70b21a0125df9570a6bc4767453191414b43e92d3745 model-00014-of-00024.safetensors
+ ad22dc328097c648400ce97d3e59be57b9014be22fd9c85f7768faf8d86736e8 model-00015-of-00024.safetensors
+ 3fee1d394a49fe68ce653e951461fa77927cdb8410a1c899e2191f85c5e5fa98 model-00016-of-00024.safetensors
+ 53e0fc80e3034a3930a03c5ff192be7a5a5a0ccef8134294244a9e8d2e4d79cb model-00017-of-00024.safetensors
+ 988ecf1c9e8e2727c832ab47fe2e31350fae43d8852e8cf298f111d2de1de7da model-00018-of-00024.safetensors
+ c1b94c1abb1bbeab4f19e74dda158212fe2be879ec3473478de050c35bb449b5 model-00019-of-00024.safetensors
+ 9f0612ba92fb8f8f32d9ab8343ad32d0eac0fd75fb2cb46b51c9772a02c89969 model-00020-of-00024.safetensors
+ 4a15d7d8d0c0c72bf9a78f443d28d8040808a474d98525d0237258dc694ac6cd model-00021-of-00024.safetensors
+ 004d697bc844e652690a35561d9d8065ebcf2e16edb98c7f55895459e7c46720 model-00022-of-00024.safetensors
+ 6bf94f3a7d8d835a862f4e6e2c773cf58a02871a9ea6dda83d5aa2d17631fc3e model-00023-of-00024.safetensors
+ 64cc9b8beb71f2523aa67ccb15ea76fbac243814c6f684cd1161c86c52478cd8 model-00024-of-00024.safetensors
+ 703e6a21f5f80340c340e61e5cd8528e1ab376b36a7f1605d5c01c087886bc77 pytorch_model-00001-of-00024.bin
+ 0b9aad1976a13a8ea067990910aa053a931309f22125e97120d5f18016f6f5d1 pytorch_model-00002-of-00024.bin
+ 4843f6af6e7b11443f4ce8386e7e90183146785e6e5f486d765b508a52dedbbb pytorch_model-00003-of-00024.bin
+ bbed432e03f0a3e7bbbfcc06d80fb776247b30aa19214868c9462f3c9328b510 pytorch_model-00004-of-00024.bin
+ 1b2cf32d7b276b39815c1d9468fdc7d6e9a3e9dba07dcfa915f0418fc6e575c8 pytorch_model-00005-of-00024.bin
+ cc4b1a8780a0fc894c39398e3c161bd0b38f07d5fed483bf63c89abd1299c5c0 pytorch_model-00006-of-00024.bin
+ 110dfb6eb79136d2c16993a367b7ee0eafff4a320567a782947c55ba2b4f9606 pytorch_model-00007-of-00024.bin
+ 0888f80ce1d7404395956f21e03552aa2daf7e7cdea8e9eb69b59c16471ad424 pytorch_model-00008-of-00024.bin
+ 5128e084c9f91ca5d8ecfd42da6ea35682d01f8a97fc0072596bac770b2c0a1c pytorch_model-00009-of-00024.bin
+ f35a06fca3b79e2797b05539b0400ab8697b88f51f0b34fceb5536101319a0e8 pytorch_model-00010-of-00024.bin
+ 6aa66f9cd7ae407deac3fea37e18bdb81da6cd915df56cdb127b350341db0548 pytorch_model-00011-of-00024.bin
+ 9ab5aaf07f7bc0920dca5f27614f1d2248cf6d4cb33556b387cfbbbaf587da11 pytorch_model-00012-of-00024.bin
+ 97617b7f47ec43934b15f7f3fbd288bf2c8184bda232b902c3528405dedea818 pytorch_model-00013-of-00024.bin
+ 68f64eeb00dda4392039f2efcd6b555ee49ed4fd7b5f731163fb4579a3f1b850 pytorch_model-00014-of-00024.bin
+ b19bfb818160cb56dffb7434a4ed9c783123775ede9e276bfc88e8615f87042a pytorch_model-00015-of-00024.bin
+ c0df2ab8bc5f809636c1c53c3e9acd26d64601b145cc61f809f590f139158469 pytorch_model-00016-of-00024.bin
+ d18f0ae5507c17837ed0c20257548a26332185743ee9ab0a87a6ac75f8609dd3 pytorch_model-00017-of-00024.bin
+ 0f14867fefdefc4173822977fb585168e15be49deb0047db40bafa386e55d3c6 pytorch_model-00018-of-00024.bin
+ bd7b90be4c3613e22ed1f0a0505dba610020aa135f956a838817f89aaedb142b pytorch_model-00019-of-00024.bin
+ 7543e3e5972532ebfb2131f1248c8ab818cd4159863425fa67b2df48b4c98ec8 pytorch_model-00020-of-00024.bin
+ fa0e4d0af437d853965e64017f7fac8eb52f075d07f1ac04e3973f60026d58ed pytorch_model-00021-of-00024.bin
+ 83064cfa842547bb784d7d6178fd049311c4aff42feb0ada891c7585a0decda8 pytorch_model-00022-of-00024.bin
+ a967860faaed0db1e6edb41c4cb82ee25012e26f178aafc1a8145c3278d24762 pytorch_model-00023-of-00024.bin
+ 58d4ce94410f315c86f15f7439ed4d68e5341f1ff136f502edc974244a43ec53 pytorch_model-00024-of-00024.bin
+ 9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347 tokenizer.model
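
These checksums can be used to verify downloaded shards before loading them. A small sketch (the file path is just an example):

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Compare against the corresponding line in huggingface-metadata.txt above
print(sha256sum("model-00001-of-00024.safetensors"))
```
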
model.safetensors.index.json ADDED
The diff for this file is too large to render.
 
output-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c9d41bcef98c72a170c1ea23e36343bd02b7cdac225439c47bd9e1ebbd05ecd
+ size 8581026016
output-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c8252b83c4df3344181258554fdda664890fd2cc07431ab03cf272433971ec1
+ size 8588188344
output-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f464a8ee589aa2a8df0a593276e71dbc5f3a967b5438276c1212d418569de818
+ size 8555819944
output-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e99c529d224aece834029cdbf54c9ed15254cc815f907f7946c380001ee55cc5
+ size 8552624024
output-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79bb80b80c29de8a014b33a7e6cf0aa39157660da8a5fe5e587b973f3efb6d9e
+ size 8503964344
output-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3593c2de2bd4feab1812392c65cf71cc9fecfe4ecf11ff11664a171cc3ddbe9f
+ size 1952914560
pytorch_model.bin.index.json ADDED
The diff for this file is too large to render.
 
special_tokens_map.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "eos_token": "<|im_end|>",
+   "pad_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32000": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "32001": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "tokenizer_file": null,
+   "trust_remote_code": true,
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true,
+   "use_fast": false
+ }
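
Taken together with added_tokens.json and special_tokens_map.json above, this config registers the ChatML markers as dedicated tokens (ids 32000/32001) and sets `<|im_end|>` as the tokenizer's EOS token. A quick check (assumes the tokenizer files are available locally or on the Hub under this repo id):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/DiscoLM-120b")
print(tokenizer.eos_token)                               # <|im_end|>
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))     # 32000
print(tokenizer.convert_tokens_to_ids("<|im_start|>"))   # 32001
```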