---
license: llama3.1
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- mergekit
---

This model is "Built with Llama".

It is based on [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and 
was created with the help of [mergekit](https://github.com/arcee-ai/mergekit).
This is the mergekit configuration we used: [mergekit_moe_config.yml](https://huggingface.co/deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw/blob/main/mergekit_moe_config.yml)
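
The linked file contains the exact settings; purely for orientation, a mergekit MoE configuration of this general shape looks roughly like the sketch below. The expert entries are placeholders, and `gate_mode: random` is shown only because, as noted below, the routers of this model are randomly initialized; consult the linked config for the real values.

```yaml
# Sketch only; not the exact configuration used for this model (see the link above).
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
gate_mode: random        # leaves the router weights randomly initialized
dtype: bfloat16
experts:
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct   # placeholder expert
  - source_model: your-org/llama-3.1-8b-expert-2          # placeholder expert
  # ... one entry per expert, eight in total for an 8x8B model
```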

Note that this is the raw model directly after merging: its router networks are still randomly initialized, so it will not perform better than any single one of its expert models. It requires further training before it can be used.

This model has a total of 47.5B parameters, slightly more than [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
with its 46.7B parameters.
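
As a rough sanity check of that number (assuming the published Llama 3.1 8B shape: 32 layers, hidden size 4096, MLP intermediate size 14336, a 128256-token vocabulary with untied embeddings), only the per-layer MLP weights are replicated once per expert in a Mixtral-style MoE, while embeddings, attention and norms stay shared:

$$
47.5\,\mathrm{B} \;\approx\; \underbrace{2.4\,\mathrm{B}}_{\text{shared: embeddings, attention, norms}} \;+\; 8 \times \underbrace{5.64\,\mathrm{B}}_{\text{MLP weights per expert}} \;+\; \underbrace{1\,\mathrm{M}}_{\text{router gates}}
$$

If these figures are right, the small difference to Mixtral 8x7B comes mainly from Llama's much larger vocabulary and therefore larger embedding and output matrices.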

## Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 [Philip May](https://philipmay.org), [Deutsche Telekom AG](https://www.telekom.de/)\
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.