---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- moe
- moah
- mod
datasets:
- Locutusque/UltraTextbooks
---

# Model Card for MoM (Mixture of Mixture)

## Model Details

### Model Description

MoM: Mixture of Mixture

This model is a first attempt at combining the [Jamba](https://huggingface.co/ai21labs/Jamba-v0.1) architecture with mixture of attention heads (MoAH) and mixture of depths (MoD).

The Mamba and attention layers are kept in bf16 precision; all remaining layers use 1.58-bit precision.
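
The "1.58 bits" figure corresponds to ternary weights: each weight takes one of three values, and log2(3) ≈ 1.58. The sketch below illustrates absmean ternary quantization in the style of BitNet b1.58, written in plain Python for readability. It assumes this is the scheme meant here; the function names are hypothetical and this is not the code from the repository.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, 1} with a single absmean scale."""
    # The scale is the mean absolute weight (eps guards against division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, then clamp to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate weights from the ternary values and the scale."""
    return [q * scale for q in quantized]

weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.25]
q, s = ternary_quantize(weights)   # q == [1, 0, 1, -1, 0, 1]
approx = dequantize(q, s)          # every entry is -s, 0, or +s
bits_per_weight = math.log2(3)     # about 1.585 bits for a three-valued weight
```

Small weights collapse to 0 and large ones saturate at ±1, so a layer stores only the sign pattern plus one floating-point scale.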

About 107M of the 1025M total parameters (roughly 10%) are in bf16 precision.
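
To make the parameter split concrete, here is a rough back-of-the-envelope estimate of the weight footprint. It takes the two parameter counts from this card and assumes ideal packing at 1.58 bits per ternary weight; real storage formats (for example 2 bits per weight, or unpacked tensors) would be larger, so treat the result as a lower bound rather than a measured size.

```python
BF16_PARAMS = 107_000_000      # parameters kept in bf16 (from this card)
TOTAL_PARAMS = 1_025_000_000   # total parameters (from this card)
TERNARY_PARAMS = TOTAL_PARAMS - BF16_PARAMS

bf16_fraction = BF16_PARAMS / TOTAL_PARAMS   # about 0.104

# bf16 uses 16 bits (2 bytes) per weight.
bf16_bytes = BF16_PARAMS * 2
# Assumption: ternary weights are ideally packed at 1.58 bits each.
ternary_bytes = TERNARY_PARAMS * 1.58 / 8

total_mb = (bf16_bytes + ternary_bytes) / 1e6   # about 395 MB
all_bf16_mb = TOTAL_PARAMS * 2 / 1e6            # 2050 MB if everything were bf16

print(f"bf16 share: {bf16_fraction:.1%}")
print(f"mixed precision: {total_mb:.0f} MB vs all bf16: {all_bf16_mb:.0f} MB")
```

Under these assumptions the bf16 tenth of the model accounts for more than half of the stored weight bytes.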

The goal is to develop and test whether this kind of architecture can deliver fast inference without too much quality loss.

- **Model type:** Mixture of attention heads, mixture of depths, and mixture of experts, with 1.58-bit linear layers for the **MLP**
- **License:** Apache License 2.0
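
As a generic illustration of the mixture-of-depths idea named in the model type, the sketch below routes only the highest-scoring tokens through a block and lets the others skip it via the residual path. It uses scalar "tokens" and a hypothetical `mod_layer` helper for brevity; it is not the routing code from the repository, and the exact gating used there may differ.

```python
def mod_layer(tokens, router_scores, block, capacity):
    """Apply `block` only to the `capacity` highest-scoring tokens."""
    # Rank token positions by router score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:capacity])
    output = []
    for i, x in enumerate(tokens):
        if i in selected:
            # Selected tokens: residual plus the block output scaled by the router score.
            output.append(x + router_scores[i] * block(x))
        else:
            # Skipped tokens pass through unchanged, costing no block compute.
            output.append(x)
    return output

tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]
out = mod_layer(tokens, scores, block=lambda x: 10.0 * x, capacity=2)
# Positions 1 and 3 are processed by the block; positions 0 and 2 are left as they were.
```

With `capacity` set to half the sequence length, the block runs on half the tokens, which is where the inference savings come from.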

### Model Sources

- **Repository:** https://github.com/ostix360/optimized-LLM

## How to Get Started with the Model

To test this model, use the repository at this [commit](https://github.com/ostix360/optimized-LLM/tree/d266bc404346b71ea237c0744be0f8928f6b3217).
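
One way to fetch the repository at that exact commit is sketched below. The URL and commit hash are taken from the link above; the helper names and the destination directory are hypothetical, and the script assumes `git` is installed and the network is reachable.

```python
import subprocess

REPO_URL = "https://github.com/ostix360/optimized-LLM"
COMMIT = "d266bc404346b71ea237c0744be0f8928f6b3217"

def checkout_commands(dest="optimized-LLM"):
    """Return the git commands that clone the repo and check out the pinned commit."""
    return [
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", COMMIT],
    ]

def fetch_repo(dest="optimized-LLM"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in checkout_commands(dest):
        subprocess.run(cmd, check=True)
```

Calling `fetch_repo()` leaves a working tree in `optimized-LLM/` in detached-HEAD state at the pinned commit.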

## Training Details

- **wandb**: [training details](https://wandb.ai/ostix360/Mixture%20of%20mixture%20(mod,%20moah%20moe)/runs/wtoujazq)

### Training Data

We use the first 100k examples of Locutusque/UltraTextbooks to train this model.
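
A minimal sketch of loading such a subset with the `datasets` library follows. It assumes the data lives in a split named `train` and that "first 100k" means the first 100,000 rows in dataset order; the helper names are hypothetical, and the actual loading code is in the repository.

```python
def first_n_split(n, split="train"):
    """Build a split string selecting the first `n` rows, e.g. 'train[:100000]'."""
    return f"{split}[:{n}]"

def load_training_subset(n=100_000):
    """Load the first `n` examples; needs the `datasets` package and network access."""
    # Imported lazily so the helper above has no third-party dependency.
    from datasets import load_dataset
    return load_dataset("Locutusque/UltraTextbooks", split=first_n_split(n))
```

The slice syntax in the split string lets `load_dataset` return only the requested rows rather than the whole split.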

### Training Procedure

We use the 8-bit Adam optimizer with the default beta and epsilon values.
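
For reference, an 8-bit Adam optimizer with default betas and epsilon could be set up as sketched below. This assumes the `bitsandbytes` implementation (`bnb.optim.Adam8bit`), which this card does not name explicitly; the `build_optimizer` helper and its `lr` argument are hypothetical, and the learning rate actually used is recorded in the wandb run and `train.py`.

```python
# The usual Adam defaults, assumed to be what "default" refers to here.
ADAM_DEFAULTS = {"betas": (0.9, 0.999), "eps": 1e-8}

def build_optimizer(params, lr):
    """Create an 8-bit Adam optimizer over `params` (e.g. model.parameters())."""
    # Imported lazily: bitsandbytes needs a compatible GPU build to be installed.
    import bitsandbytes as bnb
    return bnb.optim.Adam8bit(params, lr=lr, **ADAM_DEFAULTS)
```

The 8-bit variant stores the optimizer state in quantized form, which reduces optimizer memory on a single consumer GPU.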

#### Preprocessing

The data is made to fit the model's maximum length, i.e. 512 tokens.

#### Training Hyperparameters

See the wandb metadata or `train.py` in the repository for the hyperparameters.

## Technical Specifications

### Compute Infrastructure

#### Hardware

- One RTX 4070 Ti GPU

#### Software

- PyTorch, transformers, etc.