---
library_name: transformers
tags:
- arcee-ai
---
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/654aa1d86167ff03f70e32f9/5qausvO9z7FhTl5wyJhRy.webp)
# Model Card for teeny-tiny-mixtral
This is a dummy model created for testing purposes only. It uses a custom configuration to explore various training scenarios and should not be used in production.
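For quick smoke tests, the model can be loaded through the standard transformers API. This is a minimal sketch: the repo id `arcee-ai/teeny-tiny-mixtral` is inferred from this card's title and tags and may need adjusting, and outputs from a test-only model are not expected to be meaningful.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from this card's title and tags; adjust if it differs.
model_id = "arcee-ai/teeny-tiny-mixtral"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A dummy model's generations are gibberish; this only verifies the
# load / tokenize / generate path runs end to end.
inputs = tokenizer("Hello, world!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```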
## Configuration Highlights
The configuration of this dummy model differs from the original Mixtral in several key aspects (a config sketch follows the list):
- **Number of Layers:** Reduced to 2, allowing quicker tests of layer-specific behaviors.
- **Experts:** 4 local experts with 2 experts routed per token, to exercise the mixture-of-experts routing path at a reduced scale.
- **Hidden Size:** Set to 512; this smaller width keeps tests fast while still probing the effects of network width.
- **Intermediate Size:** Set to 3579, enlarged relative to the hidden size, to investigate how a wider feed-forward block affects the model's capacity to process information.
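For reference, a comparable configuration can be built directly with `MixtralConfig`. This is a sketch rather than this repo's exact config: only the fields highlighted above are set, and everything else keeps the library defaults, so the `config.json` in this repo remains authoritative.

```python
from transformers import MixtralConfig, MixtralForCausalLM

# Only the fields highlighted above are set; all other fields keep the
# transformers defaults, so this repo's config.json remains authoritative.
config = MixtralConfig(
    num_hidden_layers=2,     # down from Mixtral's 32
    num_local_experts=4,     # down from 8
    num_experts_per_tok=2,
    hidden_size=512,         # down from 4096
    intermediate_size=3579,  # enlarged relative to the small hidden size
)

# Instantiating from a config yields randomly initialized, test-only weights.
model = MixtralForCausalLM(config)
print(f"{model.num_parameters():,} parameters")
```

At this scale the model should fit comfortably in memory and run forward and backward passes quickly on CPU, which is the point of a teeny-tiny test model.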