---
library_name: nanotron
---
# Nano-Mistral
Modeling code for Mistral to use with [Nanotron](https://github.com/huggingface/nanotron/).

Also contains converted pretrained weights for Mistral-7B-v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1
## Quickstart
```bash
# Generate a config file
python config_tiny_mistral.py
# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml  # set --nproc_per_node to your GPU count
```
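To train something other than the default tiny model, edit the model hyperparameters in `config_tiny_mistral.py` before regenerating the YAML. A minimal sketch, assuming `MistralConfig` is a plain dataclass with the usual Mistral fields (check the class in this repo for the exact names):

```python
# Hypothetical edit inside config_tiny_mistral.py; the field names below
# are assumptions based on standard Mistral configs, not verified here.
model_config = MistralConfig(
    hidden_size=256,               # shrink for a quick smoke test
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=8,
    num_key_value_heads=4,         # Mistral uses grouped-query attention
    vocab_size=32000,
    max_position_embeddings=2048,
)
# Rerunning `python config_tiny_mistral.py` then regenerates
# config_tiny_mistral.yaml with these values.
```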
## Run generation with pretrained Mistral-7B-v0.1
```bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
torchrun --nproc_per_node=1 run_generate.py --ckpt-path ./pretrained/Mistral-7B-v0.1
```
## Use your custom model
- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` in `run_train.py`:
```python
trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
```
- Run training as usual (a minimal sketch of this wiring is shown below)
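For example, a custom variant could extend the example config with an extra architecture field and reuse the same trainer wiring. This is only a sketch: it assumes `MistralConfig` is a dataclass and that both classes are importable as shown; `MyMistralConfig`, `MyMistralForTraining`, and `sliding_window_size` are hypothetical names, not part of this repo.

```python
from dataclasses import dataclass

from config_tiny_mistral import MistralConfig    # assumption: importable as-is
from modeling_mistral import MistralForTraining  # assumption: importable as-is

@dataclass
class MyMistralConfig(MistralConfig):
    # Hypothetical extra hyperparameter that your modified architecture reads.
    sliding_window_size: int = 4096

class MyMistralForTraining(MistralForTraining):
    # Override the attention blocks / forward pass here to consume the new field.
    pass

# run_train.py then passes the custom classes exactly as above:
# trainer = DistributedTrainer(
#     config_file,
#     model_class=MyMistralForTraining,
#     model_config_class=MyMistralConfig,
# )
```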