---
library_name: nanotron
---

# ⚙️ Nano-Mistral

Modeling code for Mistral, for use with [Nanotron](https://github.com/huggingface/nanotron/).

## 🚀 Quickstart

```bash
# Generate a config file
python config_tiny_mistral.py

# Run training
export CUDA_DEVICE_MAX_CONNECTIONS=1 # important for some distributed operations
torchrun --nproc_per_node=8 run_train.py --config-file config_tiny_mistral.yaml
```
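If you launch runs from a Python script (for example, a small sweep driver), you can assemble the same `torchrun` invocation programmatically. The sketch below assumes nothing beyond the Quickstart command. `build_launch` is a hypothetical helper, not part of Nanotron. It returns the argument list together with a copy of the environment that includes `CUDA_DEVICE_MAX_CONNECTIONS=1`, so the setting reaches every process `torchrun` spawns.

```python
import os
from typing import Dict, List, Tuple


def build_launch(config_file: str, nproc_per_node: int = 8) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: build the Quickstart torchrun command and its environment."""
    # Copy the current environment so the caller's shell is left untouched
    env = dict(os.environ)
    # Same setting as the `export` line in the Quickstart
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
    argv = [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "run_train.py",
        "--config-file",
        config_file,
    ]
    return argv, env
```

To start training, pass the result to `subprocess.run(argv, env=env, check=True)`.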

## 🚀 Use your custom model

- Update the `MistralConfig` class in `config_tiny_mistral.py` to match your model's configuration
- Update the `MistralForTraining` class in `modeling_mistral.py` to match your model's architecture
- Pass both classes to the `DistributedTrainer` class in `run_train.py`:
  ```python
  # config_file: path to the YAML config passed via --config-file (e.g. config_tiny_mistral.yaml)
  trainer = DistributedTrainer(config_file, model_class=MistralForTraining, model_config_class=MistralConfig)
  ```
- Run training as in the Quickstart
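When adapting the configuration, it can help to validate shape constraints up front rather than discovering them mid-run. The following is a minimal sketch, not the actual `MistralConfig` from `config_tiny_mistral.py`. `MyMistralConfig` is a hypothetical class whose field names are borrowed from Hugging Face `transformers`' `MistralConfig`, and the tiny default values are purely illustrative. Because Mistral uses grouped-query attention, the number of key/value heads must evenly divide the number of attention heads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MyMistralConfig:
    """Hypothetical config; field names follow transformers' MistralConfig, values are illustrative."""
    vocab_size: int = 32000
    hidden_size: int = 16
    intermediate_size: int = 64
    num_hidden_layers: int = 2
    num_attention_heads: int = 4
    num_key_value_heads: int = 2
    max_position_embeddings: int = 256
    sliding_window: Optional[int] = None
    rms_norm_eps: float = 1e-6
    rope_theta: float = 10000.0

    def __post_init__(self) -> None:
        # Each attention head gets an equal slice of the hidden dimension
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        # Grouped-query attention: query heads are shared evenly across key/value heads
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads
```

With the defaults above, `MyMistralConfig().head_dim` is 4, while `MyMistralConfig(num_key_value_heads=3)` raises a `ValueError`.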