---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
language:
- ku
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
datasets:
- nazimali/kurdish-wikipedia-articles
library_name: transformers
---

Continued pre-training of `mistralai/Mistral-Nemo-Instruct-2407` on the Kurdish Wikipedia dataset with `unsloth`.

This model should be fine-tuned further, since the pre-training was aimed at improving Kurdish language understanding. It is quantized with `bitsandbytes` so that it uses less memory; see the [bitsandbytes documentation](https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes#bitsandbytes).

There isn't a standard, or even a good, Kurdish metric to evaluate the model (that I could find). My next project will be to create an evaluation so that there's a reproducible baseline for Kurdish. I'll also look into a multi-GPU training setup so I don't have to wait all day for results, and I'd like to train on both Kurmanji and Sorani.

### Use

Should be fine-tuned further for a specific task. See the instruction fine-tuned model [nazimali/Mistral-Nemo-Kurdish-Instruct](https://huggingface.co/nazimali/Mistral-Nemo-Kurdish-Instruct).

### Training

- Transformers `4.44.2`
- 1x NVIDIA A100 80GB PCIe
- Duration: 6h 31m 4s

```json
{
    "total_flos": 4121524790259794000,
    "train/epoch": 1,
    "train/global_step": 1960,
    "train/grad_norm": 3.1958093643188477,
    "train/learning_rate": 0,
    "train/loss": 1.2108,
    "train_loss": 1.256846008738693,
    "train_runtime": 23227.1752,
    "train_samples_per_second": 2.7,
    "train_steps_per_second": 0.084
}
```

#### Pre-training data:

- `nazimali/kurdish-wikipedia-articles`
- Dataset number of rows: 63,076
- Filtered columns: `title`, `text`
- Rows must have at least 1 character
- Number of rows used for training: 62,720

#### Training prompt format:

```python
training_prompt = """Gotara Wikipedia

### Sernav:
{}

### Gotar:
{}"""
```
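For reference, here is a minimal sketch of how the wiki rows could be filtered and mapped into the prompt format above with the `datasets` library. The repo id and the `title`/`text` columns come from this card; the filtering logic and variable names are illustrative assumptions, not the exact training script.

```python
from datasets import load_dataset

# The prompt template from the section above ("Gotara Wikipedia" = "Wikipedia article",
# "Sernav" = "Title", "Gotar" = "Article").
training_prompt = """Gotara Wikipedia

### Sernav:
{}

### Gotar:
{}"""

# Kurdish Wikipedia articles used for continued pre-training.
dataset = load_dataset("nazimali/kurdish-wikipedia-articles", split="train")

# Assumption: keep rows where both filtered columns have at least 1 character.
dataset = dataset.filter(
    lambda row: len(row["title"] or "") > 0 and len(row["text"] or "") > 0
)

# Render each row into the training prompt; this overwrites the "text" column.
dataset = dataset.map(
    lambda row: {"text": training_prompt.format(row["title"], row["text"])}
)

print(dataset[0]["text"][:200])
```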
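A rough sketch of the continued pre-training setup with `unsloth` follows. The card doesn't state the hyperparameters or whether LoRA adapters were used, so everything except the base model, the single epoch, and 4-bit loading is an assumption, and the `trl`/`SFTTrainer` arguments may differ between library versions.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 2048  # assumption

# Load the base instruct model in 4-bit via bitsandbytes.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-Nemo-Instruct-2407",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Assumption: the usual unsloth LoRA flow; rank and target modules are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # the prompt-formatted dataset from the sketch above
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        # 8 * 4 = 32 samples per optimizer step, consistent with 62,720 rows
        # over 1,960 steps; the exact batch/accumulation split is an assumption.
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        num_train_epochs=1,           # matches "train/epoch": 1 above
        learning_rate=2e-5,           # assumption
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```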
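To fine-tune or run this checkpoint with less memory, it can be loaded with 4-bit `bitsandbytes` quantization through `transformers`, as linked above. A minimal sketch, assuming this card's repo id is `nazimali/Mistral-Nemo-Kurdish` and common NF4 settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nazimali/Mistral-Nemo-Kurdish"  # assumed repo id for this card

# Common 4-bit settings; adjust to your hardware and fine-tuning setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```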