Here's a "continued pre-trained" model using Finnish Wikipedia dataset. I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend.. I've seen Japanese models perform pretty well with that kind of continued pre-training, yet Finnish models are still done from scratch which means they suck ass. If you compare them to Llama 3 or Gemma 2 they just suck so much. They can't even match Mistral 7B a model from last year. Just stop wasting money on training models from scratch, use these better models as base and train it on all your closed-source data I don't have access to. Thank you.

Merged model: mpasila/Llama-3.2-Finnish-Wikipedia-1B
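For reference, a merged checkpoint like the one above is typically produced by loading the LoRA adapter on top of the base model and folding the low-rank updates into the weights. Here's a minimal sketch using PEFT, assuming the repo names from this card; the dtype and output path are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (stated on this card) and attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B")

# merge_and_unload() folds the adapter into the base weights, leaving a
# plain Llama checkpoint that any frontend can load without PEFT support.
merged = model.merge_and_unload()

merged.save_pretrained("Llama-3.2-Finnish-Wikipedia-1B")
AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B").save_pretrained(
    "Llama-3.2-Finnish-Wikipedia-1B"
)
```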

Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090 for about 12.5 hours.
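For anyone wanting to reproduce this kind of run, here's a rough sketch of the setup with Unsloth and TRL (both named below on this card). The rank, alpha, epoch count, and base model come from this card; the dataset repo, target modules, sequence length, batch size, and learning rate are assumptions, not the actual training config:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Regular LoRA, so the base model is loaded unquantized.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,   # assumed; not stated on the card
    load_in_4bit=False,    # regular LoRA, not QLoRA
)

# Attach LoRA adapters with the rank/alpha stated above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # typical choice, assumed
)

# Finnish Wikipedia as raw text for continued pre-training; the exact
# dataset repo used for this model isn't named on the card, so this is a stand-in.
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=8,   # assumed
        num_train_epochs=1,              # 1 epoch, as stated above
        learning_rate=2e-4,              # assumed
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```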

Uploaded model: Llama-3.2-Finnish-Wikipedia-LoRA-1B

  • Developed by: mpasila
  • License: Llama 3.2 Community License Agreement
  • Finetuned from model: unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
