radlab/pLLama3.2-3B-DPO

Intro

We have released the radlab/pLLama3.2 collection of models fine-tuned for Polish. The fine-tuned models communicate with the user in Polish more precisely than the base meta-llama/Llama-3.2 models. The collection covers the 1B and 3B architectures, each available in two configurations (a loading sketch follows the list):

  • radlab/pLLama3.2-1B, a 1B model after fine-tuning only
  • radlab/pLLama3.2-1B-DPO, a 1B model after fine-tuning and DPO
  • radlab/pLLama3.2-3B, a 3B model after fine-tuning only
  • radlab/pLLama3.2-3B-DPO, a 3B model after fine-tuning and DPO
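
Below is a minimal loading sketch (not an official snippet from this card) using the Hugging Face transformers library; swap in any of the repo ids listed above, and note that the dtype and device settings are illustrative defaults rather than requirements.

```python
# Minimal sketch: load one of the released checkpoints with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3.2-3B-DPO"  # any repo id from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are BF16
    device_map="auto",           # requires the accelerate package
)
```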

Dataset

In addition to the instruction datasets publicly available for Polish, we built our own dataset of about 650,000 instructions, generated semi-automatically from other publicly available datasets. We also built a training dataset for the DPO stage: 100k examples that teach the model to prefer correctly written texts over versions of the same texts containing language errors.

Training

The training process was divided into two stages:

  • Fine-tuning on the 650k Polish instruction set; this stage ran for 5 epochs.
  • After the fine-tuning stage, we further trained the model with DPO on the 100k correct-writing examples; this stage ran for 15k steps (a rough sketch follows the list).
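
As a hedged illustration of the second stage only, the sketch below uses the TRL library; it is not the authors' training code, the dataset path, beta, and output directory are placeholders, and only the 15k-step budget comes from the description above.

```python
# Rough sketch of a DPO stage with TRL; NOT the authors' actual code.
# Only max_steps=15_000 comes from the card; everything else is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "radlab/pLLama3.2-3B"  # the fine-tuned (pre-DPO) checkpoint

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference pairs: each row holds "prompt", "chosen" (correctly written
# Polish) and "rejected" (the same text with language errors).
dpo_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="pllama3.2-3b-dpo", max_steps=15_000, beta=0.1),
    train_dataset=dpo_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```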

Proposed generation parameters (see the sketch after this list):

  • temperature: 0.6
  • repetition_penalty: 1.0
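
A self-contained generation sketch with these values follows; the Polish prompt and generation length are illustrative, and only temperature and repetition_penalty come from the card.

```python
# Generation sketch using the suggested parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3.2-3B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative Polish prompt: "Write a short poem about autumn."
messages = [{"role": "user", "content": "Napisz krótki wiersz o jesieni."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,         # suggested value
    repetition_penalty=1.0,  # suggested value
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```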

Outro

Enjoy!
