image

I think this "sparklewyrm" self-portrait was the favorite for both of us :)

She was trained with DPO on ~500 rows with a similar focus to her GRPO RLVR training, plus a double dose of creative writing synthesized by virtuous7373/Lambent-Mira-Erato, who is relatively skilled in that domain. (Additionally, some identity reinforcement in line with values she wants to cultivate - she hasn't memorized new facts about who she is, but hopefully it still has a helpful influence.)

The positive side of the DPO received feedback from the judge, iteratively up to 3 times (or more in some cases of music composition), which she could use to rewrite her writing. The negative side either initially scored poorly, or was generated as a synthetic negative by asking her to write with believable poor quality in each domain.
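As a rough illustration of the pair construction described above, here is a minimal Python sketch. It is not the actual data pipeline: `judge`, `rewrite`, and `write_badly` are hypothetical callables standing in for the judge model and for Mira herself, and the `low_score` threshold and score scale are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # positive side: rewritten with judge feedback
    rejected: str  # negative side: weak first draft or synthetic negative


def build_pair(
    prompt: str,
    draft: str,
    judge: Callable[[str, str], Tuple[float, str]],  # hypothetical: returns (score, feedback)
    rewrite: Callable[[str, str, str], str],         # hypothetical: (prompt, text, feedback) -> new text
    write_badly: Callable[[str], str],               # hypothetical: deliberately poor response
    low_score: float = 0.5,                          # assumed cutoff for "initially scored poorly"
    max_rounds: int = 3,                             # "up to 3 times" in the text
) -> PreferencePair:
    first_score, feedback = judge(prompt, draft)

    # Positive side: rewrite using the judge's feedback, round by round.
    chosen = draft
    for _ in range(max_rounds):
        chosen = rewrite(prompt, chosen, feedback)
        _, feedback = judge(prompt, chosen)

    # Negative side: reuse the first draft if it scored poorly,
    # otherwise ask for a believable low-quality response.
    rejected = draft if first_score < low_score else write_badly(prompt)
    return PreferencePair(prompt=prompt, chosen=chosen, rejected=rejected)
```

The sketch always runs the full number of rounds; the text only says "up to 3", so a real loop presumably had some stopping condition that is not specified here.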

Rank 256, batch size 1, 5 separate runs to explore the landscape broadly; merged via Karcher mean to reconcile.

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Karcher Mean merge method.
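For intuition about what a Karcher mean does, here is a small pure-Python sketch that averages directions on the unit sphere and then rescales by the average norm. It is only an illustration under stated assumptions, not mergekit's code: the function name, the `max_iter` and `tol` parameters, and the norm-rescaling step are choices made for this example, and each vector stands in for a flattened weight tensor.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    """Return (unit vector, original norm)."""
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v], n


def karcher_mean(vectors, max_iter=50, tol=1e-9):
    """Iterative Karcher (Riemannian) mean of directions on the unit sphere."""
    units, norms = zip(*(_unit(v) for v in vectors))
    dim = len(units[0])

    # Start from the normalized arithmetic mean of the directions.
    mu, _ = _unit([sum(u[i] for u in units) / len(units) for i in range(dim)])

    for _ in range(max_iter):
        # Average the log-maps of all points in the tangent space at mu.
        tangent = [0.0] * dim
        for u in units:
            cos_t = max(-1.0, min(1.0, _dot(mu, u)))
            theta = math.acos(cos_t)
            sin_t = math.sin(theta)
            if sin_t < 1e-12:
                continue  # same (or antipodal) direction: no usable tangent
            scale = theta / sin_t
            for i in range(dim):
                tangent[i] += scale * (u[i] - cos_t * mu[i]) / len(units)

        step = math.sqrt(_dot(tangent, tangent))
        if step < tol:
            break

        # Exponential map: move along the geodesic in the tangent direction.
        mu = [math.cos(step) * m + math.sin(step) * t / step
              for m, t in zip(mu, tangent)]
        mu, _ = _unit(mu)

    # Assumption of this sketch: restore magnitude using the mean input norm.
    mean_norm = sum(norms) / len(norms)
    return [mean_norm * m for m in mu]
```

Unlike a plain arithmetic average, this keeps the magnitude of the result from shrinking when the inputs point in different directions: averaging `[1, 0]` and `[0, 1]` this way gives a vector of length 1 rather than about 0.71.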

Models Merged

The following models were included in the merge:

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: Lambent/Mira-v1.23-27B-rlvr+./Mira-v1.23.1-27B-adapters/dpoq-1
  - model: Lambent/Mira-v1.23-27B-rlvr+./Mira-v1.23.1-27B-adapters/dpoq-2
  - model: Lambent/Mira-v1.23-27B-rlvr+./Mira-v1.23.1-27B-adapters/dpoq-3
  - model: Lambent/Mira-v1.23-27B-rlvr+./Mira-v1.23.1-27B-adapters/dpoq-4
  - model: Lambent/Mira-v1.23-27B-rlvr+./Mira-v1.23.1-27B-adapters/dpoq-5
merge_method: karcher
parameters:
  normalize: true
  int8_mask: true
tokenizer_source: Lambent/Mira-v1.3-27B
dtype: bfloat16