
# eq90parsedanube

This is a merge of pre-trained language models created using mergekit.

This is the first of these experiments to show a promising capability improvement over the base model h2o-danube2-1.8b-base.

The training methodology ... is a bit of a mess; I was trying out different things along the way. I'm adding the datasets used at any point, but I don't think replicating the recipe is doable or sensible.

The original upscale is at Lambent/danube2-upscale-1, which duplicates layers 16-21 of the base model. Various training methods were then attempted to repair the upscale. This linear merge combines the four repaired variants whose outputs were at least 90% parseable by the EQ-Bench benchmark.
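
For context, depth upscales like this are typically built with mergekit's passthrough method. A hypothetical config in that style is sketched below; the base model id, its layer count (24), and the exact ranges are assumptions on my part rather than the actual danube2-upscale-1 recipe:

```yaml
# Hypothetical passthrough config duplicating layers 16-21.
# Assumes a 24-layer h2o-danube2-1.8b-base and mergekit's
# end-exclusive layer_range; not the actual upscale recipe.
slices:
  - sources:
      - model: h2oai/h2o-danube2-1.8b-base
        layer_range: [0, 22]   # layers 0-21
  - sources:
      - model: h2oai/h2o-danube2-1.8b-base
        layer_range: [16, 24]  # layers 16-23, so 16-21 appear twice
merge_method: passthrough
dtype: float16
```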

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|---------|---------|------------|----------|---------|
| danube2-upscale-1.7 | 27.97 | 62.16 | 42.2 | 32.2 | 41.13 |

| Model | EQ-Bench | Average |
|-------|----------|---------|
| danube2-upscale-1.7 | 15.52 | 15.52 |

## EQ-Bench

| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| eq_bench | 2.1 | eqbench,none | 15.52 | 2.77 |
| eq_bench | 2.1 | percent_parseable,none | 100 | 0 |
| eq_bench | 2.1 | alias | eq_bench | |
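
These metric names (`eqbench,none`, `percent_parseable,none`) match the output format of EleutherAI's lm-evaluation-harness, so a run along the lines of `lm_eval --model hf --model_args pretrained=Lambent/danube2-upscale-1.7 --tasks eq_bench` should reproduce a table like this (assuming a harness version that ships the eq_bench task).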

Average: 15.52%

## Merge Details

### Merge Method

This model was merged using the linear merge method.
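
For intuition, a linear merge is just a per-tensor weighted average across the input models. The sketch below illustrates the idea with plain torch and transformers; it is not mergekit's implementation (which handles sharding and memory far more carefully), and the output directory name is arbitrary:

```python
# Sketch of a linear merge: a per-tensor weighted average across models.
# Illustrative only; mergekit's real implementation is more careful.
import torch
from transformers import AutoModelForCausalLM

model_ids = [
    "Lambent/danube2-upscale-1.531qlora",
    "Lambent/danube2-upscale-1.53lisa",
    "Lambent/danube2-upscale-1.51galore",
    "Lambent/danube2-upscale-1.51qlora",
]
weights = [1.0, 1.0, 1.0, 1.0]  # equal weights, as in the config below

# Load every model's weights up front (simple, but memory-hungry).
state_dicts = [
    AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.float16).state_dict()
    for m in model_ids
]

total = sum(weights)
merged = {}
for name in state_dicts[0]:
    # Accumulate in float32 for precision, then cast back to float16.
    acc = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    merged[name] = (acc / total).to(torch.float16)

# Write the averaged weights into one of the models and save the result.
model = AutoModelForCausalLM.from_pretrained(model_ids[0], torch_dtype=torch.float16)
model.load_state_dict(merged)
model.save_pretrained("eq90parsedanube-linear")
```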

### Models Merged

The following models were included in the merge:

* Lambent/danube2-upscale-1.531qlora
* Lambent/danube2-upscale-1.53lisa
* Lambent/danube2-upscale-1.51galore
* Lambent/danube2-upscale-1.51qlora

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Lambent/danube2-upscale-1.531qlora
    parameters:
      weight: 1.0
  - model: Lambent/danube2-upscale-1.53lisa
    parameters:
      weight: 1.0
  - model: Lambent/danube2-upscale-1.51galore
    parameters:
      weight: 1.0
  - model: Lambent/danube2-upscale-1.51qlora
    parameters:
      weight: 1.0
merge_method: linear
dtype: float16
```
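
Given this configuration in a file (say `config.yml`, a placeholder name), the merge can be reproduced with mergekit's CLI via `mergekit-yaml config.yml ./output-directory`.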

