Changing the value of kv_count from 34 to 40 increases the number of key-value pairs in the model. These key-value pairs mainly carry attention information in neural networks, particularly in Transformer-based models such as LLaMA.
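
As a rough, hedged illustration of why the key-value pair count matters: in a LLaMA-style Transformer, the attention KV cache grows linearly with the number of KV heads. The head counts and dimensions below are assumptions for illustration, not values read from this model's weights.

# Hypothetical sketch: KV-cache size as a function of key-value heads in a
# LLaMA-style Transformer. All numbers are illustrative assumptions, not
# values taken from this checkpoint.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_param: int = 2) -> int:
    # 2x accounts for the separate key and value tensors in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param

# Example: 48 layers, 8 KV heads, 128-dim heads, 8192 tokens, fp16 (2 bytes).
print(kv_cache_bytes(48, 8, 128, 8192) / 2**20, "MiB")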

merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the passthrough merge method, with Sao10K/Fimbulvetr-11B-v2 as the base.
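
Passthrough copies tensors from the source model into the output unchanged, rather than averaging or interpolating them. The sketch below is a minimal illustration of that idea, not mergekit's actual implementation; the helper name and toy tensors are made up.

import torch

def passthrough_slice(state_dict: dict, layer_range: range) -> dict:
    # Copy every tensor through unchanged; drop transformer layers whose
    # index falls outside the requested slice.
    out = {}
    for name, tensor in state_dict.items():
        if ".layers." in name:
            layer_idx = int(name.split(".layers.")[1].split(".")[0])
            if layer_idx not in layer_range:
                continue
        out[name] = tensor.clone()
    return out

# Toy example: layer 48 is outside [0, 48) and gets dropped.
toy = {
    "model.layers.0.mlp.weight": torch.zeros(2, 2),
    "model.layers.48.mlp.weight": torch.zeros(2, 2),
    "model.norm.weight": torch.zeros(2),
}
print(sorted(passthrough_slice(toy, range(0, 48))))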

Models Merged

The following model was included in the merge:

Sao10K/Fimbulvetr-11B-v2

Configuration

The following YAML configuration was used to produce this model:

base_model: Sao10K/Fimbulvetr-11B-v2
merge_method: passthrough
dtype: float16
parameters:
  normalize: true

slices:
  - sources:
      - model: Sao10K/Fimbulvetr-11B-v2
        layer_range: [0, 48]  # Assumes the model has 48 layers
    densify:
      - linear
      - "rope:alpha=8192/4096"  # Estende il contesto a 8192

tokens:
  - source: Sao10K/Fimbulvetr-11B-v2
    mode: stretch
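
On the alpha=8192/4096 factor noted above: this reads as a RoPE context-extension ratio (target context over trained context). As a hedged aside, the common NTK-style recipe rescales the rotary base by alpha raised to dim/(dim-2); the numbers below are assumptions for illustration, not values taken from this merge.

# Hedged aside: NTK-style context extension rescales the RoPE base by
# alpha ** (dim / (dim - 2)). alpha and head_dim here are assumed for
# illustration; this is not necessarily what the config above applies.
base = 10000.0
head_dim = 128
alpha = 8192 / 4096   # target context / trained context
scaled_base = base * alpha ** (head_dim / (head_dim - 2))
print(scaled_base)    # ~20222: lower rotation frequencies cover 8192 tokens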

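As a usage note, a configuration like the one above can be run with mergekit's mergekit-yaml CLI or its Python API. The sketch below follows the pattern shown in mergekit's README; the paths are placeholders, and option names may differ across mergekit versions.

import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the YAML above from a file (path is a placeholder).
with open("config.yml", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./Fimbulvetr-40",        # placeholder output directory
    options=MergeOptions(cuda=False),  # set cuda=True if a GPU is available
)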