
The aim was to repair the damage caused by the duplication in the upscale with some additional completion training on Cosmopedia.

Training loss seemed to have converged at 50% of the epoch, so I cut it off there and used that adapter, which I hope actually took effect, because it wasn't a saved checkpoint.
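For reference, a minimal sketch of how an adapter like this can be merged back into the base model with peft (the adapter path and dtype are assumptions, not the exact script used):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the upscaled base and attach the QLoRA adapter (adapter dir assumed
# to be the Axolotl output_dir below).
base = AutoModelForCausalLM.from_pretrained(
    "Lambent/danube2-upscale-1", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "./qlora-out")

# Fold the LoRA weights back into the base and save the merged model.
merged = model.merge_and_unload()
merged.save_pretrained("danube2-upscale-1.1")

tokenizer = AutoTokenizer.from_pretrained("Lambent/danube2-upscale-1")
tokenizer.save_pretrained("danube2-upscale-1.1")
```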

eq_bench testing, as a quick reference, strongly suggests it did, but I'm not sure how much of that is just noise on a small model like this.

It also seems to generate completions much more smoothly than its predecessor, rather than getting stuck repeating a single word, which is certainly a good sign.
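A quick way to spot-check that behavior yourself (repo id, prompt, and sampling settings are placeholders, not the exact test I ran):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and sample a short completion to look for
# word-level repetition loops.
tok = AutoTokenizer.from_pretrained("Lambent/danube2-upscale-1.1")
model = AutoModelForCausalLM.from_pretrained(
    "Lambent/danube2-upscale-1.1", torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The history of astronomy begins with"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tok.decode(out[0], skip_special_tokens=True))
```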

Nous evals:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| danube2-upscale-1.1 | 25.43 | 60.13 | 40.22 | 32.06 | 39.46 |

Original model:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| h2o-danube2-1.8b-base | 25.65 | 62.26 | 38.05 | 32.89 | 39.71 |

Axolotl config was something like this:

```yaml
base_model: Lambent/danube2-upscale-1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: false

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: HuggingFaceTB/cosmopedia-100k
    type: completion
dataset_prepared_path: prepared-pedia
val_set_size: 0.01
output_dir: ./qlora-out

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: qlora
lora_model_dir:
lora_r: 128
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: qlora-danube-upscale
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.002
fsdp:
fsdp_config:
special_tokens:
```
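For anyone reproducing this outside Axolotl, the adapter settings above roughly translate to the following peft LoraConfig. The target_modules list is my reading of `lora_target_linear: true` for a Mistral-style model, not something Axolotl emitted:

```python
from peft import LoraConfig

# Approximate peft equivalent of the QLoRA adapter settings in the config.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # lora_target_linear: true -> LoRA on every linear projection
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```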