
Fine-tuned from state-spaces/mamba-2.8b on the s3nh/polish_dolly instruction dataset.


Inference with this model requires the mamba_ssm package:

pip install mamba_ssm

A more detailed explanation will follow soon.
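
Until that explanation is available, here is a minimal inference sketch. It assumes the checkpoint loads through `MambaLMHeadModel.from_pretrained` from `mamba_ssm`, that a CUDA GPU is available, and that prompts follow the standard Alpaca template implied by `type: alpaca` in the config below. The helper names `build_prompt` and `generate` and the sampling parameters are illustrative and do not come from the model card.

```python
# Standard Alpaca prompt (no-input variant); assumed to match the training format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def generate(
    instruction: str,
    repo_id: str = "s3nh/mamba-2.8b_dolly_instruction_polish",
    max_new_tokens: int = 128,
) -> str:
    """Generate a response on a CUDA device (hypothetical helper)."""
    # Heavy imports are kept local so build_prompt works without a GPU stack.
    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # The config below names the GPT-NeoX tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = MambaLMHeadModel.from_pretrained(
        repo_id, device="cuda", dtype=torch.bfloat16
    )
    model.eval()

    input_ids = tokenizer(
        build_prompt(instruction), return_tensors="pt"
    ).input_ids.to("cuda")

    with torch.inference_mode():
        # max_length counts prompt tokens plus newly generated tokens.
        sequences = model.generate(
            input_ids=input_ids,
            max_length=input_ids.shape[1] + max_new_tokens,
            temperature=0.7,  # illustrative sampling settings
            top_k=40,
            top_p=0.9,
        )

    # Drop the prompt tokens and decode only the continuation.
    new_tokens = sequences[0, input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Wymień trzy największe miasta w Polsce.")` should return the model's answer as a string, provided the assumptions above hold for this checkpoint.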

Axolotl config

base_model: state-spaces/mamba-2.8b
model_type: MambaLMHeadModel
tokenizer_type: AutoTokenizer
tokenizer_config: EleutherAI/gpt-neox-20b

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: s3nh/alpaca-dolly-instruction-only-polish
    type: alpaca
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./mamba

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5e-5

train_on_inputs: false
group_by_length: true

bf16: true
fp16: false
tf32: true
save_strategy: steps
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint: true
local_rank:
logging_steps: 100
xformers_attention:
flash_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch:
save_steps: 3000
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
tokens:
save_safetensors: False
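
The batch-related settings above combine into an effective batch size. Here is a small sketch of that arithmetic, assuming a hypothetical `num_gpus` of 1, since the card does not state the hardware used:

```python
def effective_batch(micro_batch_size: int,
                    gradient_accumulation_steps: int,
                    num_gpus: int = 1) -> int:
    """Sequences contributing to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Values taken from the config: micro_batch_size=1, gradient_accumulation_steps=4.
sequences_per_step = effective_batch(1, 4)

# Upper bound on tokens per step given sequence_len=1024; the real count is
# lower because sample_packing and pad_to_sequence_len are both disabled.
max_tokens_per_step = sequences_per_step * 1024

print(sequences_per_step, max_tokens_per_step)  # prints: 4 4096
```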



Dataset used to train s3nh/mamba-2.8b_dolly_instruction_polish