---
license: openrail
datasets:
- s3nh/alpaca-dolly-instruction-only-polish
language:
- pl
- en
library_name: transformers
pipeline_tag: text-generation
---

Finetuned state-spaces/mamba-2.8b on the s3nh/alpaca-dolly-instruction-only-polish instruction dataset.

Inference with this model requires the mamba_ssm package:

```
pip install mamba_ssm
```

A more detailed explanation is coming soon.
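In the meantime, here is a minimal inference sketch. It is untested and assumes the upstream mamba_ssm generation API, an Alpaca-style prompt matching the training data format, and a placeholder repo id that should be replaced with this model's Hugging Face id:

```
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
# Tokenizer matches the one used during training (see the Axolotl config below).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# "<this-repo-id>" is a placeholder: replace with this model's Hugging Face id.
model = MambaLMHeadModel.from_pretrained("<this-repo-id>", device=device, dtype=torch.bfloat16)

# The training data is Alpaca-formatted, so an Alpaca-style prompt is assumed here.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nNapisz krótki wiersz o jesieni.\n\n### Response:\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# mamba_ssm's generate takes a total max_length rather than max_new_tokens.
out = model.generate(
    input_ids=input_ids,
    max_length=input_ids.shape[1] + 128,
    temperature=0.7,
    top_k=40,
    top_p=0.9,
    return_dict_in_generate=True,
)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
```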

Axolotl config:

```
base_model: state-spaces/mamba-2.8b
model_type: MambaLMHeadModel
tokenizer_type: AutoTokenizer
tokenizer_config: EleutherAI/gpt-neox-20b

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: s3nh/alpaca-dolly-instruction-only-polish
    type: alpaca
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./mamba

sequence_len: 1024
sample_packing: false
pad_to_sequence_len: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 5e-5

train_on_inputs: false
group_by_length: true

bf16: true
fp16: false
tf32: true
save_strategy: steps
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint: true
local_rank:
logging_steps: 100
xformers_attention:
flash_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch:
save_steps: 3000
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
tokens:
save_safetensors: False
```
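For reference, a config like this is typically launched with Axolotl's CLI (assuming the YAML above is saved as mamba.yml; the filename is an assumption):

```
accelerate launch -m axolotl.cli.train mamba.yml
```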