---
base_model: meta-llama/Meta-Llama-3-8B
library_name: peft
license: llama3
tags:
- axolotl
- generated_from_trainer
model-index:
- name: Sanskrit-llama
  results: []
---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml

base_model: meta-llama/Meta-Llama-3-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
max_steps: 
bnb_config_kwargs:
  llm_int8_has_fp16_weight: false
  bnb_4bit_quant_type: nf4
  bnb_4bit_use_double_quant: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: VinitT/Sanskrit-Llama_Base-Dataset
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out
chat_template: chatml
hub_model_id: diabolic6045/Sanskrit-llama
hf_use_auth_token: true
adapter: qlora
lora_model_dir:

sequence_len: 512
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out: 

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 5e-5

train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
#fsdp:
#  - full_shard
#  - auto_wrap
#fsdp_config:
#  fsdp_limit_all_gathers: true
#  fsdp_sync_module_states: true
#  fsdp_offload_params: true
#  fsdp_use_orig_params: false
#  fsdp_cpu_ram_efficient_loading: true
#  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
#  fsdp_state_dict_type: FULL_STATE_DICT
special_tokens:
  pad_token: "<|end_of_text|>"

```
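
For reference, the `lora_*` keys above map roughly onto a PEFT `LoraConfig`. This is a sketch of the correspondence, not code used by the training run; in particular, `target_modules="all-linear"` is assumed to be the PEFT equivalent of `lora_target_linear: true`:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                         # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.05,            # lora_dropout
    target_modules="all-linear",  # assumed equivalent of lora_target_linear: true
    task_type="CAUSAL_LM",
)
```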

</details><br>

# Sanskrit-llama

This model is a QLoRA fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [VinitT/Sanskrit-Llama_Base-Dataset](https://huggingface.co/datasets/VinitT/Sanskrit-Llama_Base-Dataset).

## Model description

This repository contains a QLoRA adapter for Meta-Llama-3-8B, trained with Axolotl on a Sanskrit instruction dataset in Alpaca format. The base model is loaded in 4-bit NF4 quantization with double quantization, and LoRA adapters (rank 32, alpha 16, dropout 0.05) are applied to all linear layers. The adapter weights in this repository are loaded on top of the base model with PEFT.

## Intended uses & limitations

The adapter is intended for Sanskrit instruction following and text generation experiments. Known limitations: training used a sequence length of 512 and no held-out validation split, so output quality has not been evaluated, and the model inherits the behaviour and Llama 3 license terms of the base model.
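
A minimal inference sketch, assuming access to the gated base model and mirroring the 4-bit NF4 settings from the training config. The Alpaca-style prompt is an assumption based on the dataset type; adjust it to your data:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "diabolic6045/Sanskrit-llama"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a sentence in Sanskrit about the sun.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```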

## Training and evaluation data

The model was trained on [VinitT/Sanskrit-Llama_Base-Dataset](https://huggingface.co/datasets/VinitT/Sanskrit-Llama_Base-Dataset), consumed in Alpaca format with sample packing at a sequence length of 512. No evaluation split was used (`val_set_size: 0`).
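
To inspect the training data, a short sketch (assuming the dataset is publicly readable on the Hub; column names follow the Alpaca convention and may differ):

```python
from datasets import load_dataset

ds = load_dataset("VinitT/Sanskrit-Llama_Base-Dataset", split="train")
print(ds)     # dataset size and features
print(ds[0])  # expected Alpaca-style fields, e.g. instruction / input / output
```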

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32 (see the check below)
- total_eval_batch_size: 4
- optimizer: paged AdamW 8-bit (`paged_adamw_8bit`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
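
The reported total batch size is consistent with the per-device settings; a quick check of the arithmetic:

```python
micro_batch_size = 2   # per-GPU batch size
grad_accum_steps = 8   # gradient_accumulation_steps
num_devices = 2        # GPUs used for training

total_train_batch_size = micro_batch_size * grad_accum_steps * num_devices
assert total_train_batch_size == 32  # matches the value reported above
```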

### Training results

No evaluation results are available: the run used `val_set_size: 0`, so no validation metrics were computed during training.

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1