
TinyLlama 1.1B 1431k 4-bit Python Coder 👩‍💻

TinyLlama 1.1B fine-tuned on the python_code_instructions_18k_alpaca code instructions dataset, using the Axolotl library in 4-bit with the PEFT library.

Pretrained description

TinyLlama-1.1B

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, they can achieve this within a span of "just" 90 days using 16 A100-40G GPUs πŸš€πŸš€.

They adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into and used by many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.
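
Because the architecture and tokenizer match Llama 2, the base model loads with the standard transformers classes. The sketch below illustrates this; the checkpoint name is an assumption (the 1431k intermediate TinyLlama checkpoint this card's name suggests), not something stated in this card.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base checkpoint corresponding to the "1431k" step in the model name
base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Same model class and tokenizer family as Llama 2
print(type(model).__name__)   # LlamaForCausalLM
print(tokenizer.vocab_size)   # 32000, the Llama 2 vocabulary size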

Training data

python_code_instructions_18k_alpaca

The dataset contains problem descriptions and the corresponding Python code. It is derived from sahil2801/code_instructions_120k, with an added prompt column in Alpaca style.
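
A quick way to inspect the training data is to load it with the datasets library, as in the sketch below; the column names are assumed from the dataset card rather than stated here.

from datasets import load_dataset

ds = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")

print(ds)                    # roughly 18k rows
print(ds[0]["instruction"])  # problem description
print(ds[0]["output"])       # reference Python solution
print(ds[0]["prompt"])       # full Alpaca-style prompt used for training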

Training hyperparameters

The following Axolotl configuration was used during training:

  • load_in_8bit: false

  • load_in_4bit: true

  • strict: false

  • datasets:

    • path: iamtarun/python_code_instructions_18k_alpaca
    • type: alpaca

  • dataset_prepared_path:

  • val_set_size: 0.05

  • output_dir: ./qlora-out

  • adapter: qlora

  • sequence_len: 1096

  • sample_packing: true

  • pad_to_sequence_len: true

  • lora_r: 32

  • lora_alpha: 16

  • lora_dropout: 0.05

  • lora_target_modules:

  • lora_target_linear: true

  • lora_fan_in_fan_out:

  • gradient_accumulation_steps: 1

  • micro_batch_size: 1

  • num_epochs: 2

  • max_steps:

  • optimizer: paged_adamw_32bit

  • lr_scheduler: cosine

  • learning_rate: 0.0002

  • train_on_inputs: false

  • group_by_length: false

  • bf16: false

  • fp16: true

  • tf32: false

  • gradient_checkpointing: true

  • logging_steps: 10

  • flash_attention: false

  • warmup_steps: 10

  • weight_decay: 0.0
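
For readers more familiar with PEFT than Axolotl, the sketch below shows a rough bitsandbytes/PEFT equivalent of the QLoRA settings listed above. Axolotl builds this configuration internally, so this is an illustrative sketch rather than the exact training code, and the base checkpoint name is an assumption.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load_in_4bit: true
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",  # assumed base checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.float16,  # fp16: true
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                         # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.05,            # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # lora_target_linear: true
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()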

Framework versions

  • torch=="2.1.2"
  • flash-attn=="2.5.0"
  • deepspeed=="0.13.1"
  • axolotl=="0.4.0"
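
To verify that a local environment matches the versions above, a minimal check using the standard library could look like this:

import importlib.metadata as md

for pkg in ("torch", "flash-attn", "deepspeed", "axolotl"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")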

Example of usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edumunozsala/TinyLlama-1431k-python-coder"

# Load the tokenizer and the fine-tuned model in 4-bit
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, torch_dtype=torch.float16,
                                             device_map="auto")

instruction = "Write a Python function to display the first and last elements of a list."
input = ""

# Build the Alpaca-style prompt used during fine-tuning
prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.

### Task:
{instruction}

### Input:
{input}

### Response:
"""

# Tokenize the prompt and generate the completion
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.3)

print(f"Prompt:\n{prompt}\n")
print(f"Generated code:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")

Citation

@misc {edumunozsala_2023,
    author       = { {Eduardo MuΓ±oz} },
    title        = { TinyLlama-1431k-python-coder },
    year         = 2024,
    url          = { https://huggingface.co/edumunozsala/TinyLlama-1431k-python-coder },
    publisher    = { Hugging Face }
}