TinyLlaMa 1.1B 1431k 4-bit Python Coder π©βπ»
TinyLlaMa 1.1B fine-tuned on the python_code_instructions_18k_alpaca Code instructions dataset by using the Axolot library in 4-bit with PEFT library.
Pretrained description
The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, they can achieve this within a span of "just" 90 days using 16 A100-40G GPUs ππ.
They adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.
Training data
python_code_instructions_18k_alpaca
The dataset contains problem descriptions and code in python language. This dataset is taken from sahil2801/code_instructions_120k, which adds a prompt column in alpaca style.
Training hyperparameters
The following axolot
configuration was used during training:
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: iamtarun/python_code_instructions_18k_alpaca type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out
adapter: qlora
sequence_len: 1096
sample_packing: true
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 2
max_steps:
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
logging_steps: 10
flash_attention: false
warmup_steps: 10
weight_decay: 0.0
Framework versions
- torch=="2.1.2"
- flash-attn=="2.5.0"
- deepspeed=="0.13.1"
- axolotl=="0.4.0"
Example of usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "edumunozsala/TinyLlama-1431k-python-coder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, torch_dtype=torch.float16,
device_map="auto")
instruction="Write a Python function to display the first and last elements of a list."
input=""
prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.
### Task:
{instruction}
### Input:
{input}
### Response:
"""
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
# with torch.inference_mode():
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.3)
print(f"Prompt:\n{prompt}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
Citation
@misc {edumunozsala_2023,
author = { {Eduardo MuΓ±oz} },
title = { TinyLlama-1431k-python-coder },
year = 2024,
url = { https://huggingface.co/edumunozsala/TinyLlama-1431k-python-coder },
publisher = { Hugging Face }
}
- Downloads last month
- 32