
Finetunes Llama-7b + Alpaca to solve problems via stepwise reasoning, trained on OpenAI's PRM800K dataset (or rather our postprocessed version, sl-alex/openai-prm800k-solutions-only).

Model description

This is a fork of llama-7b + tloen/alpaca-lora-7b.

That is: we loaded Llama-7b, applied the Alpaca LoRA, expanded the vocabulary with stepwise-reasoning special tokens, then QLoRA 4-bit finetuned from there.

Parts (a rough sketch of how they stack together follows this list):

  • base model llama-7b
  • LoRA 0 tloen/alpaca-lora-7b
  • LoRA 1
    • adapter_config.json
    • adapter_model.bin
  • tokenizer
    • added_tokens.json
    • special_tokens_map.json
    • tokenizer.model
    • tokenizer_config.json
  • finetuned input/output embedding layers:
    • embed_tokens.pt (state_dict for model.get_input_embeddings(), embed_tokens: Embedding)
    • lm_head.pt (state_dict for model.get_output_embeddings(), lm_head: Linear)
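
The supported loading path is evaluate.py (see Usage below), but for orientation, here is a rough sketch of how the parts above are assumed to stack in plain transformers + peft. The merge-then-reapply ordering, dtype, and local file paths are our own assumptions for illustration, not necessarily what evaluate.py does:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# base model
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b", torch_dtype=torch.float16)

# LoRA 0: the Alpaca adapter, merged into the base weights
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")
model = model.merge_and_unload()

# tokenizer with the added stepwise special tokens; grow the embedding tables to match
tokenizer = LlamaTokenizer.from_pretrained("sl-alex/llama-7b-alpaca-stepwise-lora-embtuned")
model.resize_token_embeddings(len(tokenizer))

# finetuned input/output embedding layers (downloaded from this repository);
# resize first, then load, so the shapes match the expanded vocabulary
model.get_input_embeddings().load_state_dict(torch.load("embed_tokens.pt"))
model.get_output_embeddings().load_state_dict(torch.load("lm_head.pt"))

# LoRA 1: the stepwise-reasoning adapter from this repository
model = PeftModel.from_pretrained(model, "sl-alex/llama-7b-alpaca-stepwise-lora-embtuned")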

Training

Trained using qlora.py from our stepwise branch of qlora.
Known-good as of commit 3a86919.

python -m qlora \
--model_name_or_path huggyllama/llama-7b \
--lora_name_or_path tloen/alpaca-lora-7b \
--dataset prm800k-solutions \
--dataset_format prm800k-solutions \
--bf16 \
--max_memory_MB 24000 \
--use_bos_token_in_prompt \
--truncate_toward_center \
--source_max_len 184 \
--target_max_len 998 \
--gradient_accumulation_steps 4 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 0.0002 \
--run_name 13b_alpaca_special_tokens_long \
--report_to wandb \
--save_steps 64 \
--save_total_limit 3 \
--max_steps 1664 \
--evaluation_strategy steps \
--eval_steps 64 \
--generate_steps 16 \
--register_process_supervision_tokens
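
The --register_process_supervision_tokens flag is what expands the vocabulary with the stepwise delimiters seen in the sample outputs below (<|step_start|>, <|step_end|>, <|answer_start|>). A minimal sketch of the equivalent in plain transformers, assuming that token list (qlora.py may register more):

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")

# register the process-supervision delimiters as special tokens
# (token list inferred from the sample outputs; an assumption, not the exact list)
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<|step_start|>", "<|step_end|>", "<|answer_start|>"],
})

# grow the input/output embedding tables so the new ids get trainable rows
model.resize_token_embeddings(len(tokenizer))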

Usage

You can load the model using evaluate.py from our stepwise branch of qlora.
Known-good as of commit 3a86919.

You'll need to download embed_tokens.pt and lm_head.pt from this repository and save them to the root of the qlora repository, then run evaluate.py like so:

git clone -b stepwise https://github.com/scottlogic-alex/qlora.git
cd qlora
wget https://huggingface.co/sl-alex/llama-7b-alpaca-stepwise-lora-embtuned/resolve/main/embed_tokens.pt
wget https://huggingface.co/sl-alex/llama-7b-alpaca-stepwise-lora-embtuned/resolve/main/lm_head.pt
python -m evaluate \
--model_name_or_path huggyllama/llama-7b \
--base_lora_model_name_or_path tloen/alpaca-lora-7b \
--tokenizer_model_name_or_path sl-alex/llama-7b-alpaca-stepwise-lora-embtuned \
--lora_model_name_or_path sl-alex/llama-7b-alpaca-stepwise-lora-embtuned \
--input_embedding_path embed_tokens.pt \
--output_embedding_path lm_head.pt \
--use_bos_token_in_prompt \
--overrun_countermeasures False

Prompt style

The model is finetuned for instruction-response, using the Alpaca prompt template with a leading BOS token (<s>) and an opening <|step_start|> token to begin the response.

prompt = '<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nHow do I feed a family of 5 without ingredients?\n\n### Response:\n<|step_start|>'
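
Once the model and tokenizer are loaded (via evaluate.py above, or a manual stack like the sketch in the Parts list), generation is ordinary transformers decoding. A rough example, with sampling parameters chosen arbitrarily:

# assumes `model` and `tokenizer` are already loaded as described in Usage
prompt = (
    '<s>Below is an instruction that describes a task. '
    'Write a response that appropriately completes the request.\n\n'
    '### Instruction:\nHow do I make a cup of tea?\n\n'
    '### Response:\n<|step_start|>'
)
# add_special_tokens=False: the prompt already carries the literal <s> BOS token
inputs = tokenizer(prompt, return_tensors='pt', add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=False))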

Example outputs (each $ line is the instruction; what follows is the model's stepwise response):

$ How do I drink a matcha latte?
I remember that one of my friends who is into this kind of thing said that he mixes equal parts of matcha and milk in a glass and chocolate. <|step_end|> <|step_start|> Then he adds three times as much ice as there is liquid. <|step_end|> <|step_start|> Then he stretches his glass by filling it with a rubber band and placing his glass on top of his fist, with his elbow on the table. <|step_end|> <|step_start|> He then walks down the street, turns around and back, and throws the glass into the air and runs. <|step_end|> <|step_start|> This makes the glass rotate on its axis and shake as the ice hits the inside and the glass hits the ground. <|step_end|> <|step_start|> Then he gets the glass and drinks out of it without removing his elbow from the table. <|step_end|> <|step_start|> I think that is a good way to eat a matcha latte, but I need to check it out with more friends and see if their ways are different. <|step_end|> <|step_start|> Then I can compare them and write a paper on the subject. <|answer_start|> I check out this way with my six friends and write a paper on the subject. I find that four of them think that this is a good way to eat a matcha latte. The other two think that I am nuts

$ What is the third Fibonacci number?
I recall that the Fibonacci numbers are sequences that start with 0 and 1, and then each term is the sum of the previous two candidates, such as 0, 1, 1, 2, 3, 5, 8, 13, and so on. <|step_end|> <|step_start|> To find the next Fibonacci number, I can use a calculator or a table of the first few numbers, but I also notice that there is a pattern in the sequence that seems to undo the previous number and add back the previous one after each cycle. <|step_end|> <|step_start|> I also notice that the cycle length is increasing by 1 each time, so that after 10 cycles, the next Fibonacci number would be 0 again, since 1 + the previous cycle length would be 0. <|step_end|> <|step_start|> Therefore, to find the third Fibonacci number, I can divide the sequence into three equal parts and use the partial sums of the first two parts as the first two Fibonacci numbers, then use the third partial sum as the third Fibonacci number. <|step_end|> <|step_start|> The first two partial sums are 0 and 1, so the first two Fibonacci numbers are 0

$ How do I make a cup of tea?
I need to find a teapot, a cup, and some tea. <|step_end|> <|step_start|> I can start by looking around the house and see if any of my parents have old cups and teapots. <|step_end|> <|step_start|> If I can't find any, I can look in my local markets and see if I can find them there. <|step_end|> <|step_start|> I have to remember to buy some tea, too. <|step_end|> <|step_start|> What kind of tea should I get? I remember seeing some jasmine tea in one of the markets. <|step_end|> <|step_start|> I think I'll check if they sell that. <|step_end|> <|step_start|> Another thing I need to do is make sure the teapot and the cup are suitable for each other. <|step_end|> <|step_start|> There should be some kind of fit between them, or else the tea might get lost. <|step_end|> <|step_start|> I think 10 cups should be enough for one teapot. <|step_end|> <|step_start|> One cup should be enough for one teapot, don't you think? <|step_end|> <|step_start|> But also, one teapot should be enough for one cup, right? <|step_end|> <|step_start|> I think I'm getting messed up in some set theory. <|step_end|> <|step_start|> That's true, we have to remember that the teapot can also be considered a

$ How do I open a jar?
I need to find a jar that is easy for me to open. <|step_end|> <|step_start|> I notice that some jar lids have a quick-opening button that makes the jar easier to open by increasing the area of the jar where the glass is connected to the Metal. <|step_end|> <|step_start|> I also notice that some jar lids have a fuse-and-ignite button that makes the jar easier to open by applying an energy source to the glass that increases the temperature and makes it easier to fuse. <|step_end|> <|step_start|> I think about what happens when I apply a quick-opening button or a fuse-and-ignite button to a jar that is already hard to open. <|step_end|> <|step_start|> I know that the quick-opening button will not make the jar easier to open if the jar is already hard to open. <|step_end|> <|step_start|> It will only make the jar easier to open if the jar is already easy to open. <|step_end|> <|step_start|> Similarly, I know that the fuse-and-ignite button will not make the jar easier to open if the jar is already hard to open. <|step_end|> <|step_start|> It will only make the jar easier to open if the jar is already easy to open. <|step_end|> <|step_start|> So, I need to find a jar that is not hard to open in the first
