Finetunes Llama-7b + Alpaca to solve problems via stepwise reasoning, using our postprocessed version of the OpenAI PRM800k dataset (sl-alex/openai-prm800k-solutions-only).

Model description

This is a fork of llama-7b + tloen/alpaca-lora-7b.

That is: we loaded Llama-7b, applied the Alpaca LoRA, expanded the vocabulary, then finetuned from there using 4-bit QLoRA.

Parts:

  • base model llama-7b
  • LoRA 0 tloen/alpaca-lora-7b
  • LoRA 1
    • adapter_config.json
    • adapter_model.bin
  • tokenizer
    • added_tokens.json
    • special_tokens_map.json
    • tokenizer.model
    • tokenizer_config.json
  • finetuned input/output embedding layers:
    • embed_tokens.pt (state_dict for model.get_input_embeddings(), embed_tokens: Embedding)
    • lm_head.pt (state_dict for model.get_output_embeddings(), lm_head: Linear)

Training

Trained using qlora.py from our stepwise branch of qlora.
Known-good as of commit 3a86919.

python -m qlora \
--model_name_or_path huggyllama/llama-7b \
--lora_name_or_path tloen/alpaca-lora-7b \
--dataset prm800k-solutions \
--dataset_format prm800k-solutions \
--bf16 \
--max_memory_MB 24000 \
--use_bos_token_in_prompt \
--truncate_toward_center \
--source_max_len 184 \
--target_max_len 998 \
--gradient_accumulation_steps 4 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 0.0002 \
--run_name 13b_alpaca_special_tokens_long \
--report_to wandb \
--save_steps 64 \
--save_total_limit 3 \
--max_steps 1664 \
--evaluation_strategy steps \
--eval_steps 64 \
--generate_steps 16 \
--register_process_supervision_tokens
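
As a rough reading of the flags above, the effective batch size and checkpoint schedule can be worked out as follows. This is only a sketch: it assumes training on a single device and that the source and target length limits simply add, neither of which is stated in this card.

```python
# Values copied from the training command above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 1664
save_steps = 64
source_max_len = 184
target_max_len = 998

# Assumption: a single device, so there is no data-parallel multiplier.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Total sequences consumed over the whole run.
sequences_seen = effective_batch_size * max_steps
# Number of times a checkpoint is written during the run.
checkpoints_written = max_steps // save_steps
# Assumption: prompt and response limits add up to the longest training sequence.
max_sequence_tokens = source_max_len + target_max_len

print(effective_batch_size, sequences_seen, checkpoints_written, max_sequence_tokens)
```

Because the command also passes --save_total_limit 3, only the three most recent of those checkpoints would remain on disk at any time.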

Usage

You can load using evaluate.py from our stepwise branch of qlora.
Known-good as of commit 3a86919.

You'll need to download embed_tokens.pt and lm_head.pt from this repository and save them to the root of the qlora repository, then run evaluate.py like so:

git clone -b stepwise https://github.com/scottlogic-alex/qlora.git
cd qlora
wget https://huggingface.co/sl-alex/llama-7b-alpaca-stepwise-lora-embtuned/resolve/main/embed_tokens.pt
wget https://huggingface.co/sl-alex/llama-7b-alpaca-stepwise-lora-embtuned/resolve/main/lm_head.pt
python -m evaluate \
--model_name_or_path huggyllama/llama-7b \
--base_lora_model_name_or_path tloen/alpaca-lora-7b \
--tokenizer_model_name_or_path sl-alex/llama-7b-alpaca-stepwise-lora-embtuned \
--lora_model_name_or_path sl-alex/llama-7b-alpaca-stepwise-lora-embtuned \
--input_embedding_path embed_tokens.pt \
--output_embedding_path lm_head.pt \
--use_bos_token_in_prompt \
--overrun_countermeasures False

Prompt style

The model is finetuned for instruction-response. The prompt follows the Alpaca template and ends with a <|step_start|> token:

prompt = '<s>Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nHow do I feed a family of 5 without ingredients?\n\n### Response:\n<|step_start|>'
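
For convenience, the same prompt can be assembled by a small helper. The sketch below is illustrative only: build_prompt and ALPACA_PREAMBLE are hypothetical names, not part of the qlora repository, and whether the literal <s> text is needed depends on how your tokenizer handles the BOS token.

```python
# Hypothetical helper (not from the qlora repo) that reproduces the prompt format above.
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, use_bos_token: bool = True) -> str:
    """Format an instruction in the Alpaca style, ending with <|step_start|>."""
    # Assumption: BOS is written as the literal text "<s>", as in the example prompt.
    bos = "<s>" if use_bos_token else ""
    return (
        f"{bos}{ALPACA_PREAMBLE}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n<|step_start|>"
    )


prompt = build_prompt("How do I feed a family of 5 without ingredients?")
print(prompt)
```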

Example output (lines beginning with $ are the instruction; the model's completion follows):

$ How do I drink a matcha latte?
I remember that one of my friends who is into this kind of thing said that he mixes equal parts of matcha and milk in a glass and chocolate. <|step_end|> <|step_start|> Then he adds three times as much ice as there is liquid. <|step_end|> <|step_start|> Then he stretches his glass by filling it with a rubber band and placing his glass on top of his fist, with his elbow on the table. <|step_end|> <|step_start|> He then walks down the street, turns around and back, and throws the glass into the air and runs. <|step_end|> <|step_start|> This makes the glass rotate on its axis and shake as the ice hits the inside and the glass hits the ground. <|step_end|> <|step_start|> Then he gets the glass and drinks out of it without removing his elbow from the table. <|step_end|> <|step_start|> I think that is a good way to eat a matcha latte, but I need to check it out with more friends and see if their ways are different. <|step_end|> <|step_start|> Then I can compare them and write a paper on the subject. <|answer_start|> I check out this way with my six friends and write a paper on the subject. I find that four of them think that this is a good way to eat a matcha latte. The other two think that I am nuts

$ What is the third Fibonacci number?
I recall that the Fibonacci numbers are sequences that start with 0 and 1, and then each term is the sum of the previous two candidates, such as 0, 1, 1, 2, 3, 5, 8, 13, and so on. <|step_end|> <|step_start|> To find the next Fibonacci number, I can use a calculator or a table of the first few numbers, but I also notice that there is a pattern in the sequence that seems to undo the previous number and add back the previous one after each cycle. <|step_end|> <|step_start|> I also notice that the cycle length is increasing by 1 each time, so that after 10 cycles, the next Fibonacci number would be 0 again, since 1 + the previous cycle length would be 0. <|step_end|> <|step_start|> Therefore, to find the third Fibonacci number, I can divide the sequence into three equal parts and use the partial sums of the first two parts as the first two Fibonacci numbers, then use the third partial sum as the third Fibonacci number. <|step_end|> <|step_start|> The first two partial sums are 0 and 1, so the first two Fibonacci numbers are 0

$ How do I make a cup of tea?
I need to find a teapot, a cup, and some tea. <|step_end|> <|step_start|> I can start by looking around the house and see if any of my parents have old cups and teapots. <|step_end|> <|step_start|> If I can't find any, I can look in my local markets and see if I can find them there. <|step_end|> <|step_start|> I have to remember to buy some tea, too. <|step_end|> <|step_start|> What kind of tea should I get? I remember seeing some jasmine tea in one of the markets. <|step_end|> <|step_start|> I think I'll check if they sell that. <|step_end|> <|step_start|> Another thing I need to do is make sure the teapot and the cup are suitable for each other. <|step_end|> <|step_start|> There should be some kind of fit between them, or else the tea might get lost. <|step_end|> <|step_start|> I think 10 cups should be enough for one teapot. <|step_end|> <|step_start|> One cup should be enough for one teapot, don't you think? <|step_end|> <|step_start|> But also, one teapot should be enough for one cup, right? <|step_end|> <|step_start|> I think I'm getting messed up in some set theory. <|step_end|> <|step_start|> That's true, we have to remember that the teapot can also be considered a

$ How do I open a jar?
I need to find a jar that is easy for me to open. <|step_end|> <|step_start|> I notice that some jar lids have a quick-opening button that makes the jar easier to open by increasing the area of the jar where the glass is connected to the Metal. <|step_end|> <|step_start|> I also notice that some jar lids have a fuse-and-ignite button that makes the jar easier to open by applying an energy source to the glass that increases the temperature and makes it easier to fuse. <|step_end|> <|step_start|> I think about what happens when I apply a quick-opening button or a fuse-and-ignite button to a jar that is already hard to open. <|step_end|> <|step_start|> I know that the quick-opening button will not make the jar easier to open if the jar is already hard to open. <|step_end|> <|step_start|> It will only make the jar easier to open if the jar is already easy to open. <|step_end|> <|step_start|> Similarly, I know that the fuse-and-ignite button will not make the jar easier to open if the jar is already hard to open. <|step_end|> <|step_start|> It will only make the jar easier to open if the jar is already easy to open. <|step_end|> <|step_start|> So, I need to find a jar that is not hard to open in the first
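
The completions above separate reasoning steps with <|step_end|> and <|step_start|>, and may introduce a final answer with <|answer_start|>. A minimal sketch of splitting a completion into steps is shown below; split_steps is a hypothetical helper rather than something the qlora repository provides, and the sample is an excerpt of the matcha latte output.

```python
import re

STEP_START = "<|step_start|>"
STEP_END = "<|step_end|>"
ANSWER_START = "<|answer_start|>"


def split_steps(completion: str):
    """Return (steps, answer); answer is None if no answer marker was generated."""
    # Everything before the answer marker is reasoning; everything after is the answer.
    reasoning, marker, answer = completion.partition(ANSWER_START)
    # Split on either step delimiter and drop the empty pieces between them.
    pattern = f"{re.escape(STEP_END)}|{re.escape(STEP_START)}"
    steps = [s.strip() for s in re.split(pattern, reasoning) if s.strip()]
    return steps, (answer.strip() if marker else None)


sample = (
    "I think that is a good way to eat a matcha latte, but I need to check it out "
    "with more friends and see if their ways are different. <|step_end|> <|step_start|> "
    "Then I can compare them and write a paper on the subject. <|answer_start|> "
    "I check out this way with my six friends and write a paper on the subject."
)
steps, answer = split_steps(sample)
for i, step in enumerate(steps, start=1):
    print(i, step)
print("answer:", answer)
```

Completions that run out of tokens before reaching <|answer_start|>, as several of the examples do, yield an answer of None.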
