adamo1139 committed
Commit
b7272fd
1 Parent(s): cd4124c

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -13,10 +13,9 @@ Prompt format is standard chatml. Don't expect it to be good at math, riddles or
 Cost of this fine-tune is about $10 in electricity. It took me 3 tries to get it right.
 Base model used for fine-tuning was 200k context Yi-34B-Llama model shared by larryvrh.
 
-I had to change max_positional_embeddings in config.json and model_max_length to 4096 for training to start, otherwise I was OOMing straight away.
+I had to lower max_positional_embeddings in config.json and model_max_length for training to start, otherwise I was OOMing straight away.
 My first attempt had max_positional_embeddings set to 16384 and model_max_length set to 200000. This allowed fine-tuning to finish, but that model was broken after applying LoRA and merging it. \
-Please use my lesson and be careful when setting positional embeddings for training \
-<b>This model is my third attempt with AEZAKMI v2 dataset and it works perfectly fine.</b>
+This attempt had both max_position_embeddings and model_max_length set to 4096, which worked perfectly fine.
 
 ## Prompt Format
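For anyone reproducing this workaround, the sketch below shows how the two values could be lowered with the Hugging Face transformers API before launching training. It is a minimal sketch, not the author's script: the local model path is a hypothetical placeholder, and only the two config keys and the 4096 value come from the commit itself.

```python
# Minimal sketch of the workaround described in the diff above, NOT the
# author's actual script. "Yi-34B-Llama-200k" is a hypothetical local path
# to the base model; only max_position_embeddings, model_max_length, and
# the value 4096 come from the commit.
from transformers import AutoConfig, AutoTokenizer

model_path = "Yi-34B-Llama-200k"  # hypothetical local checkout of the base model

# Lower max_position_embeddings in config.json so training does not OOM.
config = AutoConfig.from_pretrained(model_path)
config.max_position_embeddings = 4096
config.save_pretrained(model_path)  # rewrites config.json in place

# Cap the tokenizer's model_max_length to match.
tokenizer = AutoTokenizer.from_pretrained(model_path, model_max_length=4096)
tokenizer.save_pretrained(model_path)  # rewrites tokenizer_config.json
```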