adamo1139/Yi-34B-200K-AEZAKMI-v2


adamo1139 committed on Dec 14, 2023

Commit a5bf5bc • 1 Parent(s): 02050a7

Update README.md

Files changed (1)

README.md +2 -0


@@ -13,6 +13,8 @@ Prompt format is standard chatml. Don't expect it to be good at math, riddles or
 Cost of this fine-tune is about $10 in electricity. It took me 3 tries to get it right.
 Base model used for fine-tuning was 200k context Yi-34B-Llama model shared by larryvrh.
+I had to change max_position_embeddings in config.json and model_max_length to 4096 for training to start; otherwise I was OOMing straight away.
+My first attempt had max_position_embeddings set to 16384 and model_max_length set to 200000. This allowed fine-tuning to finish, but the model was broken after applying the LoRA and merging it.
 ## Prompt Format
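
The context-length change described in the added lines could be scripted roughly as follows. This is only a sketch under assumptions, not the author's actual procedure: it assumes a local Llama-style checkpoint directory, that the `config.json` key is spelled `max_position_embeddings`, and that `model_max_length` lives in `tokenizer_config.json`. The helper name `set_context_length` is hypothetical.

```python
import json
from pathlib import Path


def set_context_length(model_dir, max_len=4096):
    """Hypothetical helper: set both context-length fields to max_len.

    Returns the previous values so they can be restored after training.
    """
    model_dir = Path(model_dir)
    # Assumed file/key pairs for a Llama-style checkpoint.
    targets = {
        "config.json": "max_position_embeddings",
        "tokenizer_config.json": "model_max_length",
    }
    previous = {}
    for filename, key in targets.items():
        path = model_dir / filename
        data = json.loads(path.read_text(encoding="utf-8"))
        previous[key] = data.get(key)  # None if the key was absent
        data[key] = max_len
        path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return previous
```

Calling `set_context_length("path/to/model", 4096)` before training keeps the two values in step with each other. The first attempt described above used mismatched values (16384 and 200000).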