SebastianBodza committed
Commit a7a4a14
1 Parent(s): 1967b9d

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -51,6 +51,8 @@ txt = model.generate(**txt,
  eos_token_id=tokenizer.eos_token_id)
  tokenizer.decode(txt[0], skip_special_tokens=True)
  ```
+ ## Limitations:
+ Gradient accumulation led to divergence after a couple of steps. We therefore reduced the block size to 1024 and used two RTX 3090s to get a batch size of 4, which is probably too small to generalize well.
  ## Training:
  Training was based on Llama-X, with the adaptations from WizardLM's training script and additional adjustments for QLoRA tuning. The MPT code comes from <a href="https://huggingface.co/SebastianBodza/mpt-30B-qlora-multi_GPU">SebastianBodza/mpt-30B-qlora-multi_GPU</a>
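
The batch setup described in the new Limitations section can be sketched roughly as below. This is a hypothetical reconstruction, not the repository's actual script: the argument names come from the standard `transformers.TrainingArguments` API, and the output path and precision flag are assumptions.

```python
# Sketch of the batch configuration from the Limitations note (assumed, not the authors' code).
from transformers import TrainingArguments

BLOCK_SIZE = 1024  # context length reduced to 1024 tokens, per the Limitations note
NUM_GPUS = 2       # two RTX 3090s, e.g. launched with: torchrun --nproc_per_node=2 train.py

args = TrainingArguments(
    output_dir="out",                # hypothetical output path
    per_device_train_batch_size=2,   # 2 per GPU x 2 GPUs = effective batch size 4
    gradient_accumulation_steps=1,   # accumulation avoided, since it diverged after a few steps
    bf16=True,                       # bfloat16 is supported on Ampere-class GPUs like the 3090
)

# Effective batch size = per-device batch x number of GPUs x accumulation steps.
assert args.per_device_train_batch_size * NUM_GPUS * args.gradient_accumulation_steps == 4
```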
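The QLoRA tuning mentioned under Training might look roughly like the following minimal sketch using `transformers`, `bitsandbytes`, and `peft`. The base model name, LoRA rank, and target modules here are illustrative assumptions, not the configuration actually used in this repository.

```python
# Minimal QLoRA-style setup sketch (illustrative; hyperparameters are assumptions).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                # NormalFloat4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                    # placeholder base model, not the one used here
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters on top of the frozen 4-bit base.
lora_config = LoraConfig(
    r=16,                                     # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],      # attention projections in Llama-style models
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()            # only the adapter weights are trainable
```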