Locutusque committed on
Commit
e66b659
1 Parent(s): 8dd987b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -38,7 +38,7 @@ Llama-3-Hercules-5.0-8B is well-suited to the following applications:
 - This model was trained on 8 kaggle TPUs, using torch xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!)
 - A learning rate of 2e-5 with the Adam optimizer. A linear scheduler was used, with an end factor of 0.005.
 - No mixed precision was used, with the default dtype being bfloat16.
-- A total batch size of 64 was used.
+- A total batch size of 128 was used.
 - Trained on all examples of Hercules-v5.0 for 2 epochs
 - No model parameters were frozen and no quantization was used.
 - This model was trained on OpenAI's ChatML prompt format. Because this model has function calling capabilities, the prompt format is slightly different, here's what it would look like: ```<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{user message}<|im_end|>\n<|im_start|>call\n{function call message}<|im_end|>\n<|im_start|>assistant\n{assistant message}</s>```
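The function-calling prompt format described in the README can be assembled programmatically. Below is a minimal sketch: the `build_prompt` helper and the sample messages are illustrative assumptions, not part of the repository; only the special-token layout (`<|im_start|>`, `<|im_end|>`, the `call`/`function` roles, and the trailing `</s>`) comes from the README itself.

```python
def build_prompt(system, user, call, function_response, assistant):
    """Assemble the ChatML-with-function-calling prompt layout
    described in the README (hypothetical helper for illustration)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>call\n{call}<|im_end|>\n"
        f"<|im_start|>function\n{function_response}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}</s>"
    )

# Example messages (made up for demonstration only)
prompt = build_prompt(
    system="You are a helpful assistant.",
    user="What is the weather in Paris?",
    call='{"name": "get_weather", "arguments": {"city": "Paris"}}',
    function_response='{"temperature_c": 18}',
    assistant="It is currently 18 degrees Celsius in Paris.",
)
print(prompt)
```

At inference time, the same layout would be produced by the model's chat template; the sketch above only makes the token boundaries explicit.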