Pankaj Mathur committed 58be4f8 (parent: 8a481b4): Update README.md

README.md (CHANGED)

---
license: mit
language:
- en
library_name: adapter-transformers
---

# alpaca_orca_open_llama: An Open_LLaMA-3B model trained on Alpaca dataset using Orca Research paper approaches

# Dataset and Training

We make the OpenLLaMA-3B model more steerable by training it on a custom Alpaca dataset created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).
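
As a rough illustration of that idea (the actual field names and system messages used to build the dataset are not shown in this card, so the ones below are assumptions), an Alpaca-style record can be paired with an Orca-style system message:

```python
import random

# Illustrative system messages in the spirit of the Orca paper; the real set
# used to build this dataset is an assumption here.
SYSTEM_MESSAGES = [
    "You are an AI assistant. Provide a detailed, step-by-step answer.",
    "You are an AI assistant that follows instruction extremely well.",
]

def to_orca_style(alpaca_record: dict) -> dict:
    """Attach a sampled system message to a plain Alpaca record."""
    return {"system": random.choice(SYSTEM_MESSAGES), **alpaca_record}

record = to_orca_style({
    "instruction": "List three benefits of instruction tuning.",
    "input": "",
    "output": "...",
})
```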

Please pay attention to how the **System** prompt is added before each *instruction*.
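
A minimal sketch of how such a record could be flattened into a single training prompt; the `### System` / `### User` / `### Response` layout is an assumed convention for illustration, not necessarily the exact template used here:

```python
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Place the system prompt before the instruction, then leave room for the response."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:
        prompt += f"### Input:\n{input_text}\n\n"
    return prompt + "### Response:\n"

print(build_prompt(
    "You are an AI assistant that follows instruction extremely well.",
    "Summarize the Orca training approach in two sentences.",
))
```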

The training configurations are provided in the table below.

The training ran on 4x A600 (50 GB) GPUs, took around 20 hours, and cost about $66 using [Lambda Labs](https://lambdalabs.com).

We used DeepSpeed with ZeRO-3 approaches for parallel GPU training.

| Parameter | Value |
|:-------------:|:-------------:|
|*Batch Size*|16|
|*train_micro_batch_size_per_gpu*|2|
|*gradient_accumulation_steps*|2|
|*Learning rate*|2e-5|
|*Epochs*|3|
|*Max length*|1024|
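
As a rough sketch of how these values could map onto a DeepSpeed ZeRO-3 setup (the exact config used is not shown in this card, so any field beyond the table is an assumption):

```python
# Hypothetical DeepSpeed config reflecting the table above; anything not in the
# table (e.g. bf16, AdamW choice) is an assumption, not the authors' exact setup.
import json

ds_config = {
    "train_batch_size": 16,               # global batch size from the table
    "train_micro_batch_size_per_gpu": 2,  # per-GPU micro batch
    "gradient_accumulation_steps": 2,     # 2 micro batch x 2 accum x 4 GPUs = 16
    "zero_optimization": {"stage": 3},    # ZeRO stage 3 parameter/optimizer sharding
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5},           # learning rate from the table
    },
    "bf16": {"enabled": True},            # assumed mixed-precision choice
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

A config like this is typically passed to the `deepspeed` launcher or to the Hugging Face `Trainer` through its `deepspeed` argument.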