chavinlo committed
Commit: d689ebe
Parent: 30b0f97

Update README.md

Files changed (1)
  1. README.md +12 -6
README.md CHANGED
@@ -1,12 +1,18 @@
- Stanford Alpaca
+ # Stanford Alpaca

- FINETUNED USING THE ORIGINAL REPOSITORY: https://github.com/tatsu-lab/stanford_alpaca
+ This is a replica of Alpaca by Stanford's tatsu-lab

- NO LORA HAS BEEN USED
+ Trained using the original instructions, with a minor modification in FSDP mode

- full model 3 epochs og data
+ Trained on 4x A100s for 6 hours

- CONFIGURATION (default):
+ NO LORA HAS BEEN USED; this is a natively finetuned model, hence "alpaca-native"
+
+ If you are interested in more llama-based models, you can check out my profile or search for other models at https://huggingface.co/models?other=llama
+
+ The following (MIGHT) be a quantized version of this model, but be careful: https://boards.4channel.org/g/thread/92173062#p92182396
+
+ CONFIGURATION (default except fsdp):

 ```shell
 torchrun --nproc_per_node=4 --master_port=3045 train.py \
@@ -27,7 +33,7 @@ torchrun --nproc_per_node=4 --master_port=3045 train.py \
     --warmup_ratio 0.03 \
     --lr_scheduler_type "cosine" \
     --logging_steps 1 \
-    --fsdp "full_shard auto_wrap" \
+    --fsdp "shard_grad_op auto_wrap" \
     --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
     --tf32 True --report_to="wandb"
 ```
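
The only functional change in the command above is the `--fsdp` value. As a rough illustration of what that switch means, here is a minimal sketch, assuming the usual mapping of the Hugging Face `Trainer` `--fsdp` options onto PyTorch's FSDP sharding strategies; the `ShardingStrategy` names come from `torch.distributed.fsdp`, and the variable names are only for illustration, not code from this repository:

```python
# Minimal sketch (assumption: standard transformers/PyTorch behaviour, not code
# from this repository) of what the two --fsdp values correspond to.
from torch.distributed.fsdp import ShardingStrategy

# --fsdp "full_shard auto_wrap" (previous config): parameters, gradients and
# optimizer state are all sharded across the 4 GPUs (ZeRO-3 style: lowest
# memory per GPU, most communication).
previous_strategy = ShardingStrategy.FULL_SHARD

# --fsdp "shard_grad_op auto_wrap" (this commit): only gradients and optimizer
# state are sharded, while each GPU keeps a full copy of the parameters
# (ZeRO-2 style: more memory per GPU, less communication).
current_strategy = ShardingStrategy.SHARD_GRAD_OP

print(previous_strategy, current_strategy)
```

Relative to `full_shard`, `shard_grad_op` keeps a full parameter copy on every GPU and only shards gradients and optimizer state, trading GPU memory for less communication; this is the "minor modification in FSDP mode" mentioned in the README.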