Update README.md

README.md CHANGED
@@ -1,12 +1,18 @@
-Stanford Alpaca
+# Stanford Alpaca

-
+This is a replica of Alpaca by Stanford's tatsu

-
+Trained using the original instructions, with a minor modification to the FSDP mode

-
+Trained on 4x A100s for 6 hours

-
+NO LORA HAS BEEN USED; this is a natively fine-tuned model, hence "alpaca-native"
+
+If you are interested in more llama-based models, you can check out my profile or search for other models at https://huggingface.co/models?other=llama
+
+This (MIGHT) be a quantized version of this model, but be careful: https://boards.4channel.org/g/thread/92173062#p92182396
+
+CONFIGURATION (default except fsdp):

 ```shell
 torchrun --nproc_per_node=4 --master_port=3045 train.py \
@@ -27,7 +33,7 @@ torchrun --nproc_per_node=4 --master_port=3045 train.py \
     --warmup_ratio 0.03 \
     --lr_scheduler_type "cosine" \
     --logging_steps 1 \
-    --fsdp "
+    --fsdp "shard_grad_op auto_wrap" \
     --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
     --tf32 True --report_to="wandb"
 ```
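For readers who set these options from Python rather than the command line, the FSDP-related flags above map onto fields of the Hugging Face `TrainingArguments`. The following is a minimal sketch of just that portion, not the full training configuration; `output_dir` is a placeholder, and the class name `LLaMADecoderLayer` comes from the LLaMA fork used for the original Alpaca code, while upstream `transformers` names the class `LlamaDecoderLayer`.

```python
# Sketch: the FSDP-related subset of the command-line flags above, expressed
# as transformers.TrainingArguments fields (output_dir is a placeholder).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./alpaca-native-output",  # placeholder path
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    logging_steps=1,
    # "shard_grad_op" shards gradients and optimizer states across GPUs but
    # keeps a full copy of the parameters on each rank (ZeRO-2 style),
    # whereas "full_shard" also shards the parameters themselves.
    fsdp="shard_grad_op auto_wrap",
    # Decoder-layer class to auto-wrap; "LLaMADecoderLayer" matches the LLaMA
    # fork used here, upstream transformers calls it "LlamaDecoderLayer".
    fsdp_transformer_layer_cls_to_wrap="LLaMADecoderLayer",
    tf32=True,
    report_to="wandb",
)
```

The switch to "shard_grad_op auto_wrap" presumably trades some per-GPU memory for less communication than full parameter sharding, which is a plausible fit for a 7B model on 4x A100s.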
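Since no LoRA adapter is involved, the resulting weights load like any other causal-LM checkpoint; there is nothing to merge. A minimal inference sketch, assuming the model is published on the Hub under a placeholder repo id (substitute the actual one) and a `transformers` version with LLaMA support:

```python
# Sketch: loading the natively fine-tuned checkpoint directly, no adapter merge.
# "your-username/alpaca-native" is a placeholder repo id, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/alpaca-native"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

# Prompt template from the original Stanford Alpaca instructions (no-input variant).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```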