NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3

Text Generation · text-generation-inference · Inference Endpoints


NickyNicky committed on Oct 17, 2023

Commit 93ba188 (1 parent: 005361e)

Update README.md

Files changed (1)

README.md CHANGED (+10 -0)

@@ -54,6 +54,15 @@ reference-data-model:
         https://github.com/tomaarsen/attention_sinks
         https://arxiv.org/abs/2309.17453
+  TRL:
+    - Link:
+        https://huggingface.co/docs/trl/index
+        https://huggingface.co/docs/trl/sft_trainer
+  flash-attention:
+        https://github.com/Dao-AILab/flash-attention
+        https://arxiv.org/abs/2205.14135
   Version:
     - Link:
         https://huggingface.co/NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v1
@@ -113,6 +122,7 @@ model = AutoModelForCausalLM.from_pretrained(model_id,
                                              torch_dtype=torch.bfloat16,
                                              load_in_4bit=True,
                                              low_cpu_mem_usage= True,
+                                             #use_flash_attention_2=True, #GPU A100 or GPU supported
                                              attention_sink_size=4,
                                              attention_sink_window_size=1024, #512, # <- Low for the sake of faster generation
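The second hunk touches three loading arguments: `attention_sink_size`, `attention_sink_window_size`, and the commented-out `use_flash_attention_2`. The plain-Python sketch below is meant only to illustrate what they control; it is not the attention_sinks library's code (which slices key/value tensors) nor part of this README. It assumes the StreamingLLM-style policy from the linked paper: the first `sink_size` tokens are always kept in the KV cache, plus the most recent `window_size` tokens, so the cache holds at most `sink_size + window_size` entries. The helper names `attention_sink_keep_positions`, `flash_attention_2_supported`, and `build_load_kwargs` are hypothetical. The GPU check is a rough heuristic based on the FlashAttention-2 README listing Ampere, Ada, and Hopper GPUs (compute capability 8.x and 9.x, e.g. A100) as supported; `torch_dtype=torch.bfloat16` is left out so the sketch does not need `torch`.

```python
from typing import Dict, List, Optional, Tuple


def attention_sink_keep_positions(
    seq_len: int, sink_size: int = 4, window_size: int = 1024
) -> List[int]:
    """Token positions whose KV entries stay cached (simplified StreamingLLM policy).

    Keeps the first `sink_size` tokens (the "attention sinks") plus the
    last `window_size` tokens; everything in between is evicted.
    """
    if seq_len < 0 or sink_size < 0 or window_size < 0:
        raise ValueError("lengths and sizes must be non-negative")
    cache_size = sink_size + window_size
    if seq_len <= cache_size:
        # The whole sequence still fits, so nothing is evicted yet.
        return list(range(seq_len))
    sinks = list(range(sink_size))
    recent = list(range(seq_len - window_size, seq_len))
    return sinks + recent


def flash_attention_2_supported(compute_capability: Tuple[int, int]) -> bool:
    """Heuristic: FlashAttention-2 lists Ampere/Ada/Hopper (sm_8x, sm_9x) as supported."""
    major, _minor = compute_capability
    return 8 <= major <= 9


def build_load_kwargs(compute_capability: Optional[Tuple[int, int]] = None) -> Dict[str, object]:
    """Hypothetical helper mirroring the README's from_pretrained arguments."""
    kwargs: Dict[str, object] = {
        "load_in_4bit": True,
        "low_cpu_mem_usage": True,
        "attention_sink_size": 4,
        "attention_sink_window_size": 1024,
    }
    # Only switch on the commented-out flag when the GPU looks supported.
    if compute_capability is not None and flash_attention_2_supported(compute_capability):
        kwargs["use_flash_attention_2"] = True
    return kwargs


if __name__ == "__main__":
    kept = attention_sink_keep_positions(seq_len=2000, sink_size=4, window_size=1024)
    print(len(kept), kept[:5], kept[-1])  # prints: 1028 [0, 1, 2, 3, 976] 1999
    print(build_load_kwargs((8, 0)))  # A100 reports compute capability (8, 0)
```

With the README's values, a 2000-token sequence keeps tokens 0 to 3 plus the last 1024 tokens. On a real machine the capability tuple would come from `torch.cuda.get_device_capability()`.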