andrijdavid committed
Commit feeda79
Parent: 7450d23

Upload folder using huggingface_hub

Files changed (2):
  1. .gitattributes +18 -0
  2. README.md +6 -6
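
The commit message says the folder was uploaded with `huggingface_hub`. A minimal sketch of how such an upload is typically done with that library's `upload_folder` API follows; the local folder path and token handling are assumptions, not details recorded in this commit.

```python
# Hypothetical reconstruction of the upload step; the local folder path is an assumption.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`
api.upload_folder(
    repo_id="andrijdavid/tinyfrank-1.4B-GGUF",  # target model repo
    folder_path="./tinyfrank-1.4B-GGUF",        # local folder holding the GGUF files (assumed)
    commit_message="Upload folder using huggingface_hub",
)
```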
.gitattributes CHANGED
@@ -37,3 +37,21 @@ tinyfrank-f16.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q2L.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q4L.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q6L.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-f16.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -58,7 +58,7 @@ The following clients/libraries will automatically download models for you, prov
 
 ### In `text-generation-webui`
 
-Under Download Model, you can enter the model repo: andrijdavid/tinyfrank-1.4B-GGUF and below it, a specific filename to download, such as: tinyfrank-1.4B.gguf.
+Under Download Model, you can enter the model repo: andrijdavid/tinyfrank-1.4B-GGUF and below it, a specific filename to download, such as: tinyfrank-1.4B-f16.gguf.
 
 Then click Download.
 
@@ -73,7 +73,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B-f16.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 <details>
@@ -96,7 +96,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B-f16.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -108,7 +108,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m tinyfrank-1.4B.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
+./main -ngl 35 -m tinyfrank-1.4B-f16.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,7 +159,7 @@ pip install llama-cpp-python
 from llama_cpp import Llama
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-  model_path="./tinyfrank-1.4B.gguf", # Download the model file first
+  model_path="./tinyfrank-1.4B-f16.gguf", # Download the model file first
   n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -172,7 +172,7 @@ output = llm(
   echo=True # Whether to echo the prompt
 )
 # Chat Completion API
-llm = Llama(model_path="./tinyfrank-1.4B.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
+llm = Llama(model_path="./tinyfrank-1.4B-f16.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are a story writing assistant."},