jartine committed commit 7b8c76f (parent: 9142874)

Update README.md

Files changed (1): README.md (+20 −7)

README.md CHANGED
@@ -24,7 +24,10 @@ quantized_by: jartine
 
 This is a large language model that was released by Meta on 2024-07-23.
 As of its release date, this is the largest and most complex open
-weights model available.
+weights model available. This is the base model. It hasn't been
+fine-tuned to follow your instructions. See also
+[Meta-Llama-3.1-405B-Instruct-llamafile](https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-Instruct-llamafile)
+for a friendlier and more useful version of this model.
 
 - Model creator: [Meta](https://huggingface.co/meta-llama/)
 - Original model: [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)
@@ -37,12 +40,21 @@ FreeBSD, OpenBSD and NetBSD systems you control on both AMD64 and ARM64.
 ## Quickstart
 
 Running the following on a desktop OS will launch a tab in your web
-browser.
+browser. The smallest weights available are Q2\_K, which should work
+fine on systems with at least 150 GB of RAM. This llamafile needs to be
+downloaded in multiple files, due to HuggingFace's 50 GB upload limit,
+and then concatenated back together locally. Therefore you'll need at
+least 400 GB of free disk space.
 
 ```
-wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q3_K_M.llamafile
-chmod +x Meta-Llama-3.1-405B.Q3_K_M.llamafile
-./Meta-Llama-3.1-405B.Q3_K_M.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat0.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat1.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat2.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat3.llamafile
+cat Meta-Llama-3.1-405B.Q2_K.cat{0,1,2,3}.llamafile >Meta-Llama-3.1-405B.Q2_K.llamafile
+rm Meta-Llama-3.1-405B.Q2_K.cat*.llamafile
+chmod +x Meta-Llama-3.1-405B.Q2_K.llamafile
+./Meta-Llama-3.1-405B.Q2_K.llamafile
 ```
 
 You can then use the completion mode of the GUI to experiment with this
@@ -53,9 +65,10 @@ model. You can prompt the model for completions on the command line too:
 ```
 
 This model has a max context window size of 128k tokens. By default, a
-context window size of 2048 tokens is used. You can use a larger context
+context window size of 4096 tokens is used. You can use a larger context
 window by passing the `-c 8192` flag. The software currently has
-limitations that may prevent scaling to the full 128k size.
+limitations in its llama v3.1 support that may prevent scaling to the
+full 128k size.
 
 On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
 the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card
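The multi-part download added in this commit boils down to one pattern: fetch the numbered `catN` parts, join them in order, and delete the parts. A minimal sketch of that round trip, using small dummy files in place of the ~50 GB parts (the `model.*` file names here are illustrative stand-ins, not the real llamafile names):

```shell
# Stand-ins for the four downloaded parts (in the Quickstart these
# come from the wget commands).
printf 'AAAA' > model.cat0.llamafile
printf 'BBBB' > model.cat1.llamafile
printf 'CCCC' > model.cat2.llamafile
printf 'DDDD' > model.cat3.llamafile

# Join the parts into a single file. The ? glob expands in lexical
# order, so cat0..cat3 are concatenated in the right sequence.
cat model.cat?.llamafile > model.llamafile

# The parts are now redundant; removing them frees their disk space.
rm model.cat*.llamafile

# The joined file is simply the parts' bytes in sequence.
cat model.llamafile    # AAAABBBBCCCCDDDD
```

The order matters: `cat` writes the inputs back to back with no framing, so concatenating the parts in sequence reproduces the original file byte for byte.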
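Because the parts and the joined file coexist on disk during concatenation, it's worth checking free space before starting. A hedged sketch using POSIX `df` (the 400 GB threshold is the README's figure; the variable names and messages are illustrative):

```shell
# Free space on the current filesystem, in GB.
# df -P -k prints portable output in 1K blocks; column 4 of the
# second line is the available space.
free_kb=$(df -P -k . | awk 'NR==2 {print $4}')
free_gb=$((free_kb / 1024 / 1024))

# The README calls for at least 400 GB free before downloading.
if [ "$free_gb" -lt 400 ]; then
  echo "only ${free_gb} GB free; need at least 400 GB" >&2
else
  echo "${free_gb} GB free; enough room for the download"
fi
```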