Update README.md

This is a large language model that was released by Meta on 2024-07-23.
As of its release date, this is the largest and most complex open
weights model available. This is the base model. It hasn't been
fine-tuned to follow your instructions. See also
[Meta-Llama-3.1-405B-Instruct-llamafile](https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-Instruct-llamafile)
for a friendlier and more useful version of this model.

- Model creator: [Meta](https://huggingface.co/meta-llama/)
- Original model: [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)

## Quickstart

Running the following on a desktop OS will launch a tab in your web
browser. The smallest weights available are Q2\_K, which should work
fine on systems with at least 150 GB of RAM. This llamafile needs to be
downloaded as multiple files, due to HuggingFace's 50 GB upload limit,
and then concatenated back together locally. Since the four parts and
the joined file coexist on disk until you remove the parts, you'll need
at least 400 GB of free disk space.

```
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat0.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat1.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat2.llamafile
wget https://huggingface.co/Mozilla/Meta-Llama-3.1-405B-llamafile/resolve/main/Meta-Llama-3.1-405B.Q2_K.cat3.llamafile
cat Meta-Llama-3.1-405B.Q2_K.cat{0,1,2,3}.llamafile >Meta-Llama-3.1-405B.Q2_K.llamafile
rm Meta-Llama-3.1-405B.Q2_K.cat*.llamafile
chmod +x Meta-Llama-3.1-405B.Q2_K.llamafile
./Meta-Llama-3.1-405B.Q2_K.llamafile
```
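
If you want to verify the reassembly before the `rm` step, comparing
byte counts is a quick check. This is a minimal sketch, not part of the
original instructions; it assumes GNU coreutils (on macOS or BSD, use
`stat -f %z` instead):

```
# Sketch: the joined file's size should equal the sum of the four parts.
stat -c %s Meta-Llama-3.1-405B.Q2_K.cat?.llamafile
stat -c %s Meta-Llama-3.1-405B.Q2_K.llamafile
```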

You can then use the completion mode of the GUI to experiment with this
model. You can prompt the model for completions on the command line too:
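
(The original example command isn't shown here, so the following is a
sketch: `-p` is llamafile's standard prompt flag, and the prompt text is
illustrative.)

```
# Sketch: prompt text and flags are illustrative, not from the README.
./Meta-Llama-3.1-405B.Q2_K.llamafile -p 'four score and seven years ago'
```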

This model has a max context window size of 128k tokens. By default, a
context window size of 4096 tokens is used. You can use a larger context
window by passing the `-c 8192` flag. The software currently has
limitations in its llama v3.1 support that may prevent scaling to the
full 128k size.
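
For example, the following minimal sketch combines the `-c` flag above
with the Q2\_K llamafile from the Quickstart:

```
# -c sets the context window size in tokens.
./Meta-Llama-3.1-405B.Q2_K.llamafile -c 8192
```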

On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card