Commit e3cb34d (parent: fcbe832): Update README.md

# Chat with Meta's LLaMA models at home made easy

This repository is a chat example with [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) ([arXiv](https://arxiv.org/abs/2302.13971v1)) models running on a typical home PC. You will just need an NVIDIA video card and some RAM to chat with the model.

This repo is heavily based on Meta's original repo: https://github.com/facebookresearch/llama

And on Venuatu's repo: https://github.com/venuatu/llama

### Chat examples

https://github.com/facebookresearch/llama/issues/162

### System requirements

- Modern enough CPU
- NVIDIA graphics card
- 64 GB of RAM, or better 128 GB (192 or 256 would be perfect)

One may run with 32 GB of RAM, but inference will be slow (limited by the speed of reading your swap file).

I am running this on a 12700K / 128 GB RAM / NVIDIA 3070 Ti 8 GB / fast large NVMe and getting one token from the 30B model every few seconds.

For example, the 30B model uses around 70 GB of RAM.
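As a rough cross-check of that figure, a back-of-the-envelope estimate (an approximation, not an exact formula) is parameter count times bytes per parameter, plus runtime overhead:

```python
# Rough RAM estimate: parameter count x bytes per parameter (fp16 = 2 bytes).
# This counts only the raw weights; activations and buffers add more on top,
# which is why the 30B model ends up around 70 GB in practice.
def weights_gib(n_params_billion: float, bytes_per_param: int = 2) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weights_gib(30), 1))  # roughly 55.9 GiB of raw fp16 weights
```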

If you do not have a powerful video card, you may use another repo for CPU-only inference: https://github.com/randaller/llama-cpu

### Conda Environment Setup Example for Windows 10+

Download and install Anaconda Python from https://www.anaconda.com and run the Anaconda Prompt.

```
conda create -n llama python=3.10
conda activate llama
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
```

### Setup

In a conda env with PyTorch / CUDA available, run

```
pip install -r requirements.txt
```

Then, in this repository:

```
pip install -e .
```

### Download tokenizer and models

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

or

magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce

### Prepare model

First, you need to unshard the model checkpoints into a single file. Let's do this for the 30B model.

```
python merge-weights.py --input_dir D:\Downloads\LLaMA --model_size 30B
```

In this example, D:\Downloads\LLaMA is the root folder of the downloaded torrent with the weights.
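For reference, the torrent ships each model as several sharded checkpoint files (consolidated.00.pth, consolidated.01.pth, ...), which merge-weights.py joins into the single merged.pth. A small sketch of the expected layout, using the standard shard counts of the released LLaMA checkpoints:

```python
# Standard shard counts for the released LLaMA checkpoints.
SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}

def shard_files(model_size: str) -> list[str]:
    """Checkpoint shard names expected under (torrentroot)/<model_size>/."""
    return [f"consolidated.{i:02d}.pth" for i in range(SHARDS[model_size])]

print(shard_files("30B"))
# ['consolidated.00.pth', 'consolidated.01.pth',
#  'consolidated.02.pth', 'consolidated.03.pth']
```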

This will create a merged.pth file in the root folder of this repo.

Place this file, together with the model's corresponding (torrentroot)/30B/params.json, into the [/model] folder.

So you should end up with two files in the [/model] folder: merged.pth and params.json.

Place the (torrentroot)/tokenizer.model file into the [/tokenizer] folder of this repo. Now you are ready to go.

### Run the chat

```
python example-chat.py ./model ./tokenizer/tokenizer.model
```
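Conceptually, the chat script keeps the whole conversation in a growing context string and feeds it back to the model on every turn. A minimal sketch of one such turn (chat_turn and the stand-in sampler are illustrative, not the actual code in example-chat.py):

```python
def chat_turn(generate, context: str, user_msg: str):
    """One round-trip: append the user message to the running context,
    sample a reply, and return (updated_context, reply)."""
    prompt = context + f"User: {user_msg}\nBot:"
    reply = generate(prompt)  # model continuation after "Bot:"
    return prompt + reply + "\n", reply

# Usage with a stand-in sampler in place of the real model:
ctx, reply = chat_turn(lambda p: " Hello!", "", "Hi")
print(repr(ctx))  # 'User: Hi\nBot: Hello!\n'
```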

### Enable multi-line answers

If you wish to stop generation not at the "\n" sign but at another signature, such as "User:" (which is also a good idea), make the following modification in llama/generation.py:

![image](https://user-images.githubusercontent.com/7977882/224510962-78b8dcf8-07a1-44b1-90b0-1cb9a61c14d1.png)

-5 means removing the last 5 characters from the resulting context, which is the length of your stop signature ("User:" in this example).
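The check described above can be sketched as a small helper (the names here are illustrative; the actual patch lives inside llama/generation.py's decoding loop):

```python
def apply_stop_signature(decoded: str, stop: str = "User:"):
    """Trim the stop signature off the end of the decoded text.
    Returns (text, finished). Slicing by -len(stop) is the general
    form of the -5 in the text above (len("User:") == 5)."""
    if decoded.endswith(stop):
        return decoded[:-len(stop)], True
    return decoded, False

text, done = apply_stop_signature("Sure, here you go.\nUser:")
print(done)  # True; text no longer ends with "User:"
```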