# Chat with Meta's LLaMA models at home made easy

This repository is a chat example with [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) ([arXiv](https://arxiv.org/abs/2302.13971v1)) models running on a typical home PC. You just need an NVIDIA video card and some RAM to chat with the model.

This repo is heavily based on Meta's original repo: https://github.com/facebookresearch/llama

and on Venuatu's repo: https://github.com/venuatu/llama
### Examples of chats

https://github.com/facebookresearch/llama/issues/162
### System requirements

- A reasonably modern CPU
- An NVIDIA graphics card
- 64 GB of RAM, better 128 GB (192 or 256 GB would be ideal)

You can run with 32 GB of RAM, but inference will be slow (limited by the read speed of your swap file).
I am running this on an i7-12700K with 128 GB of RAM, an NVIDIA 3070 Ti 8 GB, and a fast, large NVMe drive, and I get one token from the 30B model every few seconds.

For reference, the 30B model uses around 70 GB of RAM.
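As a rough sanity check on that figure: with float16 storage (2 bytes per parameter) and LLaMA-30B's roughly 32.5 billion parameters, the weights alone account for most of the footprint. The numbers below are back-of-the-envelope estimates, not exact measurements:

```python
# Back-of-the-envelope RAM estimate for the 30B model's weights.
# Assumes float16 storage (2 bytes per parameter); activations and
# runtime buffers add more on top of this.
params = 32.5e9          # approximate parameter count of LLaMA-30B
bytes_per_param = 2      # float16
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.0f} GB for weights alone")  # ~61 GB
```

The remaining ~10 GB of the observed 70 GB is runtime overhead on top of the raw weights.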
If you do not have a powerful video card, you can use another repo for CPU-only inference: https://github.com/randaller/llama-cpu
### Conda Environment Setup Example for Windows 10+

Download and install Anaconda Python from https://www.anaconda.com and run Anaconda Prompt:
```
conda create -n llama python=3.10
conda activate llama
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
```
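Before going further, it may be worth confirming that the environment actually sees the GPU. A small sketch using the standard `torch.cuda` API, guarded so it also runs where torch is absent:

```python
# Check whether GPU inference is possible in this environment.
# torch.cuda.is_available() is the standard PyTorch API for this;
# the find_spec guard just avoids an ImportError if torch is missing.
import importlib.util

def gpu_ready():
    """True only if torch is installed and a CUDA device is visible."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

print("GPU inference ready:", gpu_ready())
```

If this prints `False` inside the conda environment above, check your NVIDIA driver before continuing.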
### Setup

In a conda env with PyTorch / CUDA available, run
```
pip install -r requirements.txt
```
Then, in this repository:
```
pip install -e .
```
### Download tokenizer and models

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA

or

magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
### Prepare model

First, you need to unshard the model checkpoints into a single file. Let's do this for the 30B model:

```
python merge-weights.py --input_dir D:\Downloads\LLaMA --model_size 30B
```

In this example, D:\Downloads\LLaMA is the root folder of the downloaded torrent with the weights.

This will create a merged.pth file in the root folder of this repo.
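Conceptually, the merge step reassembles weight matrices that were split across the shard files. A minimal illustrative sketch of the idea — `merge_shards` and the plain-list "tensors" below are hypothetical stand-ins, not the script's actual code, and real LLaMA checkpoints shard different weights along different dimensions:

```python
# Illustrative sketch of unsharding: each shard holds a slice of every
# weight matrix; merging concatenates the slices back together.
def merge_shards(shards):
    """Merge a list of per-shard state dicts into one dict."""
    merged = {}
    for name in shards[0]:
        # concatenate each shard's rows for this weight, in shard order
        merged[name] = [row for shard in shards for row in shard[name]]
    return merged

shard0 = {"wq": [[1, 2], [3, 4]]}   # first half of the rows
shard1 = {"wq": [[5, 6], [7, 8]]}   # second half of the rows
print(merge_shards([shard0, shard1])["wq"])  # → [[1, 2], [3, 4], [5, 6], [7, 8]]
```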
Place this file, along with the model's corresponding (torrentroot)/30B/params.json, into the [/model] folder, so you end up with two files there: merged.pth and params.json.

Place the (torrentroot)/tokenizer.model file into the [/tokenizer] folder of this repo. Now you are ready to go.
### Run the chat

```
python example-chat.py ./model ./tokenizer/tokenizer.model
```
### Enable multi-line answers

If you wish to stop generation not at the "\n" character but at another signature, such as "User:" (which is also a good idea), or any other string, make the following modification in llama/generation.py:

![image](https://user-images.githubusercontent.com/22396871/224122767-227deda4-a718-4774-a7f9-786c07d379cf.png)

The -5 means removing the last 5 characters from the resulting context, which is the length of your stop signature ("User:" in this example).
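The logic in the screenshot boils down to checking whether the decoded text ends with the stop signature and, if so, trimming it off. A stand-alone sketch of that idea — the helper below is illustrative, not the actual generation.py code:

```python
# Illustrative stop-signature check: stop when the generated text ends
# with STOP, then drop the last len(STOP) chars (-5 for "User:").
STOP = "User:"

def check_stop(text):
    """Return (should_stop, text with the signature trimmed off)."""
    if text.endswith(STOP):
        return True, text[:-len(STOP)]
    return False, text

print(check_stop("Hello!\nUser:"))  # → (True, 'Hello!\n')
```

Using `len(STOP)` instead of a hard-coded -5 keeps the trim correct if you change the signature.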