File size: 2,404 Bytes
7012a36
 
 
 
 
 
 
 
 
98cb9ee
7012a36
 
 
a5d9b5a
fea7cd8
98cb9ee
fea7cd8
7012a36
 
 
 
 
 
 
4242c49
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9b3a5e5
4242c49
 
 
2d38b88
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4242c49
084dae6
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
---
language:
- en
pipeline_tag: text-generation
tags:
- meta
- pytorch
- llama
- llama-3
- llama-cpp
- quantized
- 8-bit
- GGUF
- 8 Billion
- python
- instruct
- google-colab
inference: false
model_creator: sourabhdattawad
model_name: meta-llama-3-8B-instruct-gguf
quantized_by: sourabhdattawad
license_name: llama3
---

## Usage

Package installation

```
pip install llama-cpp-python "huggingface_hub[cli]"
```

Download the model:

```
huggingface-cli download sourabhdattawad/meta-llama-3-8b-instruct-gguf meta-llama-3-8b-instruct.Q8_0.gguf --local-dir . --local-dir-use-symlinks False
```

```Python
from llama_cpp import Llama
llm = Llama(
      model_path="meta-llama-3-8b-instruct.Q8_0.gguf",
      # n_gpu_layers=-1, # Uncomment to use GPU acceleration
      # seed=1337, # Uncomment to set a specific seed
      # n_ctx=2048, # Uncomment to increase the context window
)
output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=50, # Generate up to 50 tokens, set to None to generate up to the end of the context window
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
)
output
```
```
Llama.generate: prefix-match hit

llama_print_timings:        load time =    7770.49 ms
llama_print_timings:      sample time =     100.16 ms /    40 runs   (    2.50 ms per token,   399.35 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   35214.73 ms /    40 runs   (  880.37 ms per token,     1.14 tokens per second)
llama_print_timings:       total time =   35895.91 ms /    41 tokens
{'id': 'cmpl-01e2feb3-c0ff-4a6e-8ca4-b8bf2172da01',
 'object': 'text_completion',
 'created': 1713912080,
 'model': 'meta-llama-3-8b-instruct.Q8_0.gguf',
 'choices': [{'text': 'Q: Name the planets in the solar system? A: 1. Mercury, 2. Venus, 3. Earth, 4. Mars, 5. Jupiter, 6. Saturn, 7. Uranus, 8. Neptune.',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 13, 'completion_tokens': 40, 'total_tokens': 53}}
```

## Google Colab

[https://colab.research.google.com/drive/1vhrCKGzY7KP5mScHNUl7hjmbPsUyj_sj?usp=sharing)](https://colab.research.google.com/drive/1vhrCKGzY7KP5mScHNUl7hjmbPsUyj_sj?usp=sharing)