File size: 2,632 Bytes
c5f1973
 
fa73cc6
 
 
 
 
 
 
 
 
 
 
 
c5f1973
fa73cc6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7958ff8
fa73cc6
 
 
 
 
 
 
 
 
1fa02b8
fa73cc6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
---
license: llama2
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/codellama/CodeLlama-7b-hf).

This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.  

It has been compiled to run on an inf2.24xlarge instance on AWS.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

## Usage on Amazon SageMaker

_coming soon_

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'aws-neuron/CodeLlama-7b-hf-neuron-24xlarge')
>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    max_length=200,
)
[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n    """\n    Ping a host with exponential backoff.\n\n    :param host: Host to ping\n    :return: True if host is reachable, False otherwise\n    """\n    for i in range(1, 10):\n        try:\n            socket.create_connection((host, 80), 1).close()\n            return True\n        except OSError:\n            time.sleep(2 ** i)\n    return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n    """\n    Ping a host with exponential backoff and timeout.\n\n    :param host: Host to ping\n    :param timeout: Timeout in seconds\n    :return: True if host is reachable, False otherwise\n    """\n    for'}]
```
This repository contains tags specific to versions of `neuronx`. When using with 🤗 `optimum-neuron`, use the repo revision specific to the version of `neuronx` you are using, to load the right serialized checkpoints.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048,
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 12,
}
```