Spaces:
Sleeping
Sleeping
File size: 5,998 Bytes
3860419 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 |
Using with open/local models
============================
**Use `gpte` first with OpenAI models to get a feel for the `gpte` tool.**
**Then go play with experimental Open LLMs π support and try not to get π₯!!**
At the moment the best option for coding is still the use of `gpt-4` models provided by OpenAI. But open models are catching up and are a good free and privacy-oriented alternative if you possess the proper hardware.
You can integrate `gpt-engineer` with open-source models by leveraging an OpenAI-compatible API.
We provide the minimal and cleanest solution below. What is described is not the only way to use open/local models, but the one we tested and would recommend to most users.
More details on why the solution below is recommended in [this blog post](https://zigabrencic.com/blog/2024-02-21).
Setup
-----
For inference engine we recommend for the users to use [llama.cpp](https://github.com/ggerganov/llama.cpp) with its `python` bindings `llama-cpp-python`.
We choose `llama.cpp` because:
- 1.) It supports the largest amount of hardware acceleration backends.
- 2.) It supports the diverse set of open LLMs.
- 3.) Is written in `python` and directly on top of `llama.cpp` inference engine.
- 4.) Supports the `openAI` API and `langchain` interface.
To install `llama-cpp-python` follow the official [installation docs](https://llama-cpp-python.readthedocs.io/en/latest/) and [those docs](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/) for MacOS with Metal support.
If you want to benefit from proper hardware acceleration on your machine make sure to set up the proper compiler flags before installing your package.
- `linux`: `CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"`
- `macos` with Metal support: `CMAKE_ARGS="-DLLAMA_METAL=on"`
- `windows`: `$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"`
This will enable the `pip` installer to compile the `llama.cpp` with the proper hardware acceleration backend.
Then run:
```bash
pip install llama-cpp-python
```
For our use case we also need to set up the web server that `llama-cpp-python` library provides. To install:
```bash
pip install 'llama-cpp-python[server]'
```
For detailed use consult the [`llama-cpp-python` docs](https://llama-cpp-python.readthedocs.io/en/latest/server/).
Before we proceed we need to obtain the model weights in the `gguf` format. That should be a single file on your disk.
In case you have weights in other formats check the `llama-cpp-python` docs for conversion to `gguf` format.
Models in other formats `ggml`, `.safetensors`, etc. won't work without prior conversion to `gguf` file format with the solution described below!
Which open model to use?
==================
Your best choice would be:
- CodeLlama 70B
- Mixtral 8x7B
We are still testing this part, but the larger the model you can run the better. Sure the responses might be slower in terms of (token/s), but code quality will be higher.
For testing that the open LLM `gpte` setup works we recommend starting with a smaller model. You can download weights of [CodeLlama-13B-GGUF by the `TheBloke`](https://huggingface.co/TheBloke/CodeLlama-13B-GGUF) choose the largest model version you can run (for example `Q6_K`), since quantisation will degrade LLM performance.
Feel free to try out larger models on your hardware and see what happens.
Running the Example
==================
To see that your setup works check [test open LLM setup](examples/test_open_llm/README.md).
If above tests work proceed π
For checking that `gpte` works with the `CodeLLama` we recommend for you to create a project with `prompt` file content:
```
Write a python script that sums up two numbers. Provide only the `sum_two_numbers` function and nothing else.
Provide two tests:
assert(sum_two_numbers(100, 10) == 110)
assert(sum_two_numbers(10.1, 10) == 20.1)
```
Now run the LLM in separate terminal:
```bash
python -m llama_cpp.server --model $model_path --n_batch 256 --n_gpu_layers 30
```
Then in another terminal window set the following environment variables:
```bash
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-xxx"
export MODEL_NAME="CodeLLama"
export LOCAL_MODEL=true
```
And run `gpt-engineer` with the following command:
```bash
gpte <project_dir> $MODEL_NAME --lite --temperature 0.1
```
The `--lite` mode is needed for now since open models for some reason behave worse with too many instructions at the moment. Temperature is set to `0.1` to get consistent best possible results.
That's it.
*If sth. doesn't work as expected, or you figure out how to improve the open LLM support please let us know.*
Using Open Router models
==================
In case you don't posses the hardware to run local LLM's yourself you can use the hosting on [Open Router](https://openrouter.ai) and pay as you go for the tokens.
To set it up you need to Sign In and load purchase π° the LLM credits. Pricing per token is different for (each model](https://openrouter.ai/models), but mostly cheaper then Open AI.
Then create the API key.
To for example use [Meta: Llama 3 8B Instruct (extended)](https://openrouter.ai/models/meta-llama/llama-3-8b-instruct:extended) with `gpte` we need to set:
```bash
export OPENAI_API_BASE="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="sk-key-from-open-router"
export MODEL_NAME="meta-llama/llama-3-8b-instruct:extended"
export LOCAL_MODEL=true
```
```bash
gpte <project_dir> $MODEL_NAME --lite --temperature 0.1
```
Using Azure models
==================
You set your Azure OpenAI key:
- `export OPENAI_API_KEY=[your api key]`
Then you call `gpt-engineer` with your service endpoint `--azure https://aoi-resource-name.openai.azure.com` and set your deployment name (which you created in the Azure AI Studio) as the model name (last `gpt-engineer` argument).
Example:
`gpt-engineer --azure https://myairesource.openai.azure.com ./projects/example/ my-gpt4-project-name`
|