---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
---

## The world's first Gemma fine-tune based on openchat-3.5-0106 data and method (C-RLFT). Almost the same performance as the Mistral-based OpenChat, and much better than Gemma-7B and Gemma-7B-it.

Please refer to [openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) for details.

> P.S.: 6T pre-training tokens + C-RLFT is the secret sauce?
>
> P.P.S.: @Google team, we know your model is great, but please use an OSI-approved license like Mistral (or even Phi and Orca).

## Benchmarks

| Model                       | # Params | Average  | MT-Bench | HumanEval | BBH MC   | AGIEval  | TruthfulQA | MMLU     | GSM8K    | BBH CoT  |
|-----------------------------|----------|----------|----------|-----------|----------|----------|------------|----------|----------|----------|
| **OpenChat-3.5-0106 Gemma** | **7B**   | 64.4     | 7.83     | 67.7      | **52.7** | **50.2** | 55.4       | 65.7     | **81.5** | 63.7     |
| OpenChat-3.5-0106 Mistral   | **7B**   | **64.5** | 7.8      | **71.3**  | 51.5     | 49.1     | **61.0**   | 65.8     | 77.4     | 62.2     |
| ChatGPT (March)             | ???B     | 61.5     | **7.94** | 48.1      | 47.6     | 47.1     | 57.7       | **67.3** | 74.9     | **70.1** |
|                             |          |          |          |           |          |          |            |          |          |          |
| Gemma-7B                    | 7B       | -        | -        | 32.3      | -        | 41.7     | -          | 64.3     | 46.4     | -        |
| Gemma-7B-it *               | 7B       | 25.4     | -        | 28.0      | 38.4     | 32.5     | 34.1       | 26.5     | 10.8     | 7.6      |
| OpenHermes 2.5              | 7B       | 59.3     | 7.54     | 48.2      | 49.4     | 46.5     | 57.5       | 63.8     | 73.5     | 59.9     |

*: `Gemma-7b-it` failed to understand and follow most few-shot templates.

## Usage

To use this model, we highly recommend installing the OpenChat package by following the [installation guide](https://github.com/imoneoi/openchat#installation) in our repository and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using [vLLM](https://github.com/vllm-project/vllm) and can run on a consumer GPU with 24GB RAM. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command.

Once started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat); see the example request below. Additionally, you can use the [OpenChat Web UI](https://github.com/imoneoi/openchat#web-ui) for a user-friendly experience.

If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.

| Model                   | Size | Context | Weights                                                                 | Serving                                                                                                                 |
|-------------------------|------|---------|-------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| OpenChat-3.5-0106-Gemma | 7B   | 8192    | [Huggingface](https://huggingface.co/openchat/openchat-3.5-0106-gemma)  | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106-gemma --engine-use-ray --worker-use-ray`  |

<details>
<summary>Example request (click to expand)</summary>

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5_gemma_new",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
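
Because the server is OpenAI-compatible, you can also query it from the official `openai` Python client (v1+). This is a minimal sketch, not part of the OpenChat package: the prompt and the `sk-dummy` placeholder are illustrative, and the key is only checked when the server is started with `--api-keys`.

```python
# Minimal sketch: query the local OpenChat server via the `openai` package (v1+).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18888/v1",  # local OpenChat API server
    api_key="sk-dummy",  # placeholder; only enforced when serving with --api-keys
)

response = client.chat.completions.create(
    model="openchat_3.5_gemma_new",
    messages=[{"role": "user", "content": "Write a poem to describe yourself"}],
)
print(response.choices[0].message.content)
```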

## Conversation template

⚠️ **Notice:** This template differs from the Mistral version. The end-of-turn token is now `<end_of_turn>` (the Mistral version uses `<|end_of_turn|>`). Remember to set `<end_of_turn>` as the end-of-generation token.

```
GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
```

With a system message (**NOT** recommended; it may degrade performance):

```
You are a helpful assistant.<end_of_turn>GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
```
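
If you run the model outside the OpenChat server, the template can be rendered by hand. The sketch below is illustrative only: `build_prompt` is a hypothetical helper (not part of the OpenChat package), and it assumes the repository loads with the standard `transformers` auto classes and that `accelerate` is installed for `device_map="auto"`.

```python
# Minimal sketch: render the GPT4 Correct template by hand and generate with
# `transformers`. `build_prompt` is a hypothetical helper, not an OpenChat API.
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_prompt(messages, system=None):
    """messages: list of {'role': 'user'|'assistant', 'content': str}."""
    prompt = f"{system}<end_of_turn>" if system else ""
    for m in messages:
        role = "GPT4 Correct User" if m["role"] == "user" else "GPT4 Correct Assistant"
        prompt += f"{role}: {m['content']}<end_of_turn>"
    return prompt + "GPT4 Correct Assistant:"  # cue the assistant's turn

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat-3.5-0106-gemma")
model = AutoModelForCausalLM.from_pretrained(
    "openchat/openchat-3.5-0106-gemma", torch_dtype="auto", device_map="auto"
)

inputs = tokenizer(
    build_prompt([{"role": "user", "content": "How are you today?"}]),
    return_tensors="pt",
).to(model.device)

# Stop at <end_of_turn>, per the notice above (NOT <|end_of_turn|>).
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<end_of_turn>"),
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```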