---
language:
- ru
base_model: t-tech/T-pro-it-1.0
tags:
- llama-cpp
---

# T-pro-it-1.0-Q8_0-GGUF 

**🚨 T-pro is designed for further fine-tuning and is not intended as a ready-to-use conversational assistant. Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.**

## Description

This repository contains the [`T-pro-it-1.0`](https://huggingface.co/t-tech/T-pro-it-1.0/) model quantized to Q8_0 in the GGUF format using the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) toolchain.
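
For reference, a GGUF file like this one is typically produced by first converting the original Hugging Face checkpoint to GGUF and then quantizing it. A minimal sketch with the standard `llama.cpp` tools (script and binary paths assume a recent `llama.cpp` checkout; the local paths are placeholders):

```bash
# Convert the original HF checkpoint to a GGUF file with F16 weights.
python convert_hf_to_gguf.py /path/to/T-pro-it-1.0 \
    --outfile t-pro-it-1.0-f16.gguf --outtype f16

# Quantize the F16 GGUF down to Q8_0.
./build/bin/llama-quantize t-pro-it-1.0-f16.gguf t-pro-it-1.0-q8_0.gguf Q8_0
```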
 

## 📊 Benchmarks

Detailed evaluation results for the original model can be found in our [Habr post](https://habr.com/ru/companies/tbank/articles/865582/).

| Benchmark                                      | T-pro-it-1.0                         | T-pro-it-1.0-Q4_K_M               | T-pro-it-1.0-Q5_K_M               | T-pro-it-1.0-Q6_K            | T-pro-it-1.0-Q8_0            | 
|------------------------------------------------|--------------------------------------|-----------------------------------|-----------------------------------|------------------------------|------------------------------|
| Arena-Hard-Ru                                  | **90.17** (-1.3, 1.5)                | 89.0 (-1.5, 1.3)                  | 89.29 (-1.6, 1.3)                 | 88.5 (-1.3, 1.3)             | 89.35 (-1.2, 1.2)            | 

## llama.cpp usage

### Server

From HF:

```bash
llama-server --hf-repo t-tech/T-pro-it-1.0-Q8_0-GGUF --hf-file t-pro-it-1.0-q8_0.gguf -c 8192
```

Or locally:

```bash
./build/bin/llama-server -m t-pro-it-1.0-q8_0.gguf -c 8192
```
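
If the GGUF file is not already on disk, one way to fetch it first is with `huggingface-cli` (a sketch; the target directory is an assumption):

```bash
huggingface-cli download t-tech/T-pro-it-1.0-Q8_0-GGUF \
    t-pro-it-1.0-q8_0.gguf --local-dir .
```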

### POST

```bash
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{
        "prompt": "<|im_start|>user\nРасскажи мне чем отличается Python от C++?\n<|im_end|>\n<|im_start|>assistant\n",
        "n_predict": 256
    }'
```
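
`llama-server` also exposes an OpenAI-compatible `/v1/chat/completions` endpoint that applies the model's chat template server-side, so the `<|im_start|>` markers do not have to be written by hand. A minimal sketch against the same server as above (`max_tokens` mirrors `n_predict`):

```bash
curl --request POST \
    --url http://localhost:8080/v1/chat/completions \
    --header "Content-Type: application/json" \
    --data '{
        "messages": [
            {"role": "user", "content": "Расскажи мне, чем отличается Python от C++?"}
        ],
        "max_tokens": 256
    }'
```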


## ollama usage

### Serve

```bash
ollama serve
```

### Run

From HF:

```bash
ollama run hf.co/t-tech/T-pro-it-1.0-Q8_0-GGUF:Q8_0 "Расскажи мне про отличия C++ и Python"
``` 

Or locally:

```bash
ollama create example -f Modelfile
ollama run example "Расскажи мне про отличия C++ и Python"
```

where `Modelfile` is

```bash
FROM ./t-pro-it-1.0-q8_0.gguf
```
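
Because the model expects the ChatML format shown in the `llama.cpp` example above, the `Modelfile` can also pin the prompt template and a sampling default. A sketch (the template uses standard ollama Go-template syntax; the temperature value is an illustrative assumption, not a tuned recommendation):

```bash
FROM ./t-pro-it-1.0-q8_0.gguf

# ChatML prompt template matching the llama.cpp example above.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

# Illustrative default only; tune for your use case.
PARAMETER temperature 0.7
```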