---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-70b-bnb-4bit
datasets:
- lightblue/tagengo-gpt4
---

# Uploaded model

- **Developed by:** ruslandev
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-70b-bnb-4bit

This model is finetuned on the Tagengo dataset.
Please note: this model was created for educational purposes and needs further training/fine-tuning.

# How to use

The easiest way to use this model on your own computer is to use the GGUF version of this model ([ruslandev/llama-3-70b-tagengo-GGUF](https://huggingface.co/ruslandev/llama-3-70b-tagengo-GGUF)) using a program such as [llama.cpp](https://github.com/ggerganov/llama.cpp).
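
A minimal sketch of that route, using `huggingface-cli` and llama.cpp's `llama-cli` (the Q4_K_M filename is an assumption; use whichever quantized file you download from the GGUF repo, and note the flag spellings match recent llama.cpp builds):

```
# download one quantized file from the GGUF repo
# (assumption: the Q4_K_M variant; any of the published files works)
huggingface-cli download ruslandev/llama-3-70b-tagengo-GGUF \
    llama-3-70b-tagengo.Q4_K_M.gguf --local-dir .

# start an interactive chat using the ChatML template the model was trained with
./llama-cli -m llama-3-70b-tagengo.Q4_K_M.gguf --chat-template chatml -cnv
```
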
If you want to use this model directly with the Hugging Face Transformers stack, I recommend using my framework [gptchain](https://github.com/RuslanPeresy/gptchain).

```
git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt
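# -m: model repo to load; --chatml: use the ChatML prompt template;
# -q: the conversation as a JSON list (flag roles inferred from this example)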
python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
    --chatml true \
    -q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'  # Russian: "What does a neural network consist of?"
```
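
The `--chatml true` flag applies the ChatML prompt template, which wraps each conversation turn like this (shown with a placeholder message):

```
<|im_start|>user
{your message}<|im_end|>
<|im_start|>assistant
```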

# Training

The [gptchain](https://github.com/RuslanPeresy/gptchain) framework was used for training.

```
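# -dn: dataset name (Tagengo GPT-4); -sp: local checkpoint save path;
# -hf: Hugging Face repo name to push the result to (flag roles inferred)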
python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
    -dn tagengo_gpt4 \
    -sp checkpoints/llama-3-70b-tagengo \
    -hf llama-3-70b-tagengo \
    --max-steps 2400
```

# Training hyperparameters

- learning_rate: 2e-4
- seed: 3407
- gradient_accumulation_steps: 4
- per_device_train_batch_size: 2
- optimizer: adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 5
- max_steps: 2400
- weight_decay: 0.01
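
With gradient_accumulation_steps = 4 and per_device_train_batch_size = 2 on a single GPU, the effective batch size is 4 × 2 = 8 sequences per optimizer step, so the 2400-step run sees roughly 2400 × 8 = 19,200 training examples.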

# Training results

[wandb report](https://api.wandb.ai/links/ruslandev/rilj60ra)

The 2,400 training steps took 7 hours on a single H100 GPU.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)