Uploaded model

Developed by: ruslandev
License: apache-2.0
Fine-tuned from model: unsloth/llama-3-70b-bnb-4bit

This model was fine-tuned on the Tagengo dataset. Note that it was created for educational purposes and needs further training or fine-tuning.

How to use

The easiest way to run this model on your own computer is to use its GGUF version (ruslandev/llama-3-70b-tagengo-GGUF) with a program such as llama.cpp. To use the model directly with the Hugging Face Transformers stack, I recommend my framework, gptchain.

git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt
python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
    --chatml true \
    -q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
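The -q argument above is a JSON list of messages with "from" and "value" keys, and --chatml true suggests the prompt is rendered with the ChatML template. A minimal sketch of what such a rendering could look like follows; the function name to_chatml and the role mapping are illustrative assumptions, not gptchain's actual code.

```python
import json

# Hypothetical mapping from the "from" field to ChatML role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"from", "value"} messages as a ChatML prompt string."""
    parts = []
    for message in messages:
        role = ROLE_MAP[message["from"]]
        parts.append(f"<|im_start|>{role}\n{message['value']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Same query string as in the command above.
query = '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
prompt = to_chatml(json.loads(query))
print(prompt)
```

If you run the GGUF version with another program, a prompt of this shape is what a ChatML-tuned model would typically expect, assuming the fine-tune did use ChatML.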

Training

The gptchain framework was used for training:

python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
    -dn tagengo_gpt4 \
    -sp checkpoints/llama-3-70b-tagengo \
    -hf llama-3-70b-tagengo \
    --max-steps 2400

Training hyperparameters

learning_rate: 2e-4
seed: 3407
gradient_accumulation_steps: 4
per_device_train_batch_size: 2
optimizer: adamw_8bit
lr_scheduler_type: linear
warmup_steps: 5
max_steps: 2400
weight_decay: 0.01
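
To make these numbers concrete, here is a small sketch that derives the effective batch size and the learning rate at a given step. It assumes the names carry their usual Hugging Face TrainingArguments meaning, that training ran on one GPU (the card reports a single H100), and that the linear scheduler warms up from zero and decays to zero; the helper linear_schedule_lr is written for illustration only.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
learning_rate = 2e-4
warmup_steps = 5
max_steps = 2400
num_devices = 1  # assumption: single-GPU training

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Total sequences processed over the run (not necessarily unique examples).
sequences_seen = effective_batch_size * max_steps


def linear_schedule_lr(step):
    """Learning rate with linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max_steps - step
    return learning_rate * max(0.0, remaining / (max_steps - warmup_steps))


print(effective_batch_size)  # 8
print(sequences_seen)        # 19200
for step in (0, 5, 1200, 2400):
    print(step, linear_schedule_lr(step))
```

With only 5 warmup steps, the schedule reaches the peak rate of 2e-4 almost immediately and then decays for the rest of the run.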

Training results

wandb report

Training for 2400 steps took 7 hours on a single H100 GPU.
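
As a rough back-of-the-envelope check, the reported figures imply the following time per step. This assumes that a "step" is one optimizer step and that the 7 hours is a rounded wall-clock total.

```python
max_steps = 2400
training_hours = 7  # reported wall-clock time, presumably rounded

seconds_per_step = training_hours * 3600 / max_steps
print(f"{seconds_per_step:.1f} s per step")  # 10.5 s per step
```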
