ruGPT-3.5-13B / chain of thought
LoRA адаптер для ruGPT3.5-13B обученный на датасете evilfreelancer/ru-chain-of-thought-sharegpt данный датасет представляет из себя перевод на русский датасета isaiahbjork/chain-of-thought-sharegpt при помощи модели utrobinmv/t5_translate_en_ru_zh_small_1024 прикладываю скрипт перевода на Gist.
Конфигурация: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_cot_lora.yml
Адаптер обучался на 1x RTX 4090, для этого потребовалось примерно 20Gb VRAM и заняло 19m.
output_dir: ./models/ruGPT35_13B_lora_cot
train_path: ./train.ruGPT35_13B_cot.jsonl
val_path: ./val.ruGPT35_13B_cot.jsonl
datasets:
- name: evilfreelancer/ru-chain-of-thought-sharegpt
converter: impruver.conversations_to_messages
model:
class: transformers.AutoModelForCausalLM
name: ai-forever/ruGPT-3.5-13B
load_in_4bit: true
load_in_8bit: false
dtype: bf16
lora:
r: 16
lora_alpha: 16
lora_dropout: 0.05
bias: none
target_modules: [ c_attn ]
task_type: CAUSAL_LM
tokenizer:
class: transformers.AutoTokenizer
name: ai-forever/ruGPT-3.5-13B
max_tokens_count: 1200
trainer:
eval_strategy: steps
save_strategy: steps
eval_steps: 100
save_steps: 100
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
gradient_accumulation_steps: 5
logging_steps: 1
learning_rate: 0.0002
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_steps: 16
optim: adamw_8bit
metric_for_best_model: eval_loss
load_best_model_at_end: true
save_total_limit: 2
seed: 42
remove_unused_columns: false
max_grad_norm: 1.0
weight_decay: 0.08
torch_compile: false
- Downloads last month
- 0
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for evilfreelancer/ruGPT3.5-13B-lora-chain-of-thought
Base model
ai-forever/ruGPT-3.5-13B