Model Card for Solade

Model Details

Model Description

A ?????-parameter language model with a LLaMA-style architecture (RMSNorm, RoPE, SwiGLU, Flash Attention, weight tying), trained from scratch with its own BPE tokenizer.

  • Developed by: DLMveloper
  • Model type: Decoder-only transformer (text generation)
  • Language(s): Russian, English, Kazakh
  • License: [The license is being examined by lawyers]
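The LLaMA-style components named in the description can be illustrated with their textbook formulations. Below is a minimal pure-Python sketch of RMSNorm, a SwiGLU feed-forward block and rotary position embeddings (RoPE). It is not taken from Solade's code. The function names, the eps value and the RoPE base of 10000 are assumptions (common defaults in LLaMA-style models). Flash Attention is not shown because it computes exact attention more memory-efficiently rather than changing the formula.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the vector, then apply a
    # learned per-channel gain (no mean subtraction, no bias). eps is assumed.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(z):
    # Numerically stable SiLU (swish): z * sigmoid(z)
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def matvec(m, x):
    # m is a list of rows; returns the matrix-vector product m @ x
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: down(silu(gate(x)) * up(x)), no biases
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rope(x, pos, base=10000.0):
    # Rotary position embedding: rotate each consecutive pair (x[2k], x[2k+1])
    # by the angle pos * base**(-2k/d). Some implementations pair the first
    # half of the dimensions with the second half instead.
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

A useful property to check is that the dot product of two RoPE-rotated vectors depends only on the distance between their positions. This is what makes the encoding relative: `rope(q, 5)` against `rope(k, 3)` gives the same score as `rope(q, 2)` against `rope(k, 0)`.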

Model Sources

Uses

Direct Use

Text generation in Russian, English, and Kazakh.

Out-of-Scope Use

The model was trained for only 300 steps on a limited amount of data and is not intended for high-precision or critical tasks.

Bias, Risks, and Limitations

The model was trained for a small number of steps and may produce incoherent or incorrect text.

How to Get Started with the Model

Training Details

Training Data

Dataset: DLMveloper/DLM_DataSet (a subsample of ~20,000 examples)

Training Procedure

Training Hyperparameters

  • Training regime: ???????
  • Steps: ???
  • Batch size: ?
  • Learning rate: ?????
  • Sequence length: ???
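Once the fields above are filled in, the number of tokens the model saw during training can be estimated as steps × batch size × sequence length, multiplied by the number of gradient-accumulation steps if any are used. The sketch below assumes every sequence is fully packed to the sequence length, so it gives an upper bound. The function name is hypothetical. Apart from the 300 steps mentioned under Out-of-Scope Use, the example values are illustrative only.

```python
def tokens_processed(steps: int, batch_size: int, seq_len: int, grad_accum: int = 1) -> int:
    # Upper bound on training tokens: assumes every sequence is packed to
    # seq_len (padding would lower the real count).
    return steps * batch_size * grad_accum * seq_len

# Hypothetical batch size and sequence length; the card leaves them unspecified.
print(tokens_processed(steps=300, batch_size=8, seq_len=512))  # 1228800
```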

Speeds, Sizes, Times

  • Model size: ????? (??-bit quantized)
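Raw weight storage scales linearly with the bit width: bytes ≈ parameters × bits / 8. The minimal sketch below takes the Hub page's "0.6B params" figure at face value. The result is only a rough floor. It ignores quantization scales and zero-points, ignores tensors kept at higher precision (the checkpoint lists F32 and BF16 tensors alongside U8), and ignores file-format overhead. The function name is hypothetical.

```python
def weight_storage_bytes(num_params: int, bits_per_param: float) -> float:
    # Raw weight storage only: ignores quantization scales/zero-points,
    # tensors kept at higher precision, and file-format overhead.
    return num_params * bits_per_param / 8

params = 600_000_000  # "0.6B params" as reported by the Hub page metadata
for bits in (32, 16, 8, 4):
    gib = weight_storage_bytes(params, bits) / 2**30
    print(f"{bits:>2}-bit: {gib:.2f} GiB")
```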

Technical Specifications

Model Architecture and Objective

  • Parameters: ??
  • Layers: ??
  • Hidden size: ????
  • Attention heads: ??
  • Intermediate size (FFN): ????
  • Vocab size: ???
  • Components: ???????
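Once the architecture fields above are known, the parameter count can be derived from them. The sketch below makes several assumptions that the card does not confirm. It assumes standard multi-head attention (no grouped-query attention), no bias terms, two RMSNorm gains per layer plus a final norm, and a SwiGLU FFN with gate, up and down projections. Under weight tying, the output projection reuses the embedding matrix and adds no parameters. The config field names are hypothetical, and the example uses toy values, not Solade's.

```python
from dataclasses import dataclass

@dataclass
class LlamaStyleConfig:
    # Hypothetical field names; Solade's real values are not given in this card.
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    tie_word_embeddings: bool = True

def count_params(cfg: LlamaStyleConfig) -> int:
    d, f, v = cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size
    attn = 4 * d * d   # q, k, v, o projections; head count does not change this with full MHA
    ffn = 3 * d * f    # SwiGLU gate, up and down projections
    norms = 2 * d      # two RMSNorm gain vectors per layer
    per_layer = attn + ffn + norms
    embed = v * d
    lm_head = 0 if cfg.tie_word_embeddings else v * d  # weight tying reuses the embedding
    final_norm = d
    return embed + cfg.num_layers * per_layer + final_norm + lm_head

# Toy configuration, for illustration only.
print(count_params(LlamaStyleConfig(vocab_size=10, hidden_size=4, intermediate_size=8, num_layers=2)))  # 380
```

RoPE adds no learned parameters, so it does not appear in the count.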

Compute Infrastructure

Software

???????????

Safetensors

  • Model size: 0.6B params
  • Tensor types: F32, BF16, U8

Dataset used to train DLMveloper/Solade_Broken