ruRoberta-distilled

This model was distilled from ai-forever/ruRoberta-large with ❤️ by me.

Usage

# Option 1: token-level embeddings via the feature-extraction pipeline
from transformers import pipeline


pipe = pipeline('feature-extraction', model='d0rj/ruRoberta-distilled')
tokens_embeddings = pipe('Привет, мир!')

# Option 2: a single L2-normalized sentence embedding taken from the first token
import torch
from transformers import AutoTokenizer, AutoModel


tokenizer = AutoTokenizer.from_pretrained('d0rj/ruRoberta-distilled')
model = AutoModel.from_pretrained('d0rj/ruRoberta-distilled')


def embed_bert_cls(text: str) -> torch.Tensor:
    # Tokenize the text and move the inputs to the model's device
    t = tokenizer(text, padding=True, truncation=True, return_tensors='pt').to(model.device)
    with torch.no_grad():
        model_output = model(**t)
    # Use the hidden state of the first token as the sentence representation
    embeddings = model_output.last_hidden_state[:, 0, :]
    # L2-normalize so that dot products equal cosine similarities
    embeddings = torch.nn.functional.normalize(embeddings)
    return embeddings[0].cpu()


embedding = embed_bert_cls('Привет, мир!')
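
Embeddings like the one above are typically compared with cosine similarity. The following is a minimal sketch in plain Python, not part of the model's own API: the helper name cosine_similarity is illustrative, and it assumes the embeddings have been converted to lists of floats (for example with .tolist()).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical usage with the embed_bert_cls function from the Usage section:
# a = embed_bert_cls('Привет, мир!').tolist()
# b = embed_bert_cls('Здравствуй, мир!').tolist()
# print(cosine_similarity(a, b))
```

Because embed_bert_cls already L2-normalizes its output, the norms here should be close to 1 and the result is effectively a dot product; the helper normalizes anyway so it also works on unnormalized vectors.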

Logs

The distillation process took 120 hours on 4 Nvidia V100 GPUs.

See all logs at WandB.

Configuration changes

  • Activation GELU -> GELUFast
  • Attention heads 16 -> 8
  • Hidden layers 24 -> 6
  • Weights size 1.42 GB -> 464 MB
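
As a rough sanity check on the weights size, 464 MB is consistent with the roughly 116M parameters reported for the Safetensors checkpoint, stored at 4 bytes each. The sketch below assumes decimal megabytes (1 MB = 10^6 bytes), a rounded parameter count, and that essentially all weights are float32; the helper name f32_size_mb is illustrative.

```python
BYTES_PER_F32 = 4  # a float32 value occupies 4 bytes


def f32_size_mb(num_params: int) -> float:
    """Approximate size of float32 weights in decimal megabytes."""
    return num_params * BYTES_PER_F32 / 1e6


student_params = 116_000_000  # rounded parameter count (assumption: ~116M)
student_mb = f32_size_mb(student_params)

teacher_mb = 1420.0  # 1.42 GB, as listed above
ratio = teacher_mb / student_mb  # how many times smaller the distilled weights are

print(f"student: {student_mb:.0f} MB, teacher/student ratio: {ratio:.2f}x")
```

Under these assumptions the distilled weights come out about three times smaller than the teacher's, even though the layer count drops by a factor of four; the parts of the model that do not scale with depth, such as the embeddings, plausibly account for the difference.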

Data

Overall: 9.4 GB of raw texts, 5.1 GB of binarized texts.

Only Russian texts were used for distillation. I do not know how the model behaves on English text.

Used data:

Model size: 116M params (Safetensors; tensor types I64, F32)