---
license: apache-2.0
language:
- ru
tags:
- distill
- fill-mask
- embeddings
- masked-lm
- tiny
- sentence-similarity
datasets:
- GEM/wiki_lingua
- xnli
- RussianNLP/wikiomnia
- mlsum
- IlyaGusev/gazeta
widget:
- text: Москва - <mask> России.
- text: Если б море было пивом, я бы <mask>
- text: Столица России - <mask>.
library_name: transformers
pipeline_tag: fill-mask
---

# ruRoberta-distilled

The model was distilled from [ai-forever/ruRoberta-large](https://huggingface.co/ai-forever/ruRoberta-large) with ❤️ by me.

## Usage

```python
from transformers import pipeline

pipe = pipeline('feature-extraction', model='d0rj/ruRoberta-distilled')
tokens_embeddings = pipe('Привет, мир!')
```

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('d0rj/ruRoberta-distilled')
model = AutoModel.from_pretrained('d0rj/ruRoberta-distilled')


def embed_bert_cls(text: str) -> torch.Tensor:
    # Tokenize and move the inputs to the model's device
    t = tokenizer(text, padding=True, truncation=True, return_tensors='pt').to(model.device)
    with torch.no_grad():
        model_output = model(**t)
    # Take the first ([CLS]-like) token embedding and L2-normalize it
    embeddings = model_output.last_hidden_state[:, 0, :]
    embeddings = torch.nn.functional.normalize(embeddings)
    return embeddings[0].cpu()


embedding = embed_bert_cls('Привет, мир!')
```

## Logs

The distillation process took 120 hours on 4 Nvidia V100 GPUs. See all logs at [WandB](https://wandb.ai/d0rj/distill-ruroberta/runs/lehtr3bk/workspace).

## Configuration changes

- Activation: GELU -> GELUFast
- Attention heads: 16 -> 8
- Hidden layers: 24 -> 6
- Weights size: 1.42 GB -> 464 MB

## Data

Overall: 9.4 GB of raw texts, 5.1 GB of binarized texts.

Only texts in Russian were used for distillation, so I do not know how the model behaves on English text.

Used data:

- [GEM/wiki_lingua](https://huggingface.co/datasets/GEM/wiki_lingua)
- [xnli](https://huggingface.co/datasets/xnli)
- [RussianNLP/wikiomnia](https://huggingface.co/datasets/RussianNLP/wikiomnia)
- [mlsum](https://huggingface.co/datasets/mlsum)
- [IlyaGusev/gazeta](https://huggingface.co/datasets/IlyaGusev/gazeta)
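
## Fill-mask example

The card is tagged `fill-mask`, but the snippets above only cover feature extraction. Below is a minimal sketch of masked-token prediction; it assumes the tokenizer uses RoBERTa's standard `<mask>` token (as in the widget examples above).

```python
from transformers import pipeline

# Minimal fill-mask sketch (assumes the standard RoBERTa '<mask>' token)
fill = pipeline('fill-mask', model='d0rj/ruRoberta-distilled')

for prediction in fill('Столица России - <mask>.'):
    # Each prediction is a dict with the filled token and its score
    print(prediction['token_str'], round(prediction['score'], 4))
```

Instead of hard-coding the mask string, you can read it from `fill.tokenizer.mask_token` in case the tokenizer defines a different one.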