---
license: apache-2.0
language:
- ru
- en
library_name: transformers
---

# RoBERTa-base from deepvk

<!-- Provide a quick summary of what the model is/does. -->

Pretrained bidirectional encoder for the Russian language.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

The model was pretrained with the standard MLM objective on a large text corpus including open social data, books, Wikipedia, web pages, etc.
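As background on the objective: MLM hides a fraction of the input tokens (15% in the original RoBERTa recipe) and trains the encoder to reconstruct them, with the mask resampled on every pass over the data ("dynamic masking"). A simplified, illustrative sketch of that masking step (the real recipe additionally uses an 80/10/10 mask/random/keep split, omitted here):

```python
import random

def dynamic_mask(token_ids, mask_id, mask_prob=0.15, rng=None):
    """Replace ~mask_prob of the tokens with mask_id, drawing a fresh
    mask on every call, as in RoBERTa's dynamic masking."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            masked.append(mask_id)   # hide the token from the encoder
            labels.append(tok)       # ...but keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(-100)      # -100 = position ignored by the MLM loss
    return masked, labels
```

Each epoch therefore sees a different masking of the same sentence, unlike BERT's static, preprocessed masks.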

- **Developed by:** VK Applied Research Team
- **Model type:** RoBERTa
- **Languages:** Mostly Russian, with a small fraction of other languages
- **License:** Apache 2.0

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"

inputs = tokenizer(text, return_tensors='pt')
predictions = model(**inputs)
```

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Mix of the following data:

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

Standard RoBERTa-base size (~125M parameters).

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Compute Infrastructure

The model was trained on 8×A100 GPUs for ~22 days.