---
license: mit
language:
- ru
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: "Взрыв газа произошел в 2-этажном доме в поселке под Казанью, пострадали четыре человека, сообщает МЧС"
  example_title: "Новость"
- text: "Сын поздравил меня с днём рождения стихами ❤️"
  example_title: "Не новость"
---
## Model Details
### Model Description
News_classifier is a fine-tuned model for binary classification (news / not news) of posts from various Russian-language Telegram channels. It can be integrated into a news aggregation service.
- **Model type:** Sentence RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters)
- **Language(s):** Russian (ru)
- **License:** MIT
- **Finetuned from model:** `DeepPavlov/rubert-base-cased-sentence`
## Dataset
- Russian-language Telegram posts
- train/valid/test split: 2970/165/165
## Training Details
- token max length: 512
- num labels: 2
- batch size: 16
- learning rate: 2e-5
- train epochs: 20
- weight decay: 0.01
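
As a rough sketch, the hyperparameters above can be collected under the parameter names used by `transformers.TrainingArguments`, together with the number of optimizer steps they imply. The card does not state the device count or whether gradient accumulation was used, so the step arithmetic assumes a single device, no accumulation, and a kept final partial batch.

```python
import math

# Values copied from the Dataset and Training Details lists.
TRAIN_EXAMPLES = 2970
MAX_LENGTH = 512  # token max length, applied when tokenizing
NUM_LABELS = 2

# Keys follow transformers.TrainingArguments parameter names.
hparams = {
    "per_device_train_batch_size": 16,  # assumption: one device, no gradient accumulation
    "learning_rate": 2e-5,
    "num_train_epochs": 20,
    "weight_decay": 0.01,
}

# The last, smaller batch still counts as a step (assumes it is not dropped).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / hparams["per_device_train_batch_size"])
total_steps = steps_per_epoch * hparams["num_train_epochs"]

print(steps_per_epoch, total_steps)  # 186 3720
```

Under these assumptions the dictionary could be unpacked into `TrainingArguments(output_dir=..., **hparams)`; the output directory and any evaluation or checkpointing settings are not given in this card.
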
## Metrics
- Matthews correlation coefficient (evaluation metric monitored during training): 0.89
- Accuracy: 0.95
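
For readers unfamiliar with the two metrics, the sketch below computes both from binary confusion-matrix counts. The counts are toy numbers chosen for easy arithmetic; they are not this model's results, and the helper names are illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any row or column sum is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy counts (hypothetical, NOT the evaluation results of this model).
toy = {"tp": 45, "tn": 45, "fp": 5, "fn": 5}
print(accuracy(**toy), mcc(**toy))  # 0.9 0.8
```

Unlike accuracy, the Matthews coefficient uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced.
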
## Label Scheme
- `LABEL_1`: news
- `LABEL_0`: not news
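
A minimal inference sketch that maps the raw labels above to readable names, assuming the model is loaded through the `transformers` text-classification pipeline. The repository ID is not stated in this card, so `MODEL_ID` is a hypothetical placeholder, and `load_classifier` and `classify` are illustrative helper names.

```python
# Mapping taken from the Label Scheme list.
LABEL_NAMES = {"LABEL_1": "news", "LABEL_0": "not news"}

# Hypothetical placeholder: replace with the actual Hub repository ID.
MODEL_ID = "your-namespace/News_classifier"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helpers work without it
    return pipeline("text-classification", model=model_id)


def classify(classifier, text):
    """Return (readable label, score) for a single post."""
    # Truncate to the 512-token limit used during training.
    result = classifier(text, truncation=True, max_length=512)[0]
    return LABEL_NAMES[result["label"]], result["score"]


# Usage (requires network access and a valid MODEL_ID):
# clf = load_classifier()
# print(classify(clf, "Сын поздравил меня с днём рождения стихами ❤️"))
```
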