---
license: mit
language:
- ru
metrics:
- accuracy
pipeline_tag: text-classification

widget:
- text: "Взрыв газа произошел в 2-этажном доме в поселке под Казанью, пострадали четыре человека, сообщает МЧС"
  example_title: "Новость"
- text: "Сын поздравил меня с днём рождения стихами ❤️"
  example_title: "Не новость"
---


## Model Details

### Model Description

News_classifier is a fine-tuned model for binary classification (news / not news) of posts from various Russian-language Telegram channels. It can be integrated into a news aggregation service, for example to filter out non-news posts.

- **Model type:** Sentence RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters)
- **Language(s):** Russian (ru)
- **License:** MIT
- **Finetuned from model:** `DeepPavlov/rubert-base-cased-sentence`

## Dataset
- Russian-language Telegram posts
- train/valid/test split: 2970/165/165 examples

## Training Details
- token max length: 512
- num labels: 2 
- batch size: 16
- learning rate: 2e-5
- train epochs: 20
- weight decay: 0.01
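
As a rough illustration of what these settings imply, the sketch below collects the listed hyperparameters in a plain dictionary and estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of this is stated in the card, and the dictionary keys and function name are illustrative.

```python
import math

# Hyperparameters as listed in this card; the key names are illustrative.
HPARAMS = {
    "max_length": 512,
    "num_labels": 2,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 20,
    "weight_decay": 0.01,
}

TRAIN_EXAMPLES = 2970  # size of the train split from the Dataset section


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = optimizer_steps(
    TRAIN_EXAMPLES, HPARAMS["batch_size"], HPARAMS["num_train_epochs"]
)
print(steps_per_epoch, total_steps)
```

Under these assumptions, one epoch is 186 steps and the full run is 3720 steps.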

## Metrics
- Matthews correlation coefficient (evaluation metric during training): 0.89
- Accuracy: 0.95
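
For reference, both metrics can be computed from the counts of a binary confusion matrix. The sketch below is a minimal pure-Python version; the counts in the usage lines are toy values chosen for illustration, not this model's results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy counts for illustration only (not taken from this model's evaluation).
toy_acc = accuracy(tp=5, tn=3, fp=1, fn=1)
toy_mcc = matthews_corrcoef(tp=5, tn=3, fp=1, fn=1)
print(toy_acc, toy_mcc)
```

Unlike accuracy, the Matthews coefficient takes all four cells of the confusion matrix into account, so it stays informative when the two classes are unevenly represented.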

## Label Scheme
- LABEL_1 - news
- LABEL_0 - not news
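
A minimal sketch of how the raw labels might be mapped to readable names when calling the model through the `transformers` text-classification pipeline. The card does not give the repository id, so `model_id` is a placeholder argument that the caller must supply; the helper names `readable` and `classify` are hypothetical.

```python
# Mapping taken from the label scheme above.
LABEL_NAMES = {"LABEL_0": "not news", "LABEL_1": "news"}


def readable(prediction: dict) -> dict:
    """Replace the raw label in a pipeline prediction with its readable name."""
    return {"label": LABEL_NAMES[prediction["label"]], "score": prediction["score"]}


def classify(texts: list[str], model_id: str) -> list[dict]:
    """Classify texts with a text-classification pipeline.

    `model_id` is an assumption: pass the Hub id or local path of this model.
    """
    from transformers import pipeline  # imported lazily; requires `transformers`

    clf = pipeline("text-classification", model=model_id)
    # Truncate long posts to the 512-token limit used in training.
    predictions = clf(texts, truncation=True, max_length=512)
    return [readable(p) for p in predictions]
```

For example, `classify(["Сын поздравил меня с днём рождения стихами ❤️"], model_id=...)` would return a list with one dictionary containing a `label` of either `news` or `not news` and the associated `score`.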