Added logistic regression language classifier model
- README.md +2 -38
- config.json +6 -0
- model/language_classifier.joblib +3 -0
- model_card.md +11 -0
README.md
CHANGED
@@ -1,40 +1,4 @@
----
-license: mit
-language:
-- ru
-pipeline_tag: text-classification
-tags:
-- tuvan
-- russian
-- binary classifier
----
-# GitHub
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-TuRu - Tuvan/Russian binary classifier model [GitHub](https://github.com/tarbagan/tuvalang/tree/main/turu).
-
-
-## How to use
-
-
-
-```python
-from tensorflow.keras.models import load_model
-
-model = load_model('turu.h5')
-
-text_to_predict = ["""
-Президент ооң бодалы-биле алырга, регионалдыг-даа, муниципалдыг-даа деңнелде деткиир ужурлуг регионнарда спортчу инфраструктура хөгжүлдезиниң айтырыын көрген.
-Ооң келир үеде президент программазының угланыышкыны ол апаарын Владимир Путин чугаалаан.
-"""]
-
-sequences = tokenizer.texts_to_sequences(text_to_predict)
-padded = pad_sequences(sequences, maxlen=10)
-
-prediction = model.predict(padded)
-print(prediction)
-
-```
 
+# Language Classifier
 
+This model is trained to classify text as either Russian or Tuvan language.
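The new README no longer shows how to use the model. A minimal loading sketch follows. It assumes (this commit does not say so) that `model/language_classifier.joblib` holds a fitted scikit-learn estimator whose `predict` accepts raw strings, for example a Pipeline that includes its own vectorizer. The helper names `load_classifier` and `predict_languages` are hypothetical, and the label values the model returns are not documented here. `joblib` is imported lazily so the prediction helper can be exercised without it.

```python
from typing import Any, List, Sequence

# Path of the serialized model as added in this commit.
MODEL_PATH = "model/language_classifier.joblib"


def load_classifier(path: str = MODEL_PATH) -> Any:
    """Deserialize the classifier with joblib.

    joblib files are pickles, so loading one can execute arbitrary code:
    only load files from a source you trust.
    """
    import joblib  # third-party dependency, imported lazily

    return joblib.load(path)


def predict_languages(model: Any, texts: Sequence[str]) -> List[Any]:
    """Return one predicted label per input text.

    ASSUMPTION: `model` is a fitted scikit-learn estimator (e.g. a Pipeline
    that vectorizes raw strings itself), so `predict` takes raw texts.
    """
    return list(model.predict(list(texts)))
```

Usage, once the real file has been fetched with `git lfs pull`: `predict_languages(load_classifier(), ["Ооң келир үеде президент программазының угланыышкыны ол апаарын Владимир Путин чугаалаан."])`. Keeping loading and prediction separate means the model is deserialized once and reused for many texts.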
config.json
ADDED
@@ -0,0 +1,6 @@
+
+{
+"model_type": "logistic_regression",
+"language": ["russian", "tuvan"],
+"pipeline_tag": "text-classification"
+}
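config.json records only the model type, the two languages, and the pipeline tag. Below is a small standard-library sketch for reading it and failing early on a malformed file. `parse_config` and its key check are hypothetical, derived only from the keys this particular file defines. The file does not say which output class corresponds to which language: if the model is a scikit-learn classifier (an assumption), that order comes from the estimator's `classes_` attribute, not from the order of the `language` list.

```python
import json

# config.json as added in this commit (its leading blank line is valid JSON whitespace).
CONFIG_TEXT = """
{
"model_type": "logistic_regression",
"language": ["russian", "tuvan"],
"pipeline_tag": "text-classification"
}
"""

# Hypothetical check: these are simply the keys this file happens to define.
REQUIRED_KEYS = ("model_type", "language", "pipeline_tag")


def parse_config(text: str) -> dict:
    """Parse config.json and raise ValueError if an expected key is missing."""
    cfg = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f"config.json is missing keys: {missing}")
    if not isinstance(cfg["language"], list) or len(cfg["language"]) != 2:
        raise ValueError("expected exactly two languages for a binary classifier")
    return cfg


cfg = parse_config(CONFIG_TEXT)
print(cfg["model_type"], cfg["language"])  # logistic_regression ['russian', 'tuvan']
```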
model/language_classifier.joblib
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93552bc0072004f7cceece81b1ffd546743d530c55d00fe1dd7703e5a35b87b6
+size 14610753
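The joblib file is committed as a Git LFS pointer rather than the model itself: its three lines record the spec version, the SHA-256 object ID, and the size in bytes (14,610,753, about 14.6 MB). A clone made without Git LFS contains only this text. The standard-library sketch below parses such a pointer and checks downloaded bytes against it; `parse_lfs_pointer` and `blob_matches_pointer` are hypothetical helper names.

```python
import hashlib

# Pointer text for model/language_classifier.joblib, as committed.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:93552bc0072004f7cceece81b1ffd546743d530c55d00fe1dd7703e5a35b87b6
size 14610753
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return {"sha256": digest, "size": int(fields["size"])}


def blob_matches_pointer(blob: bytes, pointer: dict) -> bool:
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return (len(blob) == pointer["size"]
            and hashlib.sha256(blob).hexdigest() == pointer["sha256"])


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])  # 14610753
```

Checking the size first is cheap and rejects a pointer file or a truncated download before hashing. After `git lfs pull`, passing the bytes of the real file to `blob_matches_pointer` should return `True`.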
model_card.md
ADDED
@@ -0,0 +1,11 @@
+
+---
+tags:
+- language-classification
+- russian
+- tuvan
+---
+
+# Language Classifier
+
+This model is trained to classify text as either Russian or Tuvan language. It is based on a logistic regression classifier.
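The card says only that the model is a logistic regression classifier; it does not state the features, training data, or label encoding. To illustrate the technique itself, here is a self-contained standard-library sketch. It assumes L2-normalised character-bigram features (a common choice for language identification, not confirmed by this commit), encodes Tuvan as the positive class, and trains with plain stochastic gradient descent on the log loss. The training sentences are the two Tuvan lines from the removed README plus two Russian sentences written for this sketch; four sentences are far too few for a real classifier.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def char_ngram_features(text: str, n: int = 2) -> Dict[str, float]:
    """Character n-gram counts scaled to unit L2 norm (assumed feature set)."""
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {gram: c / norm for gram, c in counts.items()}


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticLanguageClassifier:
    """Binary logistic regression: P(tuvan | text) = sigmoid(w . x + b)."""

    def __init__(self, learning_rate: float = 1.0, epochs: int = 300, n: int = 2):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.n = n
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def _logit(self, feats: Dict[str, float]) -> float:
        return self.bias + sum(self.weights.get(g, 0.0) * v for g, v in feats.items())

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "LogisticLanguageClassifier":
        # Tuvan is the positive class (1.0); any other label is treated as 0.0.
        data = [(char_ngram_features(t, self.n), 1.0 if y == "tuvan" else 0.0)
                for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for feats, target in data:
                # d(log loss)/d(logit) = p - y: one stochastic gradient step per example.
                error = sigmoid(self._logit(feats)) - target
                for gram, value in feats.items():
                    self.weights[gram] = self.weights.get(gram, 0.0) - self.learning_rate * error * value
                self.bias -= self.learning_rate * error
        return self

    def predict_proba(self, text: str) -> float:
        """Estimated probability that `text` is Tuvan."""
        return sigmoid(self._logit(char_ngram_features(text, self.n)))

    def predict(self, text: str) -> str:
        return "tuvan" if self.predict_proba(text) >= 0.5 else "russian"


# Tuvan sentences from the removed README example.
TUVAN = [
    "Президент ооң бодалы-биле алырга, регионалдыг-даа, муниципалдыг-даа деңнелде деткиир ужурлуг регионнарда спортчу инфраструктура хөгжүлдезиниң айтырыын көрген.",
    "Ооң келир үеде президент программазының угланыышкыны ол апаарын Владимир Путин чугаалаан.",
]
# Russian sentences written only for this sketch.
RUSSIAN = [
    "Президент считает, что развитие спортивной инфраструктуры в регионах нужно поддерживать на региональном и муниципальном уровнях.",
    "Владимир Путин сказал, что в будущем это станет направлением президентской программы.",
]

clf = LogisticLanguageClassifier().fit(TUVAN + RUSSIAN, ["tuvan"] * 2 + ["russian"] * 2)
print([clf.predict(t) for t in TUVAN + RUSSIAN])
```

The sketch shows only what such a model computes, a weighted sum of text features passed through the logistic function. A production version would typically add regularisation, a proper solver, and far more training data.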