---
license: mit
language:
- vi
metrics:
- accuracy
library_name: transformers
tags:
- poem
- vietnamese
- classification
- evaluation
---

# Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifier built with [BertForSequenceClassification](https://huggingface.co/trituenhantaoio/bert-base-vietnamese-uncased), with an accuracy of ```99.7%```.

This is a side project developed while building our [Vietnamese poem generator](https://github.com/Anshler/poem_generator).

## Features

* Classify Vietnamese poems into five categories: ```4 chu```, ```5 chu```, ```7 chu```, ```luc bat```, and ```8 chu```
* Score the quality of each poem based solely on how well it conforms to the strict rules of its genre, using 3 criteria, Length (L), Tone (T), and Rhyme (R), combined as follows: ```score = L/10 + 3T/10 + 6R/10```
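
The weighting above can be written as a small function. This is a minimal sketch, assuming each criterion has already been reduced to a fraction between 0 and 1 (for instance, the share of lines that satisfy the rule); how the package itself computes L, T, and R is not specified here, and `poem_score` is a hypothetical name.

```python
def poem_score(length: float, tone: float, rhyme: float) -> float:
    """Combine the three criteria as score = L/10 + 3T/10 + 6R/10.

    Assumes each input is a fraction in [0, 1], so the result is in [0, 1].
    """
    for name, value in (("length", length), ("tone", tone), ("rhyme", rhyme)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Rhyme weighs most (60%), then tone (30%), then length (10%).
    return (length + 3 * tone + 6 * rhyme) / 10


# Perfect length and tone, but only half of the rhymes correct:
print(poem_score(1.0, 1.0, 0.5))  # 0.7
```

Because rhyme carries 60% of the weight, a poem with flawless length and tone can still score poorly if its rhymes are off.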

The rules for each genre are defined below:

| Genre | Length | Tone | Rhyme |
|------------------|------------------|--------------|------------------------|
| 4 chu | - 4 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word must be even (bằng) <br>- And vice versa | The last (4th) word of each line follows one of these patterns: <br>- Continuous rhyme (gieo vần tiếp) <br>- Alternating rhyme (gieo vần tréo) <br>- Three-line rhyme (gieo vần ba) |
| 5 chu | - 5 words per line <br>- 4 lines per stanza (optional) | Same as "4 chu" | Same as "4 chu" |
| 7 chu | - 7 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word must be even (bằng) and the 6th word uneven (trắc) <br>- The 5th word and the last (7th) word must have different tones | In each stanza, the last words of the 1st, 2nd, and 4th lines must share the same tone and rhyme |
| luc bat | - 6 words in odd lines <br>- 8 words in even lines <br>- 4 lines per stanza (optional) | For 6-word lines: <br>- If the 2nd word is uneven (trắc), the 4th word must be even (bằng) and the 6th word uneven (trắc) <br><br> For 8-word lines: <br>- Same pattern as the preceding 6-word line <br>- The last (8th) word must have the same tone as the 6th word | The last (6th) word of each 6-word line must rhyme with the 6th word of the following 8-word line and with the 8th word of the preceding 8-word line |
| 8 chu | - 8 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 3rd word is uneven (trắc), the 5th word must be even (bằng) and the 8th word uneven (trắc) | Same as "4 chu" |
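
The tone and rhyme rules in the table can be approximated in a few lines of Python. This is a minimal sketch, not the package's own scorer, and it rests on three assumptions: a syllable's tone class is read from its diacritics (no mark or the huyền mark means even/bằng; sắc, hỏi, ngã, and nặng mean uneven/trắc); the tone rule is treated as strict alternation at each genre's key positions, in either direction, as the "vice versa" in the 4 chu row states; and two words rhyme only when their rimes (the syllable minus its initial consonant and tone mark) are identical. The names `tone_class`, `alternates`, `rime`, and `TONE_POSITIONS` are hypothetical, and the extra 7 chu and luc bat constraints (5th vs. 7th word, 8th vs. 6th word) are left out.

```python
import string
import unicodedata

# Combining marks (after NFD decomposition) for the five marked tones.
HUYEN = "\u0300"                                       # grave: huyền, an even (bằng) tone
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}  # sắc, hỏi, ngã, nặng
TONE_MARKS = TRAC_MARKS | {HUYEN}
PUNCT = string.punctuation + "“”‘’…"

# Initial consonants, multi-letter ones first so "ngh" is tried before "ng" and "n".
INITIALS = ("ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr",
            "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
            "t", "v", "x")

# Hypothetical encoding of the table: 1-based word positions whose tones
# must alternate. The luc bat entry applies to the 6-word lines.
TONE_POSITIONS = {
    "4 chu": (2, 4),
    "5 chu": (2, 4),
    "7 chu": (2, 4, 6),
    "luc bat": (2, 4, 6),
    "8 chu": (3, 5, 8),
}


def tone_class(word: str) -> str:
    """Return "T" for an uneven (trắc) syllable, "B" for an even (bằng) one."""
    marks = set(unicodedata.normalize("NFD", word.lower()))
    return "T" if marks & TRAC_MARKS else "B"


def alternates(line: str, positions: tuple[int, ...]) -> bool:
    """True if the words at `positions` alternate between bằng and trắc."""
    words = line.split()
    if len(words) < max(positions):
        return False
    tones = [tone_class(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(tones, tones[1:]))


def rime(word: str) -> str:
    """Strip punctuation, the tone mark, and the initial consonant of a syllable."""
    word = word.lower().strip(PUNCT)
    # Drop only the tone marks; vowel marks (circumflex, breve, horn) stay.
    bare = "".join(c for c in unicodedata.normalize("NFD", word) if c not in TONE_MARKS)
    bare = unicodedata.normalize("NFC", bare)
    for initial in INITIALS:
        # Keep at least one letter, so "gì" yields "i" via the plain "g" initial.
        if bare.startswith(initial) and len(bare) > len(initial):
            return bare[len(initial):]
    return bare


line = "Người đi theo gió đuổi mây"
print(alternates(line, TONE_POSITIONS["luc bat"]))  # True: đi (B), gió (T), mây (B)
print(rime("tiền") == rime("miên"))                 # True: both rimes are "iên"
print(rime("mây"), rime("ngày"))                    # ây ay
```

Exact rime matching is stricter than the table suggests: it rejects mây/ngày (ây vs. ay), even though both words sit at rhyme positions in the example poem under _Inference_, so a fuller scorer would likely need a list of accepted near rhymes.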

## Data

A collection of 171,188 Vietnamese poems across five genres: luc bat, 5 chu, 7 chu, 8 chu, and 4 chu. Download it [here](https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip).

For more details, see the _Acknowledgments_ section.
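
To fetch the archive from a script, a minimal standard-library sketch could look like the following. The function name `download_poems_dataset` and the default `data` folder are hypothetical, and since the layout of the files inside the zip is not documented here, the function simply returns the names of the extracted members.

```python
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = "https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip"


def download_poems_dataset(dest: str = "data") -> list[str]:
    """Download the poem dataset zip into `dest`, extract it, and list its members."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / "poems_dataset.zip"
    if not zip_path.exists():  # reuse a previously downloaded archive
        urllib.request.urlretrieve(DATASET_URL, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `download_poems_dataset()` downloads the archive only once; later calls reuse the cached zip and re-extract it.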

## Training

The training code is in our [Vietnamese poem generator](https://github.com/Anshler/poem_generator) repo.

Run:

```
python poem_classifier_training.py
```

## Installation

```
pip install vietnamese-poem-classifier
```

Or install directly from GitHub:

```
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
```

## Inference

```python
from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời.'''

print(classifier.predict(poem))

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75}]
```

## Model

The model weights are published on Hugging Face at [Anshler/vietnamese-poem-classifier](https://huggingface.co/Anshler/vietnamese-poem-classifier).

## Acknowledgments

_This project was inspired by the evaluation method from ```fsoft-ailab```'s [SP-GPT2 Poem-Generator](https://github.com/fsoft-ailab/Poem-Generator)._

_The dataset is also taken from their repo._