Anshler committed
Commit 0cda9dc
1 parent: 5296eb4

Update README.md

  1. README.md +86 -0
README.md CHANGED
@@ -1,3 +1,89 @@
1
  ---
2
  license: mit
3
+ language:
4
+ - vi
5
+ metrics:
6
+ - accuracy
7
+ library_name: transformers
8
+ tags:
9
+ - poem
10
+ - vietnamese
11
+ - classification
12
+ - evaluation
13
  ---
14
+ # Vietnamese poem classification and evaluation 📜🔍
15
+
16
+ A Vietnamese poem classifier built on [BertForSequenceClassification](https://huggingface.co/trituenhantaoio/bert-base-vietnamese-uncased), reaching an accuracy of ```99.7%```.
17
+
18
+ This is a side project created while building our [Vietnamese poem generator](https://github.com/Anshler/poem_generator).
19
+
20
+ ## Features
21
+
22
+ * Classify Vietnamese poems into five genres: ```4 chu```, ```5 chu```, ```7 chu```, ```luc bat``` and ```8 chu```
23
+ * Score the quality of each poem based solely on how well it conforms to the strict rules of its genre, using three criteria, Length (L), Tone (T) and Rhyme (R), as follows: ```score = L/10 + 3T/10 + 6R/10```
24
+
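As a rough illustration of the weighting above, the snippet below combines three component scores into the final value. It assumes each of L, T and R is already normalized to the range 0 to 1; the function name `poem_score` and that normalization are illustrative assumptions, not the package's actual API.

```python
def poem_score(length: float, tone: float, rhyme: float) -> float:
    """Weighted score: score = L/10 + 3T/10 + 6R/10.

    Assumes (hypothetically) that each component is a fraction in [0, 1].
    """
    for name, value in (("length", length), ("tone", tone), ("rhyme", rhyme)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (length + 3 * tone + 6 * rhyme) / 10


print(poem_score(1.0, 1.0, 1.0))  # every rule satisfied -> 1.0
print(poem_score(1.0, 1.0, 0.0))  # perfect length and tone, no rhyme -> 0.4
```

Under this weighting, rhyme accounts for 60% of the score, tone for 30% and length for 10%.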
25
+ The rules for each genre are defined below:
26
+
27
+ | Genre | Length | Tone | Rhyme |
28
+ |------------------|------------------|--------------|------------------------|
29
+ | 4 chu | - 4 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) <br>- And vice versa | The last (4th) word of each line follows one of: <br>- Continuous rhyme (gieo vần tiếp) <br>- Alternating rhyme (gieo vần tréo) <br>- Three-line rhyme (gieo vần ba) |
30
+ | 5 chu | - 5 words per line <br>- 4 lines per stanza (optional) | Same as "4 chu" | Same as "4 chu" |
31
+ | 7 chu | - 7 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br>- The 5th word and the last (7th) word must have different tones | The last words of the 1st, 2nd and 4th lines of each stanza must share the same tone and rhyme |
32
+ | luc bat | - 6 words in odd lines <br>- 8 words in even lines <br>- 4 lines per stanza (optional) | For a 6-word line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc) <br><br> For an 8-word line: <br>- Follows the same tone pattern as the previous 6-word line <br>- The last (8th) word must have the same tone as the 6th word | The last (6th) word of a 6-word line must rhyme with the 6th word of the next 8-word line and the 8th word of the previous 8-word line |
33
+ | 8 chu | - 8 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)| Same as "4 chu" |
34
+
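To make the Length and Tone columns more concrete, here is a minimal sketch of how a single line could be checked. It relies on the standard grouping of Vietnamese tones (unmarked and huyền are bằng; sắc, hỏi, ngã and nặng are trắc) and detects the tone from Unicode combining marks. The helper names `tone_class` and `tones_alternate` are hypothetical, and this is not necessarily how the package implements its checks.

```python
import unicodedata
from typing import Sequence

# Combining marks for the four trắc tones: sắc, hỏi, ngã, nặng.
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}


def tone_class(word: str) -> str:
    """Return "trac" if the word carries a trắc tone mark, else "bang"."""
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"


def tones_alternate(line: str, positions: Sequence[int]) -> bool:
    """Check that the words at the given 1-based positions alternate in tone."""
    words = line.split()
    if len(words) < max(positions):
        return False
    classes = [tone_class(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(classes, classes[1:]))


line = "Người đi theo gió đuổi mây"
print(len(line.split()))                 # word count used by the length rule: 6
print(tones_alternate(line, (2, 4, 6)))  # 2nd/4th/6th are bằng, trắc, bằng: True
```

Normalizing to NFD first separates tone marks from vowel-quality marks such as the circumflex or horn, so only the four trắc marks need to be looked up.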
35
+
36
+
37
+
38
+ ## Data
39
+
40
+ A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download it [here](https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip).
41
+
42
+ For more details, see the _Acknowledgments_ section.
43
+
44
+ ## Training
45
+
46
+ Training code is in our repo [Vietnamese poem generator](https://github.com/Anshler/poem_generator)
47
+
48
+ Run:
49
+ ```
50
+ python poem_classifier_training.py
51
+ ```
52
+
53
+ ## Installation
54
+
55
+ ```
56
+ pip install vietnamese-poem-classifier
57
+ ```
58
+ Or install directly from GitHub:
59
+
60
+ ```
61
+ pip install git+https://github.com/Anshler/vietnamese-poem-classifier
62
+ ```
63
+
64
+ ## Inference
65
+
66
+ ```python
67
+ from vietnamese_poem_classifier.poem_classifier import PoemClassifier
68
+
69
+ classifier = PoemClassifier()
70
+
71
+ poem = '''Người đi theo gió đuổi mây
72
+ Tôi buồn nhặt nhạnh tháng ngày lãng quên
73
+ Em theo hú bóng kim tiền
74
+ Bần thần tôi ngẫm triền miên thói đời.'''
75
+
76
+ classifier.predict(poem)
77
+
78
+ #>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75}]
79
+ ```
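The sample output above is a list of dictionaries with `label`, `confidence` and `poem_score` keys. A small sketch of how such a result might be consumed follows; it works on a hard-coded copy of the sample output rather than calling the model, and the helper name `summarize` is hypothetical.

```python
# Hard-coded copy of the sample output shown above (no model call is made here).
result = [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75}]


def summarize(prediction: dict) -> str:
    """Format one prediction dictionary as a short human-readable line."""
    return (
        f"{prediction['label']} "
        f"(confidence {prediction['confidence']:.2%}, "
        f"score {prediction['poem_score']:.2f})"
    )


for prediction in result:
    print(summarize(prediction))  # luc bat (confidence 99.99%, score 0.75)
```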
80
+
81
+ ## Model
82
+
83
+ The model's weights are published on Hugging Face at [Anshler/vietnamese-poem-classifier](https://huggingface.co/Anshler/vietnamese-poem-classifier)
84
+
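If you prefer to load the published weights directly instead of going through the package, something like the following may work. It assumes the repository can be loaded by the standard `transformers` text-classification pipeline, which this README does not confirm. The label names it returns may differ from the package's output, and it will not include `poem_score`. The function name `load_classifier` is hypothetical.

```python
MODEL_ID = "Anshler/vietnamese-poem-classifier"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the published checkpoint.

    The import is deferred so that defining this helper does not itself
    require transformers; calling it downloads the weights on first use.
    """
    from transformers import pipeline  # needs transformers and a backend such as torch

    return pipeline("text-classification", model=model_id)


# Example (downloads the model, so it is left as a comment):
# classifier = load_classifier()
# print(classifier("Người đi theo gió đuổi mây"))
```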
85
+ ## Acknowledgments
86
+
87
+ _This project was inspired by the evaluation method from ```fsoft-ailab```'s_ [SP-GPT2 Poem-Generator](https://github.com/fsoft-ailab/Poem-Generator)
88
+
89
+ _The dataset is also taken from their repo._