
Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifier built on BertForSequenceClassification, with an accuracy of 99.7%.

This is a side project built during the development of our Vietnamese poem generator.

Features

  • Classify Vietnamese poems into five genres: 4 chu, 5 chu, 7 chu, luc bat, and 8 chu
  • Score the quality of each poem based solely on how well it conforms to the rigid rules of its genre, using three criteria, Length (L), Tone (T), and Rhyme (R), combined as: score = L/10 + 3T/10 + 6R/10
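
As a minimal sketch, the scoring formula above can be written as a weighted sum. The function name `poem_score` is chosen here for illustration and is not necessarily what the package uses internally; the input values are taken from the sample output in the Inference section.

```python
def poem_score(l_score, t_score, r_score):
    """Weighted sum from the formula above: 10% length, 30% tone, 60% rhyme."""
    return l_score / 10 + 3 * t_score / 10 + 6 * r_score / 10

# Sub-scores from the sample prediction in the Inference section.
print(round(poem_score(1.0, 1.0, 0.5833333333333333), 4))  # 0.75
```

Because the weights sum to 1, a poem that satisfies every rule scores 1.0, and rhyme errors cost the most.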

The rules for each genre are defined below:

Genre: 4 chu
Length:
- 4 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng)
- Vice versa
Rhyme, on the last word (4th) of each line:
- Continuous rhyme (gieo vần tiếp)
- Alternating rhyme (gieo vần tréo)
- Three-line rhyme (gieo vần ba)

Genre: 5 chu
Length:
- 5 words per line
- 4 lines per stanza (optional)
Tone:
- Same as "4 chu"
Rhyme:
- Same as "4 chu"

Genre: 7 chu
Length:
- 7 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng), the 6th word is uneven (trắc)
- The 5th word and the last word (7th) must have different tones
Rhyme:
- The last words of the 1st, 2nd, and 4th lines of each stanza must have the same tone and rhyme

Genre: luc bat
Length:
- 6 words in odd lines
- 8 words in even lines
- 4 lines per stanza (optional)
Tone, for each 6-word line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng), the 6th word is uneven (trắc)
Tone, for each 8-word line:
- Must be the same as the previous 6-word line
- The last word (8th) must have the same tone as the 6th word but a different accent
Rhyme:
- The last word (6th) of a 6-word line must rhyme with the 6th word of the next 8-word line and the 8th word of the previous 8-word line

Genre: 8 chu
Length:
- 8 words per line
- 4 lines per stanza (optional)
Tone, for each line:
- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)
Rhyme:
- Same as "4 chu"
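
To make the rules above more concrete, here is a rough sketch of how the three kinds of check could be approached. It is not the package's actual implementation: the helper names `tone_class`, `length_matches`, and `luc_bat_rhyme_pairs` are hypothetical. The tone check relies on the standard grouping of Vietnamese tones, where words with no tone mark or a grave accent (huyền) are bằng and words with sắc, hỏi, ngã, or nặng are trắc.

```python
import unicodedata

# Combining marks (after NFD decomposition) that signal a trắc tone:
# acute (sắc), hook above (hỏi), tilde (ngã), dot below (nặng).
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}

def tone_class(word):
    """Return 'trac' if the word carries a trắc tone mark, else 'bang'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"

# Expected words per line for each genre; luc bat alternates 6 and 8.
WORDS_PER_LINE = {"4 chu": [4], "5 chu": [5], "7 chu": [7],
                  "8 chu": [8], "luc bat": [6, 8]}

def length_matches(poem, genre):
    """Hypothetical helper: one boolean per non-empty line, True when its word count fits."""
    pattern = WORDS_PER_LINE[genre]
    lines = [ln.split() for ln in poem.splitlines() if ln.strip()]
    return [len(words) == pattern[i % len(pattern)] for i, words in enumerate(lines)]

def luc_bat_rhyme_pairs(n_lines):
    """0-based (line, word) positions that the luc bat rhyme rule asks to rhyme."""
    pairs = []
    for i in range(0, n_lines, 2):                  # i indexes the 6-word lines
        if i >= 2:
            pairs.append(((i - 1, 7), (i, 5)))      # 8th word of the previous 8-word line
        if i + 1 < n_lines:
            pairs.append(((i, 5), (i + 1, 5)))      # 6th word of the next 8-word line
    return pairs

poem = """Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời."""

print(length_matches(poem, "luc bat"))   # [True, True, True, True]
print([tone_class(w) for w in poem.splitlines()[0].split()])
# ['bang', 'bang', 'bang', 'trac', 'trac', 'bang']
print(luc_bat_rhyme_pairs(4))
# [((0, 5), (1, 5)), ((1, 7), (2, 5)), ((2, 5), (3, 5))]
```

Deciding whether two words actually rhyme requires comparing their vowel nuclei and final consonants, which this sketch leaves out.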

Data

A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu, and 4-chu. Download here

For more detail, refer to the Acknowledgments section.

Training

The training code is in our Vietnamese poem generator repo.

Run:

python poem_classifier_training.py

Installation

pip install vietnamese-poem-classifier

Or

pip install git+https://github.com/Anshler/vietnamese-poem-classifier

Inference

from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
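
The prediction is a list of dictionaries, so the fields can be read directly. The sketch below works on a copy of the sample output above rather than calling the model; `weakest_criterion` is a hypothetical helper, not part of the package.

```python
# Sample result copied from the output above; in practice it would come
# from classifier.predict(poem).
results = [{'label': 'luc bat', 'confidence': 0.9999017715454102,
            'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0,
            'r_score': 0.5833333333333333}]

def weakest_criterion(result):
    """Hypothetical helper: name of the lowest of the three sub-scores."""
    sub_scores = {key: result[key] for key in ('l_score', 't_score', 'r_score')}
    return min(sub_scores, key=sub_scores.get)

for result in results:
    print(result['label'], result['poem_score'], weakest_criterion(result))
# prints: luc bat 0.75 r_score
```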

Model

The model's weights are published on Hugging Face at Anshler/vietnamese-poem-classifier

Acknowledgments

This project was inspired by the evaluation method from fsoft-ailab's SP-GPT2 Poem-Generator.

The dataset is also taken from their repo.

Model size: 111M params (Safetensors, tensor type F32)