---
license: bigscience-openrail-m
language:
- zh
pipeline_tag: text2text-generation
thumbnail: Chinese Lyrics Generation with Masked Sequence-to-Sequence Pretraining.
---

# Chinese Generation with Masked Sequence-to-Sequence Pretraining

This repository demonstrates a format-controllable Chinese lyric generator, fine-tuned on [Chinese-Lyric-Corpus](https://github.com/gaussic/Chinese-Lyric-Corpus) using a [MASS](https://arxiv.org/abs/1905.02450)-like strategy.

# Usage

## Initialization

```python
from transformers import MT5ForConditionalGeneration, MT5Tokenizer, Text2TextGenerationPipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_path = "zake7749/chinese-lyrics-generation-mass"
model = MT5ForConditionalGeneration.from_pretrained(model_path)
tokenizer = MT5Tokenizer.from_pretrained(model_path)

# Wrap both in a pipeline so a template string can be passed in directly.
pipe = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
```

## Generate lyrics with a template

```python
# Each X is a placeholder for one character; the other characters are kept as given.
template = "風花雪月。像XXXXXXXXXX。日升月落。仿若XXXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric)  # 風花雪月。像你在我的夢裡慢慢散落。日升月落。仿若我宿命無法陪隨你走過。

template = "XXXXXXX留戀。XXXXXXX。XXX燈火XXXX。XXX手牽手XXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric)  # 我們說好一生不留戀。我們相約在夏天。我們的燈火相偎相牽。我們說好手牽手到永遠。
```

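Because the outputs are sampled, a generated lyric can drift from the template. The sketch below shows one way to check a result. `follows_template` is a hypothetical helper, not part of this repository, and it assumes that each `X` stands for exactly one character and that every other character must stay in place.

```python
def follows_template(template: str, lyric: str, mask: str = "X") -> bool:
    """Return True if `lyric` has the template's length and keeps its fixed characters.

    Assumption: one mask character in the template maps to one output character.
    """
    if len(template) != len(lyric):
        return False
    # A position matches if it is masked, or if the fixed character is unchanged.
    return all(t == mask or t == c for t, c in zip(template, lyric))


# Illustrative template and outputs (not produced by the model).
template = "愛XX。"
print(follows_template(template, "愛著你。"))  # True
print(follows_template(template, "想著你。"))  # False
```

One possible use is to resample until the check passes, with a cap on the number of attempts.
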
## Acrostic

```python
# Fix the first character of each line and mask the rest.
template = "分XXXXXX。手XXXXXXXXX。之XXXXXXX。後XXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric)  # 分開後激情浮現。手牽著手走過的那一天。之間有太多的危險。後悔一點點,傷心一片。
```

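Writing acrostic templates by hand gets error-prone for longer head words. A minimal sketch of a builder follows. `build_acrostic_template` and its parameters are illustrative names, not part of the model's API, and the sketch assumes the same `X` mask and `。` separator used in the template above.

```python
from typing import Sequence


def build_acrostic_template(heads: str, line_lengths: Sequence[int], mask: str = "X") -> str:
    """Build a template whose i-th line starts with heads[i] and has line_lengths[i] characters."""
    if len(heads) != len(line_lengths):
        raise ValueError("heads and line_lengths must have the same length")
    lines = []
    for head, length in zip(heads, line_lengths):
        if length < 1:
            raise ValueError("each line needs room for at least its head character")
        # The head character takes one slot; the remaining slots are masked.
        lines.append(head + mask * (length - 1))
    return "。".join(lines) + "。"


print(build_acrostic_template("分手", [3, 4]))  # 分XX。手XXX。
```

The returned string can be passed to `pipe` in the same way as a hand-written template.
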
## Completion

```python
# Provide the opening lines in full and mask the lines to be completed.
template = "餘生的光陰牽你手前行。我們共赴一場光年的旅行。XXXXXXXXXX。XXXXXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric)  # 餘生的光陰牽你手前行。我們共赴一場光年的旅行。走過的經歷新舊的記憶。都是帶著珍珠淚水無法代替。
```

## Random Generation

```python
import random

num_example = 5
min_sentence_num, max_sentence_num = 2, 5
min_character_num, max_character_num = 4, 10

for example_id in range(num_example):
    # Draw a random number of sentences, each a fully masked run of random length.
    num_sentences = random.randint(min_sentence_num, max_sentence_num)
    masked_sentences = ["X" * random.randint(min_character_num, max_character_num)
                        for _ in range(num_sentences)]

    template = "。".join(masked_sentences) + "。"
    lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
    print(f"{example_id + 1}. {lyric}")

# 1. 愛不愛我。讓自己難過。你的擁抱是那麼多。
# 2. 那一天我們重相見。你已站在那個熟悉的街邊。讓我魂牽夢繞在肩。有你的明天。不再留戀。飛過天邊。
# 3. 誰知我們入骨的相思。深深地被俘虜。苦澀滋味含在茶中傾訴。餘情未了落幕。愛到痛處奢望幸福。
# 4. 為什麼你一直讓我傷心。總覺得對你太著迷。
# 5. 一點可憐。還在期待你會出現。哪怕只是匆匆一眼。
```

# Note

1. The model is still under training, so it may not always follow the template exactly, especially when generating long sequences.
2. The model may output a comma as a pause in the lyric, for example `我的愛,像潮水。`. If you don't need the pause, you can add the id of the comma token to `bad_words_ids`.
3. The model was fine-tuned only on a traditional Chinese corpus, so its performance on simplified Chinese is somewhat unstable.
4. When the input contains few or no keywords, the model may **combine snippets from real-world songs** to fit the template.

# Disclaimer

This lyric generator is for academic purposes only. Users of this model should exercise caution and carefully evaluate the results before using them for any commercial or non-academic purpose. We are not liable for any damages or losses resulting from the use or misuse of the model.