---
license: bigscience-openrail-m
language:
- zh
pipeline_tag: text2text-generation
thumbnail: Chinese Lyrics Generation with Masked Sequence-to-Sequence Pretraining.
---

# Chinese Lyrics Generation with Masked Sequence-to-Sequence Pretraining

This repository demonstrates a format-controllable Chinese lyric generator: an mT5 model fine-tuned on the [Chinese-Lyric-Corpus](https://github.com/gaussic/Chinese-Lyric-Corpus) with a [MASS](https://arxiv.org/abs/1905.02450)-like masked sequence-to-sequence strategy.
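
In a MASS-like setup, training pairs can be built by masking characters of a lyric and training the model to reconstruct the full line. The sketch below is an illustrative reconstruction of that preprocessing, assuming character-level masking with `X` (matching the templates used throughout this card); the helper name `mask_lyric` and the masking ratio are not taken from the original training script.

```python
import random

def mask_lyric(lyric: str, mask_ratio: float = 0.5) -> str:
    """Build a training template by replacing a random subset of
    characters with `X`, keeping the sentence delimiter `。` intact."""
    return "".join(
        "X" if ch != "。" and random.random() < mask_ratio else ch
        for ch in lyric
    )

# Possible output: mask_lyric("風花雪月。日升月落。") -> "風X雪X。日XX落。"
```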

# Usage

## Initialization

```python
from transformers import MT5ForConditionalGeneration, MT5Tokenizer, Text2TextGenerationPipeline
model_path = "zake7749/chinese-lyrics-generation-mass"
model = MT5ForConditionalGeneration.from_pretrained(model_path)
tokenizer = MT5Tokenizer.from_pretrained(model_path)
pipe = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
```
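
The pipeline runs on the CPU by default. If a GPU is available, the standard `device` argument can be passed when building the pipeline (CUDA device 0 is assumed here):

```python
pipe = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer, device=0)
```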

## Generate lyrics with a template

```python
template = "風花雪月。像XXXXXXXXXX。日升月落。仿若XXXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric) # 風花雪月。像你在我的夢裡慢慢散落。日升月落。仿若我宿命無法陪隨你走過。


template = "XXXXXXX留戀。XXXXXXX。XXX燈火XXXX。XXX手牽手XXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric) # 我們說好一生不留戀。我們相約在夏天。我們的燈火相偎相牽。我們說好手牽手到永遠。

```
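
In a template, each `X` stands for exactly one generated character, while all other characters (including the full-width period `。`) are reproduced literally. Since the model can occasionally deviate from the template (see the Note section), a quick structural check is handy when sampling multiple candidates. The helper below is a hypothetical addition, not part of the model's API:

```python
import re

def matches_template(template: str, lyric: str) -> bool:
    """Return True if `lyric` fills each `X` with exactly one character
    and keeps every literal character of `template`. Note the model may
    insert `,` as a pause (see note 2 below), which this strict check rejects."""
    pattern = "".join("." if ch == "X" else re.escape(ch) for ch in template)
    return re.fullmatch(pattern, lyric) is not None

template = "風花雪月。像XXXXXXXXXX。日升月落。仿若XXXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True,
             repetition_penalty=1.2)[0]['generated_text']
print(matches_template(template, lyric))
```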

## Acrostic

```python
template = "分XXXXXX。手XXXXXXXXX。之XXXXXXX。後XXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric) # 分開後激情浮現。手牽著手走過的那一天。之間有太多的危險。後悔一點點,傷心一片。
```

## Completion

```python
template = "餘生的光陰牽你手前行。我們共赴一場光年的旅行。XXXXXXXXXX。XXXXXXXXXXXX。"
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
print(lyric) # 餘生的光陰牽你手前行。我們共赴一場光年的旅行。走過的經歷新舊的記憶。都是帶著珍珠淚水無法代替。
```

## Random Generation 

```python
import random

num_examples = 5
min_sentence_num, max_sentence_num = 2, 5
min_character_num, max_character_num = 4, 10

for example_id in range(num_examples):
    num_sentences = random.randint(min_sentence_num, max_sentence_num)
    masked_sentences = ["X" * random.randint(min_character_num, max_character_num)
                        for _ in range(num_sentences)]

    template = "。".join(masked_sentences) + "。"
    lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True, repetition_penalty=1.2)[0]['generated_text']
    print(f"{example_id + 1}. {lyric}")

# 1. 愛不愛我。讓自己難過。你的擁抱是那麼多。
# 2. 那一天我們重相見。你已站在那個熟悉的街邊。讓我魂牽夢繞在肩。有你的明天。不再留戀。飛過天邊。
# 3. 誰知我們入骨的相思。深深地被俘虜。苦澀滋味含在茶中傾訴。餘情未了落幕。愛到痛處奢望幸福。
# 4. 為什麼你一直讓我傷心。總覺得對你太著迷。
# 5. 一點可憐。還在期待你會出現。哪怕只是匆匆一眼。
```
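
Because `do_sample=True` makes generation stochastic, each run prints different lyrics. For reproducible samples, fix the seed with the standard `transformers.set_seed` utility before sampling:

```python
from transformers import set_seed

set_seed(42)  # makes the sampled lyrics reproducible across runs
```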

# Note

1. The model is still under training, so it may not always follow the template exactly, especially when generating long sequences.
2. The model may emit `,` as a pause within a line, e.g. `我的愛,像潮水。`. If you don't want pauses, add the token id(s) of `,` to `bad_words_ids`, as shown in the first sketch after this list.
3. The model was fine-tuned only on a Traditional Chinese corpus, so its performance on Simplified Chinese input can be unstable; the second sketch after this list shows a possible workaround.
4. When the given template contains few or no keywords, the model may **combine snippets from real-world songs** to fit the template.
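
A minimal sketch for note 2, suppressing the pause character during generation. How `,` tokenizes depends on the MT5 SentencePiece vocabulary (it may come with a leading `▁` piece), so inspect the ids before relying on them; the variable name `pause_ids` is illustrative:

```python
# Token id(s) of the full-width comma in the model's vocabulary.
pause_ids = tokenizer(",", add_special_tokens=False).input_ids

lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True,
             repetition_penalty=1.2, bad_words_ids=[pause_ids])[0]['generated_text']
```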
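
For note 3, one possible workaround is to convert Simplified Chinese input to Traditional Chinese before generation, for example with the third-party OpenCC bindings (`pip install opencc-python-reimplemented`); this package is an assumption on our part, not a dependency of this repository:

```python
from opencc import OpenCC

s2t = OpenCC('s2t')  # Simplified -> Traditional converter
template = s2t.convert("风花雪月。像XXXXXXXXXX。日升月落。仿若XXXXXXXXXX。")
lyric = pipe(template, max_length=128, top_p=0.8, do_sample=True,
             repetition_penalty=1.2)[0]['generated_text']
```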

# Disclaimer

This lyric generator is intended for academic purposes only. Users should exercise caution and carefully evaluate the results before applying them to any commercial or other non-academic purpose. We are not liable for any damages or losses resulting from the use or misuse of this model.