
---
language: vi
tags:
- vi
- vietnamese
- gpt2
- text-generation
- lm
- nlp
datasets:
- VN-Literature
widget:
- text: >-
    Hôm ấy, cụ Bá ông quả quyết mở ví tiền để trả cho anh lái chó cái giấy bạc
    một đồng.
inference:
  parameters:
    max_length: 500
    do_sample: true
    temperature: 0.8
---


# GPT-2

The GPT-2 model is pre-trained on the writing style of Vu Trong Phung.

# How to use the model

~~~~
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("Khoa/VN-Literature-Generation")
model = GPT2LMHeadModel.from_pretrained("Khoa/VN-Literature-Generation")

text = "Mùa thu lá vàng rơi"
input_ids = tokenizer.encode(text, return_tensors='pt')
max_length = 300

model.to('cpu')

# Generate three fixed-length continuations, combining beam search
# with top-k sampling and blocking repeated bigrams
sample_outputs = model.generate(input_ids,
                                pad_token_id=tokenizer.eos_token_id,
                                do_sample=True,
                                max_length=max_length,
                                min_length=max_length,
                                top_k=40,
                                num_beams=5,
                                early_stopping=True,
                                no_repeat_ngram_size=2,
                                num_return_sequences=3)

for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))
    print('\n---')
~~~~
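
The `top_k=40` argument restricts sampling at each step to the 40 highest-probability tokens. A minimal pure-Python sketch of that idea (an illustration on a toy vocabulary, not the `transformers` internals):

~~~~
import math

def top_k_filter(logits, k):
    # Keep the k largest logits; push the rest to -inf so softmax gives them zero mass.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float('-inf') for x in logits]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy distribution over a 5-token vocabulary
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(top_k_filter(logits, k=2))
# Only the two largest logits keep nonzero probability;
# the next token would then be sampled from `probs`.
~~~~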

## Author

Dong Dang Khoa