---
tags:
- generated_from_keras_callback
model-index:
- name: tmpq0jhm_jh
  results: []
---

## Model description

This is a GPT-2 model trained on 142,612 Lithuanian Wikipedia articles plus 11,405 articles from the delfi.lt, ve.lt, and respublika.lt news portals.

## Intended uses & limitations

I trained this model while writing my bachelor's thesis. You are free to use it for any purpose.

### Training results

The model reached 36.83% accuracy on the training data and 37.02% on the validation data.
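
The reported figure is presumably token-level accuracy, i.e. the fraction of positions at which the model's most likely next token matches the actual next token. A minimal sketch of computing such a metric, under that assumption (`model_dir`, `token_accuracy`, and the sample text are hypothetical):

```python
import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

model_dir = "..."  # hypothetical: local directory containing the trained model
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = TFGPT2LMHeadModel.from_pretrained(model_dir)

def token_accuracy(texts):
    """Fraction of next-token predictions that match the actual next token."""
    correct, total = 0, 0
    for text in texts:  # each text must encode to at least 2 tokens
        ids = tokenizer.encode(text, return_tensors="tf")
        logits = model(ids)[0]                         # (1, seq_len, vocab_size)
        preds = tf.argmax(logits[:, :-1, :], axis=-1)  # predicted token at each next position
        labels = tf.cast(ids[:, 1:], preds.dtype)      # the actual next tokens
        correct += int(tf.reduce_sum(tf.cast(preds == labels, tf.int32)))
        total += int(tf.size(labels))
    return correct / total
```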

### Framework versions

- Transformers 3.5.0
- TensorFlow 2.4.1
- Tokenizers 0.12.1
- Torch 1.4.0

## How to use it

```python
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

output_dir = '...'  # local directory with the model files, or this repository's model ID

tokenizer = GPT2Tokenizer.from_pretrained(output_dir)
model = TFGPT2LMHeadModel.from_pretrained(output_dir)

text = "Siekdamas"
# Encode the prompt as TensorFlow tensors
input_ids = tokenizer.encode(text, return_tensors='tf')

# Generate continuations with beam search
beam_outputs = model.generate(
    input_ids,
    max_length=150,
    num_beams=5,
    temperature=0.7,
    no_repeat_ngram_size=2,
    num_return_sequences=5
)

# Decode and print the highest-scoring beam
print(tokenizer.decode(beam_outputs[0], skip_special_tokens=True))
```
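
One caveat: `temperature` (and `top_k`/`top_p`) only take effect when sampling is enabled, so the beam-search call above is deterministic. A minimal sketch of sampling-based generation, reusing `model`, `tokenizer`, and `input_ids` from the example above (the parameter values are illustrative, not tuned for this model):

```python
# Sampling-based generation: do_sample=True makes temperature/top_k/top_p apply
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=150,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=3
)

for i, output in enumerate(sample_outputs):
    print(f"{i}: {tokenizer.decode(output, skip_special_tokens=True)}")
```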