---
license: mit
tags:
  - natural-language-processing
  - code-generation
  - torch
  - lstm
---

This generative text model was trained with [Andrej Karpathy's char-rnn code](https://github.com/karpathy/char-rnn) on homework assignments written by [Linguistics students](https://ling.hse.ru/en/) for a beginner Python course at HSE University in 2017.

The model is a character-level LSTM trained with an RNN size of 512, 3 layers, and dropout of 0.5.
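For reference, a sketch of what the corresponding char-rnn training invocation would look like; the `data/homeworks` directory name is hypothetical, and `train.lua` expects to find the corpus inside it as `input.txt`:

```bash
# Hypothetical data directory; char-rnn reads <data_dir>/input.txt.
th train.lua -data_dir data/homeworks -rnn_size 512 -num_layers 3 -dropout 0.5
```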

## Usage

The procedure for installing the required software is described [by Karpathy](https://github.com/karpathy/char-rnn): Torch is required, and the code is written in Lua. Be careful: the code depends on library versions that are many years old.
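Assuming Torch itself is already set up, the extra Lua packages mentioned in the char-rnn README can be installed with `luarocks` (a sketch; check the upstream README for the authoritative list):

```bash
git clone https://github.com/karpathy/char-rnn
cd char-rnn
# Packages required by char-rnn, per its README.
luarocks install nngraph
luarocks install optim
```

Sampling from the checkpoint included in this repository: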

```bash
th sample.lua lm_lstm_epoch27.89_0.7387.t7 -length 10000 -temperature 0.5 -primetext 'some text' 
```
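Here `-length` is the number of characters to generate, `-temperature` controls the randomness of sampling (lower values are more conservative), and `-primetext` seeds the network's state with some starting text before sampling.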

## Train data

The training corpus consists of the students' programs joined into one file, included in this repository as `input.txt`.
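A corpus like this can be assembled simply by concatenating the homework files; a minimal sketch, assuming the assignments are `.py` files collected in a hypothetical `homeworks/` directory:

```bash
# Join all homework scripts into the single training file char-rnn expects.
cat homeworks/*.py > input.txt
```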


## What for?

In the era of triumphant Transformers, old RNN models may seem archaic. But in my experience they still handle a category that is important from a humanities point of view, individual style, better than modern architectures do.

This model was created just for the fun of the students at the end of the course in 2017.

## Samples

The samples below are raw model output at different temperatures; the broken syntax and invented words are characteristic of a character-level model.

### temperature 0.5

```python
some text] and re.search('<meta content=\"(.*)\" name=\"author\"></meta>", oneline):
        for line in a:
            if re.search('<w><ana lex=\"(.+)\" gr=\".+"></ana>(.+?)</w>', line):
                    s = re.search(reg_adj, line)
                    if r:
                    k = re.search('<meta content="(.+?)" name="author">', txt))
                        sentences = re.sub('</w>', '', s)
                        with open('file.txt', 'a', encoding = 'utf-8') as f:
                        f.write(i+' '+count_words(f)
                f.write('\n')
            f.write('Выполняется файлов в папке в нет
можно сделеть слово слово в папка с цифрами в названии в папка с программой и папенается в тексте нет разной инит.')
print('Творительный падеж, единственное число')
elif word.endswith('ах') or word.endswith  ('ям'):
        print('Poss
```

### temperature 0.6

```python

def noun_midles(words):
    print(result)
def main():
    print('В тексте нет попыгамителись попытка слов в препинания в ланное не равно киличество файлов (' + str(arr))
def main():
    maxi = max_pmi_any(s, 'answ')
    print(count_form(textik, dictionary)
def main():
    forms = open_file()
    words = open_text(way_to_file)
    words = []
    for i in range(len(forms)):
        if '.'
            words += word.strip('.,!?//()":;/|\)\'»\n\t ')
                reg_author = '<meta content="(.+?)" name="author"'
                            bigrams.append(f +'\t'+str(pos[forms[i])+1
                    else:
                        dic[file] = 1
            else:
                d[key] = 1
            else:
                dic[key] = 1
        else:
            dic[lemmes[i]] += 1
    return d
def write_out_count_forms(text):
    arr = re.findall('<w>(.+?)</w>', text)
    return text
def find_max(string, 'words_anes)

```

### temperature 0.7

```python

import re
def main():
    maxi = max(pmi)
    number = int(input('Введите слово: ')
    if os.path.isfile(f):
        for key in d:
            f.write(key + '\n')
    f.close()
        return
def main():
    text = text_process('text.txt')
    words = []
    words = []
    for word in words:
        word = word.strip('.,;:?!'))
    f.close()
    return forms
def names_file(fname):
    with open (fname, 'r', encoding = 'utf-8') as f:
        text = f.read()
    return text
def count_text(text):
    text2 = re.sub(u'<.*?></w>', text)
    return text
def count_text(word, text):
    t = open_text(fname)
    return file
def author('text.txt'):
    for i in range(len(reg)):
        forms[i] = words[i].strip('.,?!()*&^%$
        file[i] = file[i].strip('.,?!()*&^%$
        for k in range(len(list_)):
            if len(strings)>1:
                print('Олонаким препинания.html', 'a раздания')
                word=re.sub('<.*?>', '', word, text)


```