---
license: mit
tags:
- natural-language-processing
- code-generation
- torch
- lstm
---

This generative text model was trained with [Andrej Karpathy's char-rnn code](https://github.com/karpathy/char-rnn) on homework assignments written by [linguistics students](https://ling.hse.ru/en/) for an introductory Python course at HSE University in 2017.

The model is an LSTM with 3 layers of size 512, trained with dropout 0.5.

## Usage

The installation procedure for the required software is described in [Karpathy's repository](https://github.com/karpathy/char-rnn). Torch is required, and the code is written in Lua. Be careful: it relies on library versions that are many years old.

```bash
th sample.lua lm_lstm_epoch27.89_0.7387.t7 -length 10000 -temperature 0.5 -primetext 'some text'
```

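To compare several temperatures, as in the samples on this page, the command above can be assembled programmatically. The following is only a minimal sketch: the helper names `build_sample_command` and `sample` are hypothetical, and it assumes that you run it from a char-rnn checkout with the checkpoint file in the current directory and Torch's `th` on your `PATH`. It uses only the three `sample.lua` flags shown above.

```python
import shutil
import subprocess

# Checkpoint file name taken from the command above.
CHECKPOINT = "lm_lstm_epoch27.89_0.7387.t7"


def build_sample_command(temperature, primetext="some text", length=10000):
    """Return the argument list for one `th sample.lua` call (hypothetical helper)."""
    return [
        "th", "sample.lua", CHECKPOINT,
        "-length", str(length),
        "-temperature", str(temperature),
        "-primetext", primetext,
    ]


def sample(temperature, **kwargs):
    """Run sample.lua and return its standard output (requires Torch)."""
    if shutil.which("th") is None:
        raise RuntimeError("Torch's `th` executable was not found on PATH")
    result = subprocess.run(
        build_sample_command(temperature, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the commands; call sample(t) instead once Torch is installed.
    for t in (0.5, 0.6, 0.7):
        print(build_sample_command(t))
```

Passing the arguments as a list avoids shell quoting problems when the prime text contains quotes or spaces.
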
## Training data

The training corpus consists of the students' programs joined into one file, which is included in this repository as `input.txt`.

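A corpus of this kind can be rebuilt from a folder of programs with a few lines of Python. This is a sketch under stated assumptions, not the script that produced `input.txt`: the function name `build_corpus`, the `.py` extension filter, the UTF-8 encoding, and the blank line between programs are all illustrative choices.

```python
from pathlib import Path


def build_corpus(source_dir, output_file="input.txt"):
    """Join every .py file under source_dir into one text file.

    Returns the number of files that were joined.
    """
    # Collect the paths first, in sorted order, so the result is reproducible.
    paths = sorted(Path(source_dir).rglob("*.py"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in paths:
            # Assumption: files are UTF-8; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
            # Separate programs with a blank line.
            out.write(text.rstrip("\n") + "\n\n")
    return len(paths)
```

For example, `build_corpus("homeworks")` would write `input.txt` to the current directory, where `homeworks` is a hypothetical folder name. char-rnn expects to find a file named `input.txt` in the directory passed to `train.lua` as `-data_dir`.
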
## What for?

In the era of all-conquering Transformers, RNN models seem archaic. But in my view they still work better than modern architectures at capturing qualities that matter from a humanities point of view, such as individual style.

This model was created just for fun, for the students at the end of the course in 2017.

## Samples

### temperature 0.5

```python
some text] and re.search('<meta content=\"(.*)\" name=\"author\"></meta>", oneline):
for line in a:
if re.search('<w><ana lex=\"(.+)\" gr=\".+"></ana>(.+?)</w>', line):
s = re.search(reg_adj, line)
if r:
k = re.search('<meta content="(.+?)" name="author">', txt))
sentences = re.sub('</w>', '', s)
with open('file.txt', 'a', encoding = 'utf-8') as f:
f.write(i+' '+count_words(f)
f.write('\n')
f.write('Выполняется файлов в папке в нет
можно сделеть слово слово в папка с цифрами в названии в папка с программой и папенается в тексте нет разной инит.')
print('Творительный падеж, единственное число')
elif word.endswith('ах') or word.endswith ('ям'):
print('Poss
```

### temperature 0.6

```python

def noun_midles(words):
print(result)
def main():
print('В тексте нет попыгамителись попытка слов в препинания в ланное не равно киличество файлов (' + str(arr))
def main():
maxi = max_pmi_any(s, 'answ')
print(count_form(textik, dictionary)
def main():
forms = open_file()
words = open_text(way_to_file)
words = []
for i in range(len(forms)):
if '.'
words += word.strip('.,!?//()":;/|\)\'»\n\t ')
reg_author = '<meta content="(.+?)" name="author"'
bigrams.append(f +'\t'+str(pos[forms[i])+1
else:
dic[file] = 1
else:
d[key] = 1
else:
dic[key] = 1
else:
dic[lemmes[i]] += 1
return d
def write_out_count_forms(text):
arr = re.findall('<w>(.+?)</w>', text)
return text
def find_max(string, 'words_anes)

```

### temperature 0.7

```python

import re
def main():
maxi = max(pmi)
number = int(input('Введите слово: ')
if os.path.isfile(f):
for key in d:
f.write(key + '\n')
f.close()
return
def main():
text = text_process('text.txt')
words = []
words = []
for word in words:
word = word.strip('.,;:?!'))
f.close()
return forms
def names_file(fname):
with open (fname, 'r', encoding = 'utf-8') as f:
text = f.read()
return text
def count_text(text):
text2 = re.sub(u'<.*?></w>', text)
return text
def count_text(word, text):
t = open_text(fname)
return file
def author('text.txt'):
for i in range(len(reg)):
forms[i] = words[i].strip('.,?!()*&^%$
file[i] = file[i].strip('.,?!()*&^%$
for k in range(len(list_)):
if len(strings)>1:
print('Олонаким препинания.html', 'a раздания')
word=re.sub('<.*?>', '', word, text)


```