---
license: mit
datasets:
  - wikipedia
  - IlyaGusev/gazeta
language:
  - ru
library_name: transformers
---

# ruGPT-Neo 1.3B [IN TRAINING, NOT FINAL CHECKPOINT]

## Model Description

ruGPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. ruGPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model.
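
The exact architecture hyperparameters live in the checkpoint's configuration. As a minimal sketch, you can inspect them without downloading the weights (attribute names follow the `transformers` GPT-Neo config):

```python
from transformers import AutoConfig

# Fetch only the configuration file, not the model weights.
config = AutoConfig.from_pretrained("AlexWortega/rugpt-neo-1.3b")

print(config.model_type)   # expected: "gpt_neo"
print(config.num_layers)   # number of transformer blocks
print(config.hidden_size)  # embedding / hidden dimension
```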

## Training procedure

This model was trained on Russian Wikipedia and the IlyaGusev/gazeta summarization dataset for 38k steps on 4×V100 GPUs, and training is still in progress. It was trained as an autoregressive language model, using cross-entropy loss.
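
The training script itself is not published here; the sketch below only illustrates the objective. Passing the input ids as `labels` makes `transformers` compute the shifted next-token cross-entropy loss used in autoregressive pretraining. The example sentence is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AlexWortega/rugpt-neo-1.3b")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/rugpt-neo-1.3b")

inputs = tokenizer("Нейронные сети обучаются на больших корпусах текста.",
                   return_tensors="pt")

# With labels == input_ids, the model internally shifts the targets by one
# position and returns the mean cross-entropy over the predicted tokens.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss.item())
```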

## Intended Use and Limitations

During pretraining, the model learns an inner representation of the Russian language that can then be used to extract features useful for downstream tasks. The model is, however, best at what it was pretrained for: generating text from a prompt.
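
One way to use those inner representations, sketched below under the assumption that you want a fixed-size sentence vector, is to mean-pool the last hidden layer. This is an illustration, not a prescribed recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AlexWortega/rugpt-neo-1.3b")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/rugpt-neo-1.3b")

inputs = tokenizer("Пример текста на русском языке.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[-1] has shape (batch, seq_len, hidden_size); mean-pooling
# over the sequence gives one feature vector per input text.
features = outputs.hidden_states[-1].mean(dim=1)
print(features.shape)
```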

### How to use

You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run:

```python
>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='AlexWortega/rugpt-neo-1.3b')
>>> generator("EleutherAI has", do_sample=True, min_length=50)

[{'generated_text': 'EleutherAI has made a commitment to create new software packages for each of its major clients and has'}]
```
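
Since the model is pretrained on Russian text, Russian prompts are the natural input. Below is a minimal sketch of generation without the pipeline wrapper; the prompt and sampling parameters are illustrative choices, not recommended settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AlexWortega/rugpt-neo-1.3b")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/rugpt-neo-1.3b")

inputs = tokenizer("Недавно учёные выяснили, что", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    do_sample=True,        # sample instead of greedy decoding
    top_p=0.95,            # nucleus sampling
    max_new_tokens=50,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```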