---
tags:
  - generated_from_trainer
model-index:
  - name: gpt-4-est-large
    results: []
---

# gpt-4-est-large

A GPT model for Estonian (large size), trained from scratch on 2.2 billion words (Estonian National Corpus + News Crawl + Common Crawl). Currently trained for one epoch, but it already outperforms gpt-4-est-base; the model will be updated as training continues.

Colab demo

## Format

During training, each text was prepended with a domain tag, and the same tag should be added as a prefix when prompting the model: >general<, >web<, >news<, >doaj<, and >wiki< (standing for general texts, web-crawled texts, news, article abstracts, and Wikipedia texts). Use the prefixes like this, e.g. ">web< Kas tead, et", as in the sketch below.
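A minimal usage sketch with the Transformers pipeline API. The Hub model id below is an assumption (it is not stated in this card); substitute the actual repository name if it differs.

```python
from transformers import pipeline

# Model id is assumed, not taken from this card; adjust to the actual Hub repo.
generator = pipeline("text-generation", model="tartuNLP/gpt-4-est-large")

# Prepend the domain tag matching the kind of text you want to generate.
prompt = ">web< Kas tead, et"
result = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```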

## Model details

- num. of layers: 24
- num. of heads: 24
- embedding size: 1536
- context size: 1024
- total size: 723.58M params
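As a rough sanity check, these numbers are consistent with a standard GPT-2-style parameter count. The sketch below is a back-of-the-envelope estimate; the vocabulary size is inferred from the reported total, not read from the model config, so treat it as an assumption.

```python
# GPT-2-style parameter estimate (biases and layer norms omitted).
n_layers, d_model, n_ctx = 24, 1536, 1024

block_params = 12 * d_model**2         # attention + MLP weights per layer
transformer = n_layers * block_params  # ~679.5M
positions = n_ctx * d_model            # ~1.6M

# Solving 723.58M ~ transformer + positions + vocab * d_model gives a
# vocabulary of roughly 28k tokens (inferred, not a config value).
vocab = 28_000
total = transformer + positions + vocab * d_model
print(f"{total / 1e6:.1f}M parameters")  # ~724M, close to the reported 723.58M
```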

Further details to be added soon.

## Framework versions

- Transformers 4.13.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.15.1
- Tokenizers 0.10.3