
TinyStories-3M-val-Hebrew

This model was trained on Norod78/TinyStoriesV2-GPT4-valid_heb-lineByLine-EoT

The dataset is a machine translation of TinyStoriesV2-GPT4-valid.txt by roneneldan

The translation was done using this script

The original dataset contains synthetically generated short stories (by GPT-3.5 and GPT-4) that use only a small vocabulary.
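
For reference, the dataset can be loaded with the datasets library. A minimal sketch (the "train" split name is an assumption, not confirmed by this card):

    from datasets import load_dataset

    # Load the line-by-line Hebrew translation; the "train" split name is an assumption.
    dataset = load_dataset("Norod78/TinyStoriesV2-GPT4-valid_heb-lineByLine-EoT", split="train")
    print(dataset[0])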

Model description

A very, very small model (8M params) trained on a very small dataset

A sample inference script is available; a minimal sketch is shown below.
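
This sketch uses the standard transformers text-generation pipeline; the prompt and sampling settings are illustrative, not taken from the original script:

    from transformers import pipeline

    # Text-generation sketch; the prompt and sampling settings are illustrative.
    generator = pipeline("text-generation", model="Norod78/TinyStories-3M-val-Hebrew")
    prompt = "פעם אחת, לפני הרבה זמן"  # illustrative Hebrew prompt: "Once, a long time ago"
    result = generator(prompt, max_new_tokens=100, do_sample=True, top_p=0.95)
    print(result[0]["generated_text"])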

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 24
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 300.0
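
As a rough guide to reproduction, here is a hypothetical mapping of these settings onto Hugging Face TrainingArguments (the output directory is a placeholder; Adam's betas and epsilon are left at the transformers defaults, which match the values above):

    from transformers import TrainingArguments

    # Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
    training_args = TrainingArguments(
        output_dir="./TinyStories-3M-val-Hebrew",  # placeholder path
        learning_rate=4e-4,
        per_device_train_batch_size=24,
        per_device_eval_batch_size=8,
        seed=42,
        lr_scheduler_type="cosine_with_restarts",
        warmup_steps=500,
        num_train_epochs=300.0,
    )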

Framework versions

  • Transformers 4.31.0.dev0

  • Pytorch 2.0.0

  • Datasets 2.13.1

  • Tokenizers 0.13.3

Parameter calculation

    def gpt_params(seq_len, vocab_size, d_model, num_heads, num_layers):
        """Given a GPT config, calculate the total number of parameters."""
        ffw_size = 4 * d_model  # in GPT the number of intermediate features is always 4*d_model
        # token and position embeddings
        embeddings = d_model * vocab_size + d_model * seq_len
        # transformer blocks
        attention = 3 * d_model**2 + 3 * d_model  # QKV weights and biases
        attproj = d_model**2 + d_model
        ffw = d_model * ffw_size + ffw_size
        ffwproj = ffw_size * d_model + d_model
        layernorms = 2 * 2 * d_model
        # final layernorm and output head
        ln_f = 2 * d_model
        dense = d_model * vocab_size  # note: no bias here
        # note: embeddings are not included in the param count!
        total_params = num_layers * (attention + attproj + ffw + ffwproj + layernorms) + ln_f + dense
        return total_params

    # gpt2 = dict(seq_len=1024, vocab_size=50257, d_model=768, num_heads=12, num_layers=12)
    gpt2 = dict(seq_len=256, vocab_size=50259, d_model=128, num_heads=16, num_layers=8)
    result = gpt_params(**gpt2) / 1e6
    print(result)  # prints 8.019584
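
Note that the formula above deliberately excludes the input embeddings, so counting every tensor in the released checkpoint will give a larger number. A small cross-check sketch, assuming the standard AutoModelForCausalLM loading path:

    from transformers import AutoModelForCausalLM

    # Count all parameters in the checkpoint, embeddings included.
    model = AutoModelForCausalLM.from_pretrained("Norod78/TinyStories-3M-val-Hebrew")
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params / 1e6:.2f}M parameters")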