---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: text-generation
datasets:
  - Skylion007/openwebtext
  - Locutusque/TM-DATA
inference:
  parameters:
    do_sample: true
    temperature: 0.7
    top_p: 0.2
    top_k: 7
    max_new_tokens: 250
    repetition_penalty: 1.15
---

# TinyMistral-248M-v2
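The inference parameters above can be exercised with a short generation sketch. This assumes the `transformers` library and the repo id `Locutusque/TinyMistral-248M-v2` (inferred from the page title, so it may differ):

```python
# Sampling parameters taken from the inference config above.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.2,
    "top_k": 7,
    "max_new_tokens": 250,
    "repetition_penalty": 1.15,
}


def generate(prompt: str, model_id: str = "Locutusque/TinyMistral-248M-v2") -> str:
    # Import lazily so the sampling config can be inspected without
    # pulling in the heavy transformers dependency.
    from transformers import pipeline

    # pipeline downloads the model weights on first use.
    pipe = pipeline("text-generation", model=model_id)
    return pipe(prompt, **GENERATION_KWARGS)[0]["generated_text"]
```

The low `top_p` (0.2) and `top_k` (7) keep sampling conservative, which helps a small 248M-parameter model stay coherent.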

## Training

This model was trained on the two datasets listed on this model page:

- Skylion007/openwebtext: 1,000,000 examples at a batch size of 32-4096 (1 epoch)
- Locutusque/TM-DATA: all examples at a batch size of 12288 (3 epochs)

Training took approximately 500 GPU hours on a single Titan V.

## Metrics

You can view the training metrics here: https://wandb.ai/locutusque/TinyMistral-V2/runs/g0rvw6wc

## License

This model is released under the cc-by-nc-4.0 license because the data used to train it is distributed under the same license.