
Dataset

This model was trained on the TinyStories dataset, specifically the GPT-4 version.

The Model

The name "Decepticon" stems from the model's architecture, which combines elements of both Transformer and RNN architectures. This fusion creates a deceptive yet beneficial design.

The model has a context length of 1024 tokens; in theory, this can be extended indefinitely through fine-tuning.

rwkv-decepticon-char-20m.pth is to be used with vocab.json. This is a character-level model.
n_layer: 6
n_embd: 512
ctx_len: 1024
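
As a rough illustration of what "character-level" means for this checkpoint, the sketch below encodes text one character at a time. It assumes vocab.json maps each character to an integer ID; the real file layout is not documented in this card, so the in-memory `toy_vocab` and the helper names `encode` and `decode` are hypothetical.

```python
# Hypothetical stand-in for the contents of vocab.json
# (ASSUMPTION: the file maps single characters to integer IDs).
toy_vocab = {"a": 0, "b": 1, "c": 2, " ": 3}


def encode(text, vocab):
    """Map each character to its ID; raises KeyError for unknown characters."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Invert the mapping and join the characters back into a string."""
    inverse = {idx: ch for ch, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)


ids = encode("a cab", toy_vocab)
text = decode(ids, toy_vocab)
```

With a real vocab.json, `toy_vocab` would be replaced by the result of `json.load` on that file, provided the assumed character-to-ID layout holds.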

rwkv-decepticon-books-140m.pth was trained on 20 GB of books. It uses the 20B_tokenizer.json file.
n_layer: 8
n_embd: 768
ctx_len: 1024

rwkv-decepticon-70m.pth (coming soon) is to be used with 20B_tokenizer.json.
n_layer: 8
n_embd: 768
ctx_len: 1024

rwkv-decepticon-170m.pth (coming soon) is trained on a small subset of the SlimPajama dataset (6 GB). This also uses the 20B_tokenizer.json file.
n_layer: 8
n_embd: 768
ctx_len: 1024
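
The checkpoint settings listed above can be kept together in code so that each weights file is always paired with the right tokenizer file. This is only a sketch: `CheckpointConfig`, `CHECKPOINTS`, and `clip_to_context` are hypothetical names, not part of the RWKV-LM repo, and clipping input to `ctx_len` is a conservative assumption rather than a documented requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointConfig:
    tokenizer_file: str
    n_layer: int
    n_embd: int
    ctx_len: int


# Values copied from the checkpoint list in this card.
CHECKPOINTS = {
    "rwkv-decepticon-char-20m.pth": CheckpointConfig("vocab.json", 6, 512, 1024),
    "rwkv-decepticon-books-140m.pth": CheckpointConfig("20B_tokenizer.json", 8, 768, 1024),
    "rwkv-decepticon-70m.pth": CheckpointConfig("20B_tokenizer.json", 8, 768, 1024),
    "rwkv-decepticon-170m.pth": CheckpointConfig("20B_tokenizer.json", 8, 768, 1024),
}


def clip_to_context(token_ids, config):
    """Keep only the most recent ctx_len token IDs (assumed safe default)."""
    return token_ids[-config.ctx_len:]


cfg = CHECKPOINTS["rwkv-decepticon-char-20m.pth"]
clipped = clip_to_context(list(range(2000)), cfg)
```

Here `clipped` holds the last 1024 IDs of the 2000-element input, matching the context length the checkpoints were trained with.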

I would like to train a 7B parameter model but lack the compute required. If you would like to sponsor some compute, please contact me.

License

RWKV-Decepticon Large Language Model License

1. Definitions

    “Model” refers to the RWKV-Decepticon Large Language Model.
    “Individual User” refers to a single person using the Model for personal use.
    “Company” refers to any for-profit organization or entity using the Model for commercial purposes.
    “Non-Profit” refers to any non-profit organization that is not operating under a Company.
    “Research” refers to activities undertaken with the goal of discovering new knowledge or insights, testing theories or ideas, or analyzing data or existing information. It does not include activities where the primary purpose is commercial gain or profit.
    “Open Source” refers to something that is publicly accessible and can be used, modified, and shared by anyone. In the context of this license, it means that any modifications made to the Model must be publicly released and free for others to use, modify, and distribute.

2. Grant of License

    This license grants Individual Users, Non-Profits, and Research entities the right to use, modify, and distribute the Model in any way they see fit.
    Companies are granted the right to use, modify, and distribute the Model, but not for profit.

3. Conditions

    Any modifications made to the Model by Companies must be open source and distributed under this same license within one week of modification.
    Individual Users and Non-Profits not operating under a Company are not subject to this condition.

4. Profit Use

    Profit use by Companies includes any direct or indirect commercial use that could bring in money for the Company or aid them in bringing in money.

5. Research

    Research entities are allowed to use the Model for research purposes only and not for profit.

6. Violations and Termination

    Violation of these terms will result in immediate termination of this license.

7. Disclaimer

    The Model is provided “as is”, without warranty of any kind, express or implied.

8. Content Restrictions

    The Model shall not be used to generate, promote, or facilitate content related to child pornography (CP) or any other form of child exploitation. Using the Model to attempt to generate CP or engage in any activity related to child exploitation is strictly prohibited. Refusal to comply with these restrictions is a violation of this license and may result in termination of the license.

Thank you to the creators of RWKV who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM


Dataset used to train jono1234/RWKV-Decepticon