
0.8M email generator experimental model

Inspired by the 28m model email experiment, a post on r/LocalLLaMA on Reddit.

Trained on email-datasets-20k, the exact dataset used in that experiment.

How to use?

  1. Make sure you have uv installed and have downloaded the repo, email.strawberry, and cl8k.bin.

  2. email.strawberry is the model and cl8k.bin is the tokenizer.

  3. uv run src/sample.py -h will give you this output:

usage: sample.py [-h] --model MODEL --encoder ENCODER [--length LENGTH] [--temperature TEMPERATURE] [--top_k TOP_K]
                 [--text_prompt TEXT_PROMPT]

A powerful text encryption and decryption program.

options:
  -h, --help            show this help message and exit
  --model, -i MODEL     model path
  --encoder, -e ENCODER
                        encoder path
  --length, -l LENGTH   output length
  --temperature, -t TEMPERATURE
                        output temperature
  --top_k, -f TOP_K     output top_k
  --text_prompt, -T TEXT_PROMPT
                        Text input from the command line.
  4. uv run src/sample.py -i email.strawberry -e cl8k.bin -T "Write a" will give you output something like this:
> Write a
Write a apologetic and humble business email from a Legal Counsel to a Recruiter regarding proposing a joint webinar, specifically while the system is partially down.<|eop|>
Regarding Recent Server Outage - [Company Name] and Integration
Dear [Recruiter Name],

I am writing to sincerely apologize for the recent server outage that occurred during the [Hackathon Name] hackathon. We recently implemented [mention the issue].

Due to this Serruption, we’ve been working diligently to mitigate this dissemination of our system, which has unfortunately impacted our ability to report this unacceptable correcting. Recognizing the significantly impacting [Industry Crisis], a broader experience experienced an unforeseen issue impacting the scale of this contract. We understand the enthusiasm and the demands of this investment.

We’ve already implemented a minor absence to process the potential disruption and a deepfaken
  5. out.txt contains the training logs.
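
The --temperature and --top_k options in the help output are common sampling controls. As a rough illustration of what options like these usually do, here is a minimal sketch of temperature plus top-k sampling over a list of logits. sample_top_k is a hypothetical helper written for this README, not code taken from src/sample.py, and the real script may implement sampling differently.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token id from logits after temperature scaling and top-k filtering.

    Hypothetical helper for illustration only; not the code in src/sample.py.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rank token ids from highest to lowest logit.
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # Keep only the k best candidates (all of them if top_k is None).
    if top_k is not None:
        ids = ids[:max(1, top_k)]
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [logits[i] / temperature for i in ids]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights, so this is a softmax draw.
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, -0.5, -3.0]  # made-up scores for a 4-token vocabulary
    print(sample_top_k(toy_logits, temperature=0.8, top_k=2))
```

With top_k set to 1 the sketch always returns the highest-scoring token, which is a quick way to check that the filtering step behaves as intended.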

Architecture design.

Input tokens
    |
[Token Embedding]
    |
[2x Strawberry Blocks] # 2 layers
    |--- Scaled Dot Product Attention
    |    |--- Rotary Positional Embeddings
    |    |--- QK Norm
    |    |--- Multi-Headed Attention
    |--- SiLU non-linearity
    |--- Scaled Dot Product Attention
    |    |--- Rotary Positional Embeddings
    |    |--- QK Norm
    |    |--- Multi-Headed Attention
    |
[Output Projection (weight-tied)]
    |
Next token logits
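
To make the diagram a bit more concrete, here is a small pure-Python sketch of a forward pass with the same shape: token embedding, two blocks of attention, SiLU, attention, and a weight-tied output projection. Everything beyond the diagram is an assumption: the function names, the toy sizes, random weights, RMS-style QK norm without a learned gain, a causal mask, and the absence of residual connections or other normalization layers (the diagram shows none, but the real model may have them). It is not the code in this repo.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rms_norm(v, eps=1e-6):
    """Scale a vector to unit root-mean-square (assumed form of QK norm, no learned gain)."""
    scale = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / scale for x in v]


def rope(v, pos, base=10000.0):
    """Rotate consecutive pairs of v by position-dependent angles (rotary embedding)."""
    out = []
    for i in range(0, len(v), 2):
        theta = pos * base ** (-i / len(v))
        c, s = math.cos(theta), math.sin(theta)
        out += [v[i] * c - v[i + 1] * s, v[i] * s + v[i + 1] * c]
    return out


def silu(x):
    return x / (1.0 + math.exp(-x))


def attention(x, w, n_heads):
    """Causal multi-head scaled dot-product attention with QK norm and RoPE."""
    q, k, v = matmul(x, w["q"]), matmul(x, w["k"]), matmul(x, w["v"])
    seq, dim = len(x), len(x[0])
    hd = dim // n_heads  # per-head width, must be even for rope()
    out = [[0.0] * dim for _ in range(seq)]
    for h in range(n_heads):
        sl = slice(h * hd, (h + 1) * hd)
        qh = [rope(rms_norm(q[t][sl]), t) for t in range(seq)]
        kh = [rope(rms_norm(k[t][sl]), t) for t in range(seq)]
        for t in range(seq):
            # Position t only attends to positions 0..t (causal mask, assumed).
            scores = [sum(a * b for a, b in zip(qh[t], kh[s])) / math.sqrt(hd)
                      for s in range(t + 1)]
            peak = max(scores)
            exps = [math.exp(sc - peak) for sc in scores]
            total = sum(exps)
            for s, e in enumerate(exps):
                for j in range(hd):
                    out[t][h * hd + j] += (e / total) * v[s][h * hd + j]
    return matmul(out, w["o"])


def block(x, w, n_heads):
    """One block as drawn in the diagram: attention, SiLU, attention."""
    h = attention(x, w["attn1"], n_heads)
    h = [[silu(a) for a in row] for row in h]
    return attention(h, w["attn2"], n_heads)


def forward(tokens, emb, blocks, n_heads):
    x = [emb[t] for t in tokens]  # token embedding lookup
    for w in blocks:
        x = block(x, w, n_heads)
    # Weight tying: the output projection reuses the embedding matrix (x @ emb^T).
    return [[sum(a * b for a, b in zip(row, e)) for e in emb] for row in x]


def init(rng, vocab, dim, n_layers=2):
    """Random toy weights; sizes here are placeholders, not the real model's."""
    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    def attn():
        return {name: rand(dim, dim) for name in "qkvo"}

    return rand(vocab, dim), [{"attn1": attn(), "attn2": attn()} for _ in range(n_layers)]


if __name__ == "__main__":
    emb, blocks = init(random.Random(0), vocab=11, dim=8)
    logits = forward([1, 2, 3], emb, blocks, n_heads=2)
    print(len(logits), len(logits[0]))  # one row of vocab-sized logits per input token
```

The last row of the returned logits is what a sampler would use to pick the next token. Because the embedding matrix doubles as the output projection, the vocabulary-sized table is only stored once, which matters a lot at this parameter budget.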

Dataset used to train Srijan-Srivastava/Strawberry-email