
This model was created by finetuning EleutherAI/pythia-1.4b-deduped on the Dahoas/synthetic-instruct-gptj-pairwise dataset.

You can try a demo of the model hosted on Lambda Cloud.

Model Details

Prerequisites

Running inference with the model takes ~4GB of GPU memory.
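As a rough sanity check on the ~4GB figure, the memory needed for the weights alone can be estimated from the parameter count and the dtype. The sketch below assumes a nominal 1.4 billion parameters (read off the model name, not an exact count) and 2 bytes per parameter for float16. Activations, the key/value cache, and framework overhead make up the rest and are not modeled here.

```python
# Back-of-the-envelope estimate of weight memory at inference time.
# Assumptions (not stated in the model card):
#   - nominal parameter count of 1.4e9, taken from the model name
#   - float16 weights, i.e. 2 bytes per parameter
NOMINAL_PARAMS = 1.4e9
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(NOMINAL_PARAMS, BYTES_PER_PARAM_FP16)
print(f"Approximate weight memory: {weights_gb:.1f} GB")
```

The estimate comes out below the quoted ~4GB, which is consistent with the remaining memory going to activations and runtime overhead.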

Quick Start

import torch

from transformers import AutoTokenizer, pipeline, StoppingCriteria, StoppingCriteriaList

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

model_name = "lambdalabs/pythia-1.4b-deduped-synthetic-instruct"
max_new_tokens = 2048
stop_token = "<|stop|>"


class KeywordsStoppingCriteria(StoppingCriteria):
    """Stop generation once the last generated token is one of the given ids."""

    def __init__(self, keywords_ids: list):
        self.keywords = keywords_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        # Only the first sequence in the batch is checked (num_return_sequences=1).
        return input_ids[0][-1].item() in self.keywords


tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_tokens([stop_token])

stop_ids = [tokenizer.encode(w)[0] for w in [stop_token]]
stop_criteria = KeywordsStoppingCriteria(stop_ids)

generator = pipeline(
    "text-generation",
    model=model_name,
    device=device,
    max_new_tokens=max_new_tokens,
    torch_dtype=torch.float16,
    stopping_criteria=StoppingCriteriaList([stop_criteria]),
)

example = "Can you give me some tips on how to save money every month."
text = "Question: {}\nAnswer:".format(example)

result = generator(
    text,
    num_return_sequences=1,
)

output = result[0]["generated_text"]

print(output)

Output:

Question: Can you give me some tips on how to save money every month.
Answer:Create a budget and track your spending.
2. Cut down on unnecessary expenses, such as eating out, shopping, and entertainment.
3. Make a list of your monthly expenses and stick to it.
4. Take advantage of discounts and coupons when shopping.
5. Make sure to pay your bills on time to avoid late fees.
6. Save a portion of your income each month by investing it in a high-yield savings account.
7. Consider automating your savings by setting up a recurring transfer from your checking to a savings account.
8. Take advantage of free entertainment opportunities, such as going to the park or museum.
9. Look for ways to save on utilities, such as installing energy-efficient appliances.
10. Research and use public transportation to save on gas.<|stop|>
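The generated text above includes both the prompt and the trailing stop token. A minimal sketch of how one might build the prompt and keep only the answer follows; the helper names `build_prompt` and `extract_answer` are hypothetical and not part of the model or of transformers, and whether the literal stop token shows up in the decoded text may depend on the decoding settings.

```python
# Hypothetical helpers for the "Question/Answer" prompt format used above.
STOP_TOKEN = "<|stop|>"
PROMPT_TEMPLATE = "Question: {}\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap a user question in the prompt format shown in the Quick Start."""
    return PROMPT_TEMPLATE.format(question)


def extract_answer(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt and anything from the stop token onward."""
    answer = generated_text
    if answer.startswith(prompt):
        answer = answer[len(prompt):]
    stop_at = answer.find(STOP_TOKEN)
    if stop_at != -1:
        answer = answer[:stop_at]
    return answer.strip()


# Stand-in for result[0]["generated_text"]; no model call is made here.
prompt = build_prompt("What is 2 + 2?")
generated = prompt + " 4." + STOP_TOKEN
print(extract_answer(generated, prompt))
```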

Training

The model was trained on the Dahoas/synthetic-instruct-gptj-pairwise dataset. We split the original dataset into train (the first 32000 examples) and validation (the remaining 1144 examples) subsets.
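The split described above is a simple positional cut. The sketch below illustrates it on a stand-in sequence of indices rather than the real dataset; the function name `split_train_validation` is hypothetical, and the total of 33144 examples is inferred from 32000 + 1144. With the `datasets` library, the same cut could presumably be made with `Dataset.select` over the two index ranges.

```python
# Positional train/validation split: first n_train examples vs. the rest.
N_TRAIN = 32000
N_TOTAL = 32000 + 1144  # inferred from the sizes quoted above


def split_train_validation(examples, n_train=N_TRAIN):
    """Return (train, validation) where train is the first n_train items."""
    examples = list(examples)
    return examples[:n_train], examples[n_train:]


# Stand-in data: integer indices instead of real dataset rows.
train, validation = split_train_validation(range(N_TOTAL))
print(len(train), len(validation))
```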

We finetuned the model for 4 epochs. Training took 2 hours on 8x A100 80GB GPUs, with batch_size_per_gpu set to 8 (a global batch size of 64) and a learning rate of 0.00002 (with linear decay to zero at the last training step). You can find a Weights and Biases record here.
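The numbers above imply a step count and a learning-rate curve, which can be sketched as follows. This assumes one optimizer step per global batch (no gradient accumulation) and no warmup phase, neither of which is stated in the model card; the function name `linear_decay_lr` is hypothetical.

```python
# Step count and linear learning-rate decay implied by the training setup.
# Assumptions (not from the model card): no warmup, no gradient accumulation.
TRAIN_EXAMPLES = 32000
NUM_GPUS = 8
BATCH_SIZE_PER_GPU = 8
EPOCHS = 4
BASE_LR = 2e-5

global_batch_size = NUM_GPUS * BATCH_SIZE_PER_GPU
steps_per_epoch = TRAIN_EXAMPLES // global_batch_size
total_steps = steps_per_epoch * EPOCHS


def linear_decay_lr(step: int, total: int = total_steps, base_lr: float = BASE_LR) -> float:
    """Learning rate that falls linearly from base_lr to zero at the last step."""
    remaining = max(0.0, 1.0 - step / total)
    return base_lr * remaining


print(global_batch_size, steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, linear_decay_lr(step))
```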
