Table of Contents

  1. TL;DR
  2. Model Description
  3. Usage
  4. Training Data
  5. Citation

TL;DR

This is a LoRA adapter for FLAN-T5-XXL, fine-tuned on ArtifactAI/arxiv-cs-ml-instruct-50k. It is intended for research purposes only and should not be used in production settings.

Model Description

  • Model type: Language model
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Related Models: All FLAN-T5 Checkpoints
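The usage script loads this checkpoint with peft's `PeftModel.from_pretrained`, that is, as a LoRA adapter on top of the frozen FLAN-T5-XXL weights. As a rough illustration of what such an adapter contributes, here is a minimal numpy sketch of the standard LoRA update h = Wx + (alpha / r) · BAx for a single linear layer. The layer sizes, rank `r` and `alpha` are hypothetical and are not this checkpoint's adapter configuration; peft's own implementation additionally handles dropout, multiple adapters and 8-bit base layers.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen base projection plus the low-rank LoRA update (alpha / r) * B @ A."""
    r = A.shape[0]                      # adapter rank
    scale = alpha / r
    return x @ W.T + scale * (x @ A.T @ B.T)

def merge_lora(W, A, B, alpha):
    """Fold the adapter into a single weight matrix with the same shape as W."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 32    # hypothetical sizes, for illustration only
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                # standard LoRA initialises B to zero
x = rng.standard_normal((2, d_in))

# With B = 0 the adapter is a no-op: the output equals the base layer's output.
print(np.allclose(lora_forward(x, W, A, B, alpha), x @ W.T))

B = rng.standard_normal((d_out, r)) * 0.01  # stand-in for trained adapter weights
merged = merge_lora(W, A, B, alpha)
print(np.allclose(lora_forward(x, W, A, B, alpha), x @ merged.T))
```

Because the update is linear in x, it can be folded into the base weight, so a merged adapter adds no inference cost; only the small matrices A and B need to be stored per fine-tune.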

Usage

Below is an example script showing how to load and query the model with transformers and peft:

Using the PyTorch model


import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

# Load the PEFT config, which records the base checkpoint the adapter was trained on
peft_model_id = "ArtifactAI/flant5-xxl-math-full-training-run-one"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit (requires bitsandbytes and a CUDA GPU) and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": 0},
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapter weights to the base model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"": 0})
model.eval()

input_ids = tokenizer("What is the peak phase of T-eV?", return_tensors="pt", truncation=True).input_ids.to("cuda:0")

# Sample an answer with nucleus (top-p) sampling; inference_mode disables gradient tracking
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=1000, do_sample=True, top_p=0.9)

print(f"answer: {tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]}")
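The `generate` call above samples with `do_sample=True, top_p=0.9`. To show what the `top_p` setting does, here is a simplified numpy sketch of nucleus sampling over a single next-token distribution: keep the smallest set of most-likely tokens whose cumulative probability reaches `top_p`, renormalise, and sample from that set. The helper names are hypothetical, and this is not transformers' own implementation, which operates on batched logits and has extra options such as a minimum number of tokens to keep.

```python
import numpy as np

def top_p_filter(probs, top_p=0.9):
    """Zero out tokens outside the nucleus, then renormalise the remainder."""
    order = np.argsort(probs)[::-1]                 # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # first position where cumulative mass reaches top_p, +1 to include that token
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def sample_next_token(probs, top_p=0.9, rng=None):
    """Draw one token id from the top-p filtered distribution."""
    if rng is None:
        rng = np.random.default_rng()
    filtered = top_p_filter(probs, top_p)
    return int(rng.choice(len(filtered), p=filtered))

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_filter(probs, top_p=0.9))   # the 0.05 token is dropped
print(sample_next_token(probs, top_p=0.9, rng=np.random.default_rng(0)))
```

Lowering `top_p` shrinks the candidate set and makes answers more conservative; raising it toward 1.0 approaches plain sampling from the full distribution.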

Training Data

The model was trained on ArtifactAI/arxiv-math-instruct-50k, a dataset of question/answer pairs. The questions were generated with the t5-base model and the answers with GPT-3.5-turbo.

Citation

@misc{flan-t5-xxl-arxiv-cs-ml-zeroshot-qa,
    title={flan-t5-xxl-arxiv-cs-ml-zeroshot-qa},
    author={Matthew Kenney},
    year={2023}
}