AlgorithmicResearchGroup/flan-t5-xxl-arxiv-cs-ml-closed-qa

	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text2text-generation
	widget:
	- text: What is an LSTM?
	  example_title: Question Answering
	tags:
	- arxiv
	datasets:
	- ArtifactAI/arxiv-cs-ml-instruct-tune-50k
	---
	# Table of Contents

	0. [TL;DR](#tldr)
	1. [Model Description](#model-description)
	2. [Usage](#usage)
	3. [Training Data](#training-data)
	4. [Citation](#citation)

	# TL;DR

	This is a FLAN-T5-XXL model fine-tuned on [ArtifactAI/arxiv-cs-ml-instruct-50k](https://huggingface.co/datasets/ArtifactAI/arxiv-cs-ml-instruct-50k). It is intended for research purposes only and *should not be used in production settings*.


	## Model Description


	- Model type: Language model
	- Language(s) (NLP): English
	- License: Apache 2.0
	- Related Models: [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)

	# Usage

	The example below shows how to load and run the model with `transformers` and `peft`:

	## Using the PyTorch model

	```python
	import torch
	from peft import PeftModel, PeftConfig
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

	# Load the PEFT config for the fine-tuned adapter checkpoint
	peft_model_id = "ArtifactAI/flant5-xxl-math-full-training-run-one"
	config = PeftConfig.from_pretrained(peft_model_id)

	# Load the base model (8-bit weights, placed on GPU 0) and its tokenizer
	model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"":0})
	tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

	# Attach the LoRA adapter weights to the base model
	model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
	model.eval()

	# Tokenize the question and move the input ids to the GPU
	input_ids = tokenizer("What is the peak phase of T-eV?", return_tensors="pt", truncation=True).input_ids.cuda()

	# Generate without tracking gradients, using nucleus (top-p) sampling
	with torch.inference_mode():
	    outputs = model.generate(input_ids=input_ids, max_new_tokens=1000, do_sample=True, top_p=0.9)

	print(f"answer: {tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]}")
	```
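
The `generate` call above samples with `do_sample=True` and `top_p=0.9`, so the same question can produce different answers across runs. As a rough illustration of what top-p (nucleus) sampling does, the following pure-Python sketch filters a toy next-token distribution. It is not the `transformers` implementation, and the function names and toy probabilities are made up for this example.

```python
import random
from typing import Dict, Optional


def top_p_filter(probs: Dict[str, float], top_p: float = 0.9) -> Dict[str, float]:
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def sample_top_p(probs: Dict[str, float], top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token from the top-p filtered distribution."""
    rng = rng or random.Random()
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (illustrative values only).
toy_probs = {"LSTM": 0.5, "RNN": 0.3, "GRU": 0.15, "CNN": 0.05}
filtered = top_p_filter(toy_probs, top_p=0.9)
token = sample_top_p(toy_probs, top_p=0.9, rng=random.Random(0))
```

With these toy values, the least likely token is dropped before sampling. Lowering `top_p` narrows the candidate set further, and setting `do_sample=False` in `generate` disables sampling altogether.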

	## Training Data

	The model was trained on [ArtifactAI/arxiv-cs-ml-instruct-50k](https://huggingface.co/datasets/ArtifactAI/arxiv-cs-ml-instruct-50k), a dataset of question/answer pairs. The questions were generated with the t5-base model, and the answers were generated with the GPT-3.5-turbo model.
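
To make the question/answer structure concrete, here is a minimal sketch of how such pairs might be mapped to seq2seq input/target examples. The field names `question` and `answer`, the helper `to_seq2seq_examples`, and the toy records are all hypothetical. This card does not state the dataset's actual column names or any prompt template used during training.

```python
from typing import Dict, List


def to_seq2seq_examples(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map question/answer records to input/target text pairs.

    Assumes each record has hypothetical "question" and "answer" keys;
    records with an empty question or answer are skipped.
    """
    examples = []
    for record in records:
        question = record.get("question", "").strip()
        answer = record.get("answer", "").strip()
        if not question or not answer:
            continue
        examples.append({"input_text": question, "target_text": answer})
    return examples


# Toy records for illustration only; not taken from the dataset.
toy_records = [
    {"question": "What is an LSTM?", "answer": "A recurrent network with gated memory cells."},
    {"question": "What is dropout? ", "answer": " A regularization technique."},
    {"question": "Incomplete record", "answer": ""},
]
examples = to_seq2seq_examples(toy_records)
```

In a fine-tuning setup, `input_text` would be tokenized as the encoder input and `target_text` as the decoder labels.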

	# Citation

	```
	@misc{flan-t5-xxl-arxiv-cs-ml-closed-qa,
	title={flan-t5-xxl-arxiv-cs-ml-closed-qa},
	author={Matthew Kenney},
	year={2023}
	}
	```