
Model Card for FastChat-T5 3B Q8

This model is an int8-quantized version of lmsys/fastchat-t5-3b-v1.0.
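As a rough illustration of what int8 quantization saves, weight storage scales with the number of bytes used per parameter. The sketch below assumes a nominal 3 billion parameters (taken from the "3B" in the model name, not an exact count). It also ignores quantization scales and any tensors kept in floating point, so the figures are ballpark estimates only.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# NOMINAL_PARAMS is an assumption taken from the "3B" in the model name.
NOMINAL_PARAMS = 3_000_000_000

# Storage cost of one weight for each numeric type.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "int8": 1,
}


def approx_weight_gb(num_params: int, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{approx_weight_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

Under these assumptions, the int8 weights need roughly a quarter of the float32 footprint (about 3 GB versus 12 GB).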

Model Details

Model Description

The model was quantized with CTranslate2 using the following command:

ct2-transformers-converter --model lmsys/fastchat-t5-3b-v1.0 --output_dir lmsys/fastchat-t5-3b-ct2 \
    --copy_files generation_config.json added_tokens.json tokenizer_config.json special_tokens_map.json spiece.model \
    --quantization int8 --force --low_cpu_mem_usage

If you want to perform the quantization yourself, you need to install the following dependencies:

pip install -qU ctranslate2 "transformers[torch]" sentencepiece accelerate

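The same conversion can also be sketched with CTranslate2's Python converter API instead of the CLI. This assumes a recent CTranslate2 release in which ctranslate2.converters.TransformersConverter accepts the copy_files and low_cpu_mem_usage arguments. It uses the upstream repository named in the model description, and the output directory name and the helper name convert are hypothetical placeholders.

```python
# Sketch of the CLI conversion through CTranslate2's Python converter API.
# Assumes a CTranslate2 release whose TransformersConverter accepts
# copy_files and low_cpu_mem_usage (check your installed version).

MODEL_ID = "lmsys/fastchat-t5-3b-v1.0"
OUTPUT_DIR = "fastchat-t5-3b-ct2"  # hypothetical output directory name

# Tokenizer and generation files copied next to the converted weights,
# mirroring the --copy_files list of the CLI command.
COPY_FILES = [
    "generation_config.json",
    "added_tokens.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "spiece.model",
]


def convert(model_id: str = MODEL_ID, output_dir: str = OUTPUT_DIR) -> str:
    """Convert a Hugging Face T5 checkpoint to CTranslate2 with int8 weights."""
    # Imported lazily so this module can be loaded without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_id,
        copy_files=COPY_FILES,
        low_cpu_mem_usage=True,
    )
    # Equivalent of --quantization int8 --force; returns the output directory.
    return converter.convert(output_dir, quantization="int8", force=True)
```

Calling convert() downloads the upstream checkpoint and writes the int8 model plus the copied tokenizer files to OUTPUT_DIR, returning that directory path.
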
  • Shared by: Lim Chee Kin
  • License: Apache 2.0

How to Get Started with the Model

Use the code below to get started with the model.

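import ctranslate2
import transformers
from huggingface_hub import snapshot_download

# ctranslate2.Translator loads from a local directory, so download the
# converted model from the Hugging Face Hub first
model_dir = snapshot_download("limcheekin/fastchat-t5-3b-ct2")

translator = ctranslate2.Translator(model_dir)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)

input_text = "translate English to German: The house is wonderful."
# CTranslate2 expects token strings rather than token ids
input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

results = translator.translate_batch([input_tokens])

# Take the best hypothesis for the first (and only) input and decode it
output_tokens = results[0].hypotheses[0]
output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

print(output_text)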

The code is adapted from the T5 example in the CTranslate2 Transformers guide: https://opennmt.net/CTranslate2/guides/transformers.html#t5.

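The key method in the code above is translate_batch. Its supported parameters (for example beam_size and max_decoding_length) are documented under ctranslate2.Translator.translate_batch in the CTranslate2 Python API reference.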
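As a sketch of how those parameters are passed, the helper below wraps the same tokenize, translate and decode steps for a batch of prompts and applies one of a few decoding presets. The preset names and values and the helper names generate and load are hypothetical. beam_size, sampling_topk, sampling_temperature and max_decoding_length are translate_batch parameters, but check the API reference for your installed CTranslate2 version.

```python
# Hypothetical decoding presets for ctranslate2.Translator.translate_batch.
# The preset names and values are illustrative, not recommendations.
DECODING_PRESETS = {
    # Greedy search: keep only the best candidate at each step.
    "greedy": {"beam_size": 1, "max_decoding_length": 256},
    # Beam search: explore several candidates and return the best one.
    "beam": {"beam_size": 4, "max_decoding_length": 256},
    # Random sampling from the 10 most likely tokens at each step.
    "sample": {
        "beam_size": 1,
        "sampling_topk": 10,
        "sampling_temperature": 0.7,
        "max_decoding_length": 256,
    },
}


def load(repo_id="limcheekin/fastchat-t5-3b-ct2", device="cpu"):
    """Download the converted model and return (translator, tokenizer)."""
    # Lazy imports keep the helpers importable without the heavy dependencies.
    import ctranslate2
    import transformers
    from huggingface_hub import snapshot_download

    # ctranslate2.Translator needs a local directory, so fetch the repo first.
    model_dir = snapshot_download(repo_id)
    translator = ctranslate2.Translator(model_dir, device=device)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)
    return translator, tokenizer


def generate(translator, tokenizer, texts, preset="greedy"):
    """Generate one output string per input text using a decoding preset."""
    # CTranslate2 consumes token strings, so convert ids back to tokens.
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(t)) for t in texts]
    results = translator.translate_batch(batch, **DECODING_PRESETS[preset])
    # Take the top hypothesis of each result and turn it back into text.
    return [
        tokenizer.decode(tokenizer.convert_tokens_to_ids(r.hypotheses[0]))
        for r in results
    ]


# Example usage (downloads the model, so it is not run here):
#   translator, tokenizer = load()
#   print(generate(translator, tokenizer, ["What is CTranslate2?"], preset="beam"))
```

Passing the whole list of prompts in a single translate_batch call lets CTranslate2 decode them together instead of one request at a time.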
