
Model Card for DavidLanz/Taiwan-tinyllama-v1.0-chat

This is a continually pretrained version of TinyLlama (1.1B parameters) adapted for Traditional Chinese. The continued-pretraining corpus contains roughly 2B tokens.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

def generate_response(prompt):
    '''
    Generate a reply from the model for a single prompt.
    '''
    # tokenize the prompt and move the tensors to the model's device
    tokenized_input = tokenizer(prompt, return_tensors='pt').to(device)
    
    # generate the response
    outputs = model.generate(
        input_ids=tokenized_input['input_ids'], 
        attention_mask=tokenized_input['attention_mask'],
        pad_token_id=tokenizer.pad_token_id,
        do_sample=False,
        repetition_penalty=1.3,
        max_length=500
    )
    
    # decode the response
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = AutoModelForCausalLM.from_pretrained(
        "DavidLanz/Taiwan-tinyllama-v1.0-chat",
        device_map=device,
        torch_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained("DavidLanz/Taiwan-tinyllama-v1.0-chat")
    while True:
        text = input("Enter a prompt: ")
        print('System:', generate_response(text))
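
Since this is a chat-tuned model, prompts may work better when wrapped in the tokenizer's chat template. The snippet below is a minimal sketch, assuming the repository's tokenizer actually ships a chat template (check tokenizer_config.json before relying on it); the message content is only an illustrative example.

# Hypothetical sketch: wrap a user message with the tokenizer's chat template
# (assumption: the tokenizer defines one; verify before relying on it).
messages = [{"role": "user", "content": "請用繁體中文介紹台北的夜市。"}]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print('System:', generate_response(chat_prompt))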

With bfloat16, the model requires only around 3 GB of VRAM.
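
If even 3 GB is too much, a quantized load can shrink the footprint further. The following is a rough sketch, assuming the bitsandbytes package is installed and a CUDA GPU is available; the exact memory savings depend on your setup.

# Optional sketch: 4-bit quantized loading via bitsandbytes (assumption:
# bitsandbytes is installed and a CUDA device is available).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "DavidLanz/Taiwan-tinyllama-v1.0-chat",
    device_map="auto",
    quantization_config=quant_config,
)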

