You agree to not use the models for any harmful, inappropriate, unethical or illegal purpose or intention.

SeaLLMs - Large Language Models for Southeast Asia

UPDATE: SeaLLM-7B-v2.5 is released with state-of-the-art performance in world knowledge and reasoning.

IMPORTANT: SeaLLM-7B-v2 is released with state-of-the-art performance, and this Llama-based SeaLLM-7B-v1 is DEPRECATED. Please use SeaLLM-7B-v2 instead.

This is the first version of the 7B SeaLLMs. It supports Vietnamese 🇻🇳, Indonesian 🇮🇩, Thai 🇹🇭, Malay 🇲🇾, Khmer 🇰🇭, Lao 🇱🇦, Tagalog 🇵🇭 and Burmese 🇲🇲. It has much lower capability than SeaLLM-7B-v2, so please use SeaLLM-7B-v2 instead.

Terms of Use and License: By using our released weights, codes, and demos, you agree to and comply with the terms and conditions specified in our SeaLLMs Terms Of Use.

Disclaimer: We must note that even though the weights, codes, and demos are released in an open manner, similar to other pre-trained language models, and despite our best efforts in red teaming and safety fine-tuning and enforcement, our models come with potential risks, including but not limited to inaccurate, misleading or potentially harmful generation.

How to Run:

SeaLLM models work the same way as Llama-2, so the Llama-2 generation codebase should be sufficient to run. However, as this is a chat model, you should wrap the prompt/instruction using the following format function.

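You should also turn off automatic special-token insertion when tokenizing, because the format function already adds the <s> token itself. With a Hugging Face tokenizer, for example, pass add_special_tokens=False to tokenizer.encode.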

from typing import List, Optional, Tuple

BOS_TOKEN = '<s>'
EOS_TOKEN = '</s>'

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

SYSTEM_PROMPT = """You are a multilingual, helpful, respectful and honest assistant. \
Please always answer as helpfully as possible, while being safe. Your \
answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure \
that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not \
correct. If you don't know the answer to a question, please don't share false information.

As a multilingual assistant, you must respond and follow instructions in the native language of the user by default, unless told otherwise. \
Your response should adapt to the norms and customs of the respective language and culture.
"""

def chat_multiturn_seq_format(
    message: str,
    history: Optional[List[Tuple[str, str]]] = None,
):
    """
    Build the full prompt for a multi-turn conversation in this layout:
        <bos>[INST] B_SYS SystemPrompt E_SYS Prompt [/INST] Answer <eos>
        <bos>[INST] Prompt [/INST] Answer <eos>
        <bos>[INST] Prompt [/INST]
    As the format already adds <bos>, tokenize with add_special_tokens=False.
    Inputs:
      message: the current prompt
      history: list of previous [message, response] pairs, e.g. [[message1, response1], [message2, response2]]
    Outputs:
      full_prompt: the prompt that should go into the chat model

    e.g.:
      full_prompt = chat_multiturn_seq_format("Hello world")
      output = model.generate(tokenizer.encode(full_prompt, add_special_tokens=False), ...)
    """
    history = history or []  # the default is None, which cannot be iterated
    text = ''
    for i, (prompt, res) in enumerate(history):
        if i == 0:
            # Only the first turn carries the system prompt.
            text += f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {prompt} {E_INST}"
        else:
            # The original referenced an undefined `end_instr`; E_INST is assumed here.
            text += f"{BOS_TOKEN}{B_INST} {prompt}{E_INST}"
        if res is not None:
            text += f" {res} {EOS_TOKEN} "
    if len(history) == 0 or text.strip() == '':
        # No usable history: the current message is the first turn.
        text = f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {message} {E_INST}"
    else:
        # Append the current message as a new, still unanswered turn.
        text += f"{BOS_TOKEN}{B_INST} {message} {E_INST}"
    return text
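
To make the layout concrete, here is a minimal, standalone sketch of a two-turn prompt and of how such a prompt might be passed to a model. It is an illustration under stated assumptions, not the project's reference code. `format_turns` is a hypothetical, condensed restatement of the format with uniform spacing and a placeholder system prompt. `generate_reply` and its `model_path` argument are also hypothetical, and assume the weights load through Hugging Face transformers with PyTorch installed.

```python
from typing import List, Optional, Tuple

BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_turns(
    message: str,
    history: Optional[List[Tuple[str, Optional[str]]]] = None,
    system_prompt: str = "You are a helpful assistant.",  # placeholder, not the real SYSTEM_PROMPT
) -> str:
    """Hypothetical condensed version of the multi-turn layout (uniform spacing assumed)."""
    # Treat the current message as a final turn that has no response yet.
    turns = list(history or []) + [(message, None)]
    parts = []
    for i, (prompt, response) in enumerate(turns):
        # Only the very first turn carries the system prompt.
        sys_block = f"{B_SYS} {system_prompt} {E_SYS} " if i == 0 else ""
        parts.append(f"{BOS}{B_INST} {sys_block}{prompt} {E_INST}")
        if response is not None:
            parts.append(f" {response} {EOS} ")
    return "".join(parts)


def generate_reply(model_path: str, full_prompt: str, max_new_tokens: int = 256) -> str:
    """Sketch only: assumes `model_path` points to weights loadable by transformers."""
    # Imported lazily so the formatting part runs without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    # The prompt already contains <s>, so the tokenizer must not add another one.
    inputs = tokenizer(full_prompt, return_tensors="pt", add_special_tokens=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = format_turns("Hello again", history=[("Hello", "Hi there.")])
print(prompt)
```

Each turn starts with its own `<s>`, only answered turns end with `</s>`, and the last turn is left open for the model to complete. `generate_reply` is deliberately not called here because it loads the full model weights.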


If you find our project useful, we hope you would kindly star our repo and cite our work as follows (corresponding author: l.bing@alibaba-inc.com):

  author = {Xuan-Phi Nguyen*, Wenxuan Zhang*, Xin Li*, Mahani Aljunied*,
            Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang,
            Chaoqun Liu, Hang Zhang, Lidong Bing},
  title = {SeaLLMs - Large Language Models for Southeast Asia},
  year = 2023,
  Eprint = {arXiv:2312.00738},
