
Llama-3-MAAL-8B-Instruct-v0.1

We release MAAL (Multilingual Adaptive Augmentation Language model), which combines multilingual capability with adaptive augmentation techniques.

  • Developed by: maum.ai Brain NLP (Jaeyoon Jung, Jinjoo Lee, Yongjae Lee, Dongjun Lee, Woosung Joo)
  • Language(s) (NLP): Korean, English (currently bilingual)

Model Description

Version 0.1 uses cross-lingual training to transfer instruction-following capabilities from English to Korean.

  • We trained this model on 8 H100-80G GPUs for 1 day with a cross-lingual training dataset; a hypothetical illustration of such a training pair is sketched after this list.
  • We recommend using the fixed system prompt below unless you fine-tune the model:
너는 마음에이아이의 챗봇 MAAL이다. 고객의 질문에 친절하게 답하여라. (You are MAAL, maum.ai's chatbot. Answer customers' questions kindly.)

Sample inference code (GPU)

import transformers
import torch

model_id = "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1"
# load in bfloat16 so the 8B weights fit comfortably on a single GPU
model = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
# stream generated tokens to stdout, hiding the prompt and special tokens
streamer = transformers.TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# we recommend using the fixed system prompt for the model unless you fine-tune it
prompt = "너는 마음에이아이의 챗봇 MAAL이다. 고객의 질문에 친절하게 답하여라."
# "One box of apples holds 30 apples; there were 3 boxes at first, and I ate 5 apples. How many apples are left in total?"
instruction = "사과 한 박스에는 사과가 30개 들어있는데, 처음에는 사과 3박스가 있었고, 내가 사과 5개를 먹었어. 남은 사과는 총 몇개야?"

messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": instruction},
]

# build the Llama-3 chat prompt; add_generation_prompt=True appends the assistant
# header so the model generates its answer rather than another user turn
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt").to("cuda")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=1024, pad_token_id=tokenizer.eos_token_id)
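If you prefer to capture the reply as a string instead of streaming it to stdout, a minimal follow-up (assuming the same model, tokenizer, inputs, and outputs as above) is:

# decode only the newly generated tokens, i.e. everything after the prompt
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(reply)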

Evaluation Results

As the main goal of version 0.1 is to transfer instruction-following capabilities from English to Korean without continued pre-training or similar methods, we selected LogicKor as our evaluation benchmark to assess Korean instruction-following skills.

We compare our model with similarly sized models (under 13B parameters) that have been fine-tuned on Korean datasets. * denotes a self-reported result.

| Model | single-turn (↑) | multi-turn (↑) | average (↑) |
|---|---|---|---|
| maum-ai/Llama-3-MAAL-8B-Instruct-v0.1* | 5.80 | 4.66 | 5.23 |
| maywell/Synatra-kiqu-10.7B | 5.71 | 4.73 | 5.22 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 5.78 | 3.92 | 4.85 |
| nlpai-lab/KULLM3 | 4.61 | 4.83 | 4.72 |
| MLP-KTLim/llama3-Bllossom* | 2.11 | 1.57 | 1.84 |
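For reference, the average column in the table above is the simple mean of the single-turn and multi-turn scores, which you can verify with a few lines of Python:

# LogicKor scores from the table above: (single-turn, multi-turn)
scores = {
    "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1": (5.80, 4.66),
    "maywell/Synatra-kiqu-10.7B": (5.71, 4.73),
    "yanolja/EEVE-Korean-Instruct-10.8B-v1.0": (5.78, 3.92),
    "nlpai-lab/KULLM3": (4.61, 4.83),
    "MLP-KTLim/llama3-Bllossom": (2.11, 1.57),
}
for name, (single, multi) in scores.items():
    print(f"{name}: average = {(single + multi) / 2:.2f}")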

Limitations

Because this model was trained on a small dataset, it has several limitations.

  • It struggles to generate diverse Korean text.
  • It lacks Korean knowledge and cultural context (localization).
  • It does not work with image or video inputs.

Todo

We plan to address these limitations one by one in future upgrades of this model:

  • Enhance Korean generation through vocabulary expansion and continued pre-training (more Korean corpus); a rough sketch of the vocabulary-expansion step follows this list. (similar idea)
  • Localize with a cultural adaptation method and additional Korean knowledge data. (similar idea)
  • Develop a vision-language model that can handle both video and image inputs. (similar idea)
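As a rough illustration of the vocabulary-expansion step mentioned above, here is a minimal sketch using the Hugging Face transformers API. The added tokens are placeholders; the actual token list, Korean corpus, and continued pre-training setup are not part of this release.

import transformers

model_id = "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(model_id)

# hypothetical new Korean tokens; a real expansion would derive them
# from a large Korean corpus rather than hard-coding them like this
new_tokens = ["안녕하세요", "감사합니다"]
num_added = tokenizer.add_tokens(new_tokens)

# grow the embedding matrix so the new token ids have rows; their vectors
# would then be learned during continued pre-training on Korean text
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size = {len(tokenizer)}")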