---
license: cc-by-sa-4.0
language:
  - ko
  - en
pipeline_tag: text-generation
tags:
  - meta
  - llama-2
  - llama-2-ko-en
  - sheared llama
---

# Model Details

## Model Architecture

urLLM-KO_EN-2.7B is an auto-regressive language model built on an optimized transformer architecture derived from princeton-nlp/Sheared-LLaMA-2.7B.

## Training Corpus

The model was trained on selected datasets from the Modu Corpus, Korean Wikipedia, and Kaggle English News (approximately 36 GB in total).

## Vocab Expansion

The tokenizer vocabulary was expanded to 51,385 tokens.
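The card does not describe the expansion procedure itself. As a hedged illustration only, the sketch below shows how a LLaMA-style model's embedding matrices can be resized to the stated vocabulary size with Hugging Face `transformers`' `resize_token_embeddings`; the tiny config dimensions are invented for the example and are not this model's actual configuration.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny illustrative config (NOT urLLM-KO_EN-2.7B's real dimensions).
# Sheared-LLaMA inherits LLaMA-2's original 32,000-token vocabulary.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
)
model = LlamaForCausalLM(config)

# Grow both the input and the tied output embeddings to the expanded size;
# new rows are randomly initialized and would be learned during training.
model.resize_token_embeddings(51385)
print(model.get_input_embeddings().weight.shape[0])  # → 51385
```

In practice the new rows correspond to Korean tokens added to the tokenizer, so after resizing, continued pretraining on the Korean/English corpus is what gives them useful representations.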