FuseLLM for Korean

Model Description

FuseLLM is a project that aims to combine the knowledge and strengths of large language models with different transformer architectures into a single model.

To adapt this approach to Korean, this repository uses Orion as the base model and OPEN-SOLAR-KO-10.7B and Yi-Ko-6B as the source models.

Model Architecture

The target model architecture is based on the Llama model. The fusion process has three parts: extracting representations (top-k logits) from the source models, aligning the token embeddings of the source models with the target model's token space, and training the target model on the aligned representations.
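
A minimal Python sketch of the alignment idea follows. It assumes cosine similarity over embedding tables that already share one dimension; the card does not say which similarity measure was used or how embeddings of different widths were compared. `build_token_map` and `remap_topk` are hypothetical helpers, not part of any released code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_token_map(source_emb: Dict[int, Vector],
                    target_emb: Dict[int, Vector]) -> Dict[int, int]:
    """Hypothetical helper: map each source token id to the most similar target token id.

    Assumes both embedding tables live in a shared space of the same dimension.
    """
    return {
        src_id: max(target_emb, key=lambda tgt_id: cosine(vec, target_emb[tgt_id]))
        for src_id, vec in source_emb.items()
    }


def remap_topk(topk: List[Tuple[int, float]],
               token_map: Dict[int, int]) -> List[Tuple[int, float]]:
    """Hypothetical helper: translate a source model's top-k (token id, logit) pairs
    into target-vocabulary ids, keeping the larger logit when two ids collide."""
    merged: Dict[int, float] = {}
    for src_id, logit in topk:
        tgt_id = token_map[src_id]
        merged[tgt_id] = max(logit, merged.get(tgt_id, -math.inf))
    # Highest logit first, matching the layout of the extracted top-k lists.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-d embeddings (illustrative values only).
    source = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.9, 0.1]}
    target = {10: [1.0, 0.05], 11: [0.05, 1.0]}
    token_map = build_token_map(source, target)
    print(token_map)  # {0: 10, 1: 11, 2: 10}
    print(remap_topk([(0, 3.2), (2, 2.5), (1, 1.0)], token_map))  # [(10, 3.2), (11, 1.0)]
```

When two source tokens map to the same target token, this sketch keeps the larger logit; summing their probabilities would be an equally plausible choice.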

Training Data

The model was trained on a dataset of text samples. For each sample, inference was run with every source model, and each model's top-k logit values at every token position were collected and stored.
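
The extraction step can be sketched in plain Python. The value of k, the record layout, and the helper names (`topk_per_position`, `to_jsonl_line`) are illustrative assumptions; with PyTorch, the per-position selection corresponds to `torch.topk(logits, k, dim=-1)` applied to a model's output logits.

```python
import heapq
import json
from typing import List, Sequence, Tuple

TopK = List[Tuple[int, float]]  # (token id, logit) pairs, highest logit first


def topk_per_position(logits: Sequence[Sequence[float]], k: int) -> List[TopK]:
    """Keep only the k largest logits at each token position.

    `logits` has shape [sequence_length][vocab_size], e.g. one forward pass
    of a source model over a single text sample.
    """
    return [
        heapq.nlargest(k, enumerate(row), key=lambda pair: pair[1])
        for row in logits
    ]


def to_jsonl_line(sample_id: str, model_name: str, topk: List[TopK]) -> str:
    """Hypothetical storage format: one JSON line per (sample, source model)."""
    return json.dumps({"sample_id": sample_id, "model": model_name, "topk": topk})


if __name__ == "__main__":
    # Toy logits for a 2-token sample over a 4-token vocabulary.
    toy_logits = [[0.1, 2.0, -1.0, 1.5], [3.0, 0.0, 0.5, -2.0]]
    topk = topk_per_position(toy_logits, k=2)
    print(to_jsonl_line("sample-0", "Yi-Ko-6B", topk))
```

Storing only k entries per position instead of the full vocabulary keeps the stored teacher outputs small enough to precompute once and reuse during training.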

Training Procedure

  1. Representation Extraction: For each text sample in the dataset, inference is performed using the source models. The top-k logit values output by each source model at each token position are collected and stored.

  2. Token Alignment: The token embeddings of the source models and the target model (Llama) are compared, and each source token is mapped to its most similar target token. This projects the outputs of the different models into the token space of the Llama model, so that they can be compared and combined.

  3. Model Training: The aligned representations are then used to train the target model.
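
The card does not describe the training objective. The sketch below assumes a FuseLLM-style per-token loss: a weighted sum of the usual next-token cross-entropy and a cross-entropy against a teacher distribution built from the aligned top-k logits. Averaging the source distributions is a simple rule swapped in for illustration (the original FuseLLM paper describes other fusion rules), and the weight `lam` and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

TopK = List[Tuple[int, float]]  # aligned (target token id, logit) pairs


def softmax_topk(topk: TopK) -> Dict[int, float]:
    """Turn one position's top-k logits into a distribution over those ids only."""
    m = max(logit for _, logit in topk)
    exps = {tok: math.exp(logit - m) for tok, logit in topk}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def fuse(sources: Sequence[TopK]) -> Dict[int, float]:
    """Assumed fusion rule: average the sources' top-k distributions."""
    fused: Dict[int, float] = {}
    for topk in sources:
        for tok, p in softmax_topk(topk).items():
            fused[tok] = fused.get(tok, 0.0) + p / len(sources)
    return fused


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the target model's full vocabulary."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_loss(student_logits: Sequence[float], gold: int,
               sources: Sequence[TopK], lam: float = 0.9) -> float:
    """lam * causal-LM loss + (1 - lam) * cross-entropy against the fused teacher.

    Teacher token ids must already be aligned to the target vocabulary.
    """
    logp = log_softmax(student_logits)
    clm = -logp[gold]
    teacher = fuse(sources)
    fusion = -sum(p * logp[tok] for tok, p in teacher.items())
    return lam * clm + (1.0 - lam) * fusion


if __name__ == "__main__":
    # One position: a 3-token target vocabulary, two aligned source outputs.
    sources = [[(0, 2.0), (1, 1.0)], [(0, 1.5), (2, 0.5)]]
    print(token_loss([1.0, 0.2, -0.5], gold=0, sources=sources))
```

Mixing in the plain next-token loss keeps the target model anchored to the gold text even where the source models disagree with it.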

Training Infrastructure

The model was trained using 8 A100 GPUs provided by Sionic AI.

Evaluation

Evaluation Datasets

The performance of the base Orion model and the fused model (Orion + OPEN-SOLAR-KO-10.7B + Yi-Ko-6B) was evaluated on the following datasets:

  • kobest_boolq
  • kobest_hellaswag
  • korunsmile
  • nsmc

Evaluation Results

The evaluation was conducted in both zero-shot and five-shot settings. The results are as follows:

| Dataset          | Orion Base, 0-shot | Fused, 0-shot | Orion Base, 5-shot | Fused, 5-shot |
|------------------|--------------------|---------------|--------------------|---------------|
| kobest_boolq     | 0.7642             | 0.7022        | 0.9017             | 0.8924        |
| kobest_hellaswag | 0.4840             | 0.5080        | 0.5060             | 0.5080        |
| korunsmile       | 0.3694             | 0.3941        | 0.3562             | 0.3570        |
| nsmc             | 0.5574             | 0.5803        | 0.8692             | 0.8690        |

Fused = Orion + OPEN-SOLAR-KO-10.7B + Yi-Ko-6B.

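In the zero-shot setting, the fused model scores higher than the base Orion model on kobest_hellaswag, korunsmile, and nsmc (by about 0.023 to 0.025) but lower on kobest_boolq (0.7022 vs. 0.7642). In the five-shot setting, the two models are within 0.01 of each other on every dataset. The results suggest that fusion can add complementary strengths, most visibly in the zero-shot setting, though it does not improve every task.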
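
The per-dataset differences behind this comparison can be computed directly from the scores in the table. This is only arithmetic on the reported numbers; `RESULTS` and `deltas` are illustrative names.

```python
from typing import Dict, Tuple

# Scores copied from the evaluation table: (Orion base, fused model).
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "0-shot": {
        "kobest_boolq": (0.7642, 0.7022),
        "kobest_hellaswag": (0.4840, 0.5080),
        "korunsmile": (0.3694, 0.3941),
        "nsmc": (0.5574, 0.5803),
    },
    "5-shot": {
        "kobest_boolq": (0.9017, 0.8924),
        "kobest_hellaswag": (0.5060, 0.5080),
        "korunsmile": (0.3562, 0.3570),
        "nsmc": (0.8692, 0.8690),
    },
}


def deltas(results: Dict[str, Dict[str, Tuple[float, float]]]) -> Dict[str, Dict[str, float]]:
    """Fused minus base score per setting and dataset, rounded to 4 decimals."""
    return {
        setting: {ds: round(fused - base, 4) for ds, (base, fused) in scores.items()}
        for setting, scores in results.items()
    }


if __name__ == "__main__":
    for setting, diffs in deltas(RESULTS).items():
        print(setting, diffs)
```

A positive value means the fused model scored higher than the base Orion model on that dataset and setting.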

Intended Use

FuseLLM is intended to be used as a research model to explore the fusion of different language models and investigate the potential benefits of combining their knowledge and strengths.

Limitations and Bias

As with any language model, FuseLLM may exhibit biases present in the training data. The model's performance may also be limited by the quality and diversity of the training data used. Further analysis is required to understand and mitigate potential biases and limitations.

Acknowledgments

We would like to thank Sionic AI (https://sionic.ai) for providing the eight A100 GPUs used to train the FuseLLM model.

Model size: 14.1B params (Safetensors, tensor type F32)