
Quant Infos

  • Quants were made with an importance matrix for reduced quantization loss
  • GGUF & imatrix were generated from the bf16 weights for "optimal" accuracy (some say this is snake oil, but it can't hurt)
  • Wide coverage of GGUF quant types, from Q8_0 down to IQ1_S
  • Quantized with llama.cpp commit dc685be46622a8fabfd57cfa804237c8f15679b8 (master as of 2024-05-12)
  • Imatrix generated with this multi-purpose dataset:
    ./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
    
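The steps above can be sketched as an end-to-end pipeline. This is a hedged sketch, not the author's exact invocation: the model and output file names are hypothetical, and at the pinned commit the quantization binary was still named `quantize` (later renamed `llama-quantize`).

```shell
# Sketch of the quant pipeline described above (file names are hypothetical).

# 1. Convert the original bf16 weights to GGUF with llama.cpp's converter
#    (bf16 output support landed in llama.cpp around this commit's date).
python convert-hf-to-gguf.py ./Yi-1.5-6B-Chat --outtype bf16 \
    --outfile Yi-1.5-6B-Chat-bf16.gguf

# 2. Generate the importance matrix from the calibration dataset,
#    mirroring the command given above.
./imatrix -c 512 -m Yi-1.5-6B-Chat-bf16.gguf \
    -f groups_merged.txt -o imat-bf16-gmerged.dat

# 3. Produce a quant (here IQ4_XS, one of the covered types) using the imatrix.
./quantize --imatrix imat-bf16-gmerged.dat \
    Yi-1.5-6B-Chat-bf16.gguf Yi-1.5-6B-Chat-IQ4_XS.gguf IQ4_XS
```

The imatrix matters most for the very low-bit quants (IQ1/IQ2), where per-tensor importance weighting noticeably reduces quality loss.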

Original Model Card:

πŸ™ GitHub β€’ πŸ‘Ύ Discord β€’ 🐀 Twitter β€’ πŸ’¬ WeChat
πŸ“ Paper β€’ πŸ™Œ FAQ β€’ πŸ“— Learning Hub

Intro

Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.

Compared with Yi, Yi-1.5 delivers stronger performance in coding, math, reasoning, and instruction-following capability, while still maintaining excellent capabilities in language understanding, commonsense reasoning, and reading comprehension.

| Model  | Context Length | Pre-trained Tokens |
|--------|----------------|--------------------|
| Yi-1.5 | 4K             | 3.6T               |

Models

Benchmarks

  • Chat models

    Yi-1.5-34B-Chat is on par with or excels beyond larger models in most benchmarks.


    Yi-1.5-9B-Chat is the top performer among similarly sized open-source models.


  • Base models

    Yi-1.5-34B is on par with or excels beyond larger models in some benchmarks.


    Yi-1.5-9B is the top performer among similarly sized open-source models.


Quick Start

To get up and running with Yi-1.5 models quickly, see the README.
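For these GGUF files specifically, a minimal local invocation via llama.cpp might look like the following. The model file name is hypothetical, and `main` was the CLI binary name at the pinned commit (later renamed `llama-cli`).

```shell
# Run a quantized file with llama.cpp built at the pinned commit.
# Yi-1.5 chat models use the ChatML prompt format, which main supports
# via the --chatml flag.
./main -m Yi-1.5-6B-Chat-Q4_K_M.gguf -c 4096 -n 256 --chatml \
    -p "Hi, who are you?"
```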

GGUF · Model size: 6.06B params · Architecture: llama

