Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
 
 Today, we are officially open-sourcing Ring-mini-linear-2.0.
 
-This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks.
+This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks. Additionally, we support a 512k long context window, achieved by extrapolating the window 4x using YaRN. This provides superior speed, especially on tasks involving long inputs and outputs.
 
 <div style="display: flex; justify-content: center;">
 <div style="text-align: center;">
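The sentence added in this hunk highlights the hybrid linear/standard attention layout and the 512k context window. As a rough, non-authoritative illustration of what a "hybrid architecture" means here, the toy sketch below interleaves an O(T^2) softmax-attention layer with O(T) linear-attention layers. The layer count, the interleaving ratio (`full_attn_interval`), the feature map, and the single-head, non-causal simplification are all assumptions made for this sketch and do not describe Ring-mini-linear-2.0's actual implementation.

```python
# Toy sketch of a hybrid attention stack (illustration only; the real
# Ring-mini-linear-2.0 layer layout and kernels are not specified here).
import numpy as np

def softmax_attention(q, k, v):
    # Standard attention: cost grows quadratically with sequence length T.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # Kernelized linear attention: associate phi(k)^T v first, so the cost
    # grows linearly with sequence length T.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple positive feature map (assumption)
    kv = phi(k).T @ v                           # (d, d) summary of keys/values
    z = phi(k).sum(axis=0)                      # (d,) normalizer
    return (phi(q) @ kv) / (phi(q) @ z)[:, None]

def hybrid_stack(x, num_layers=8, full_attn_interval=4):
    # Assumed interleaving: one softmax-attention layer every
    # `full_attn_interval` layers, linear attention everywhere else.
    for i in range(num_layers):
        attn = softmax_attention if (i + 1) % full_attn_interval == 0 else linear_attention
        x = x + attn(x, x, x)                   # residual connection
    return x

tokens = np.random.randn(16, 64)                # (seq_len, hidden_dim)
print(hybrid_stack(tokens).shape)               # -> (16, 64)
```

In a stack like this, most layers avoid the quadratic attention cost, which is where the speed advantage on long inputs and outputs comes from, while the occasional full-attention layers help retain the quality of standard attention.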
@@ -182,7 +182,7 @@ from vllm import LLM, SamplingParams
 
 tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")
 
-sampling_params = SamplingParams(temperature=0.
+sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=16384)
 
 llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False, max_num_seqs=128)
 prompt = "Give me a short introduction to large language models."
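This hunk only updates the `SamplingParams` line of the README's vLLM example. For completeness, a typical continuation of that snippet (not shown in this diff, so treat it as an assumed sketch rather than the README's exact code) would format the prompt with the chat template and run generation, reusing the `tokenizer`, `llm`, `prompt`, and `sampling_params` defined above:

```python
# Hypothetical continuation of the snippet above (not part of this diff).
messages = [{"role": "user", "content": prompt}]

# Format the user message with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Run generation with the sampling parameters set in the README.
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```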