alexchen4ai
committed
Update README.md
README.md CHANGED
@@ -2,7 +2,7 @@
 license: cc-by-nc-4.0
 base_model: Qwen/Qwen2-7B-Instruct
 model-index:
-- name: Dolphin
+- name: Squid
   results: []
 tags:
 - RAG
@@ -14,7 +14,7 @@ spaces: false
 language:
 - en
 ---
-# Dolphin: Long Context as a New Modality for on-device RAG
+# Squid: Long Context as a New Modality for on-device RAG
 
 <p align="center">
 - <a href="https://www.nexaai.com/models" target="_blank">Nexa Model Hub</a>
@@ -26,7 +26,7 @@ language:
 </p>
 
 ## Overview
-Dolphin is a novel approach that accelerates language model inference by treating long context as a new modality, analogous to the image, audio, and video modalities in vision-language models. It uses a language encoder to compress context information into embeddings, applying multimodal-model techniques to make language model inference more efficient. Model highlights:
+Squid is a novel approach that accelerates language model inference by treating long context as a new modality, analogous to the image, audio, and video modalities in vision-language models. It uses a language encoder to compress context information into embeddings, applying multimodal-model techniques to make language model inference more efficient. Model highlights:
 - 🧠 Context as a distinct modality
 - 🗜️ Language encoder for context compression
 - 🔗 Multimodal techniques applied to language processing
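As a rough illustration of the compression claim above (not code from this repo): once a long context is distilled into a fixed, much smaller set of embeddings, the main decoder attends over the compressed length rather than the raw token count. A toy sketch with mean-pooling standing in for Squid's learned encoder, and all shapes assumed:

```python
import torch

# Illustrative numbers only: 8192 context tokens, a 3584-dim hidden size
# (Qwen2-7B's), and 64-token chunks pooled into one embedding each.
T, D, CHUNK = 8192, 3584, 64
token_emb = torch.randn(1, T, D)   # stand-in for the context's token embeddings

# Mean-pool each chunk into a single vector: the main model now attends over
# T // CHUNK = 128 "context embeddings" instead of 8192 tokens.
ctx_emb = token_emb.view(1, T // CHUNK, CHUNK, D).mean(dim=2)
print(ctx_emb.shape)               # torch.Size([1, 128, 3584])
```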
@@ -34,7 +34,7 @@ Dolphin is a novel approach that accelerates language model inference by treating long context as a new modality
 - 📜 Specialized for long context understanding
 
 ## Model Architecture
-Dolphin employs a decoder-decoder framework with two main components:
+Squid employs a decoder-decoder framework with two main components:
 1. A smaller decoder (0.5B parameters) that transforms information from extensive contexts
 2. A larger decoder (7B parameters) that comprehends and generates responses to the current query
 The architecture also includes a projector to align embeddings between the text encoder and the main decoder.
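To make the data flow of these pieces concrete, here is a minimal PyTorch sketch, assuming Hugging Face `transformers`-style models and the Qwen2-0.5B/7B hidden sizes (896 and 3584). `ContextProjector` and `answer` are illustrative names, not this repo's actual API, and the real encoder/projector weights come from Squid's training:

```python
import torch
import torch.nn as nn

ENC_DIM, DEC_DIM = 896, 3584  # hidden sizes of Qwen2-0.5B and Qwen2-7B

class ContextProjector(nn.Module):
    """Aligns the small decoder's context embeddings with the 7B decoder's
    embedding space (a two-layer MLP is an assumption, not the repo's design)."""
    def __init__(self, enc_dim: int = ENC_DIM, dec_dim: int = DEC_DIM):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, dec_dim),
            nn.GELU(),
            nn.Linear(dec_dim, dec_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def answer(query_ids, context_ids, encoder, projector, decoder):
    """Encode the long context once with the 0.5B model, project it, and
    prepend it to the query embeddings consumed by the 7B model."""
    with torch.no_grad():
        # Last hidden states of the small decoder serve as context embeddings.
        ctx_states = encoder(context_ids, output_hidden_states=True).hidden_states[-1]
    ctx_emb = projector(ctx_states)                    # (B, T_ctx, DEC_DIM)
    q_emb = decoder.get_input_embeddings()(query_ids)  # (B, T_q, DEC_DIM)
    inputs = torch.cat([ctx_emb, q_emb], dim=1)        # context as a "modality"
    return decoder(inputs_embeds=inputs).logits
```

The sketch only shows wiring; in practice the context is also compressed to far fewer positions than the raw token count, which is where the inference speedup comes from.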
@@ -131,7 +131,7 @@ If you use Dolphin in your research, please cite our paper:
 
 ```bibtex
 @article{chen2024dolphinlongcontextnew,
-  title={Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models},
+  title={Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models},
   author={Wei Chen and Zhiyuan Li and Shuo Xin and Yihao Wang},
   year={2024},
   eprint={2408.15518},