Commit 2cb277a (parent: 25b631f), committed by root: update README
README.md (changed):
@@ -13,7 +13,7 @@ tags:
 
 
 ## Model Details
-We introduce Llama3-ChatQA-2, which bridges the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. Llama3-ChatQA-2 is developed using an improved training recipe from the [ChatQA-1.5 paper](https://arxiv.org/pdf/2401.10225) and is built on top of the [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-70B). Specifically, we continued training the Llama-3 base models to extend the context window from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model’s instruction-following, RAG performance, and long-context understanding capabilities. Llama3-ChatQA-2 has two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM); we then converted the checkpoints to Hugging Face format. **For more information about ChatQA, check the [website](https://
+We introduce Llama3-ChatQA-2, which bridges the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. Llama3-ChatQA-2 is developed using an improved training recipe from the [ChatQA-1.5 paper](https://arxiv.org/pdf/2401.10225) and is built on top of the [Llama-3 base model](https://huggingface.co/meta-llama/Meta-Llama-3-70B). Specifically, we continued training the Llama-3 base models to extend the context window from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model’s instruction-following, RAG performance, and long-context understanding capabilities. Llama3-ChatQA-2 has two variants: Llama3-ChatQA-2-8B and Llama3-ChatQA-2-70B. Both models were originally trained using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM); we then converted the checkpoints to Hugging Face format. **For more information about ChatQA 2, check the [website](https://chatqa2-project.github.io/)!**
 
 ## Other Resources
 [Llama3-ChatQA-2-70B](https://huggingface.co/nvidia/Llama3-ChatQA-2-70B)   [Evaluation Data](https://huggingface.co/datasets/nvidia/ChatRAG-Bench)   [Training Data](https://huggingface.co/datasets/nvidia/ChatQA2-Training-Data)   [Retriever](https://huggingface.co/intfloat/e5-mistral-7b-instruct)   [Website](https://chatqa2-project.github.io/)   [Paper](https://arxiv.org/abs/2407.14482)
@@ -22,7 +22,7 @@ We introduce Llama3-ChatQA-2, which bridges the gap between open-source LLMs and
 Results in [ChatRAG Bench](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) are as follows:
 
 
-![Example Image](
+![Example Image](overview.png)
 | | ChatQA-2-70B | GPT-4-Turbo-2024-04-09 | Qwen2-72B-Instruct | Llama3.1-70B-Instruct |
 | -- |:--:|:--:|:--:|:--:|
 | Ultra-long (4k) | 41.04 | 33.16 | 39.77 | 39.81 |
@@ -136,3 +136,4 @@ Peng Xu (pengx@nvidia.com), Wei Ping (wping@nvidia.com)
 
 ## License
 The use of this model is governed by the [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](https://llama.meta.com/llama3/license/).
+It is also released under a non-commercial license.
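Since the point of the conversion mentioned above is that the checkpoints are now in standard Hugging Face format, a minimal loading sketch may help orient readers. This is not part of the commit: it assumes only the stock transformers API (AutoTokenizer/AutoModelForCausalLM), picks fp16 weights with automatic device sharding as one reasonable configuration, and uses a placeholder prompt rather than the prompt template documented in the full model card.

```python
# Minimal sketch: load the Hugging-Face-format ChatQA-2 checkpoint and generate.
# Assumptions: transformers + torch installed, and enough GPU memory for the
# 70B weights (device_map="auto" shards layers across available devices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama3-ChatQA-2-70B"  # repo linked under "Other Resources"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to roughly halve memory use
    device_map="auto",          # spread the model over available GPUs
)

# Placeholder free-form prompt; the model card documents the exact prompt
# template expected for RAG-style use, which is not reproduced here.
prompt = "What does ChatQA-2 add on top of ChatQA-1.5?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The extended 128K context window changes memory requirements, not the call pattern: for RAG use, retrieved passages are packed into the prompt according to the model card's template before calling generate.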