zihanliu commited on
Commit
26f4418
1 Parent(s): fd632a8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -19,7 +19,7 @@ We introduce Llama3-ChatQA-1.5, which excels at conversational question answerin
19
  [Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B)   [Evaluation Data](https://huggingface.co/datasets/nvidia/ChatRAG-Bench)   [Training Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data)   [Retriever](https://huggingface.co/nvidia/dragon-multiturn-query-encoder)
20
 
21
  ## Benchmark Results
22
- Results in ChatRAG Bench are as follows:
23
 
24
  | | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
25
  | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
@@ -36,7 +36,7 @@ Results in ChatRAG Bench are as follows:
36
  | Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
37
  | Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |
38
 
39
- Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ChatRAG can be found [here](https://huggingface.co/datasets/nvidia/ChatRAG-Bench).
40
 
41
 
42
  ## Prompt Format
 
19
  [Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B)   [Evaluation Data](https://huggingface.co/datasets/nvidia/ChatRAG-Bench)   [Training Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data)   [Retriever](https://huggingface.co/nvidia/dragon-multiturn-query-encoder)
20
 
21
  ## Benchmark Results
22
+ Results in [ChatRAG Bench](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) are as follows:
23
 
24
  | | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
25
  | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
 
36
  | Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
37
  | Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |
38
 
39
+ Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ChatRAG Bench can be found [here](https://huggingface.co/datasets/nvidia/ChatRAG-Bench).
40
 
41
 
42
  ## Prompt Format