b1sheng committed
Commit f68b926
1 Parent(s): 0343b70

Update src/assets/text_content.py

Files changed (1):
  1. src/assets/text_content.py +11 -8
src/assets/text_content.py CHANGED
@@ -8,19 +8,22 @@ INTRODUCTION_TEXT = f"""
 The data on this page is sourced from a research paper. If you intend to use the data from this page, please remember to cite the following source: https://arxiv.org/abs/2303.07992
 
 We compare the current SOTA traditional KBQA models (fine-tuned (FT) and zero-shot (ZS)),
+
 LLMs in the GPT family, and Other Non-GPT LLM. In QALD-9 and LC-quad2, the evaluation metric used is F1, while other datasets use Accuracy (Exact match).
 
 """
 
 LLM_BENCHMARKS_TEXT = f"""
-ChatGPT is a powerful large language model (LLM) that
-covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is
-growing interest in exploring whether ChatGPT can replace traditional
-knowledge-based question answering (KBQA) models. Although there
-have been some works analyzing the question answering performance of
-ChatGPT, there is still a lack of large-scale, comprehensive testing of various types of complex questions to analyze the limitations of the model.
-In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Microsoft. We evaluate ChatGPT
-and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets.
+ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge.
+
+Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models.
+
+Although there have been some works analyzing the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing of various types of complex questions to analyze the limitations of the model.
+
+In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by Microsoft.
+
+We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, which include six English datasets and two multilingual datasets.
+
 The total number of test cases is approximately 190,000.
 
 """