---
library_name: transformers
base_model: bert-base-chinese
tags:
- generated_from_trainer
datasets:
- real-jiakai/chinese-squadv2
model-index:
- name: chinese_squadv2
  results: []
---

# bert-base-chinese-finetuned-squadv2

This model is a fine-tuned version of [bert-base-chinese](https://huggingface.co/bert-base-chinese) on the [Chinese SQuAD v2.0 dataset](https://huggingface.co/datasets/real-jiakai/chinese-squadv2).

## Model Description

This model is designed for Chinese question answering tasks, specifically for extractive QA where the answer must be extracted from a given context paragraph. It can handle both answerable and unanswerable questions, following the SQuAD v2.0 format.

Key features:
- Based on BERT-base Chinese architecture
- Supports both answerable and unanswerable questions
- Trained on Chinese question-answer pairs
- Optimized for extractive question answering

## Intended Uses & Limitations

### Intended Uses
- Chinese extractive question answering
- Reading comprehension tasks
- Information extraction from Chinese text
- Automated question answering systems

### Limitations
- Performance is much better on unanswerable questions (76.65% exact match) than on answerable questions (36.41% exact match)
- Limited to extractive QA (cannot generate new answers)
- May not perform well on domain-specific questions outside the training data
- Designed for modern Chinese text; may not work well with classical Chinese or dialectal variations

## Training and Evaluation Data

The model was trained on the Chinese SQuAD v2.0 dataset, which contains:

Training Set:
- Total examples: 90,027
- Answerable questions: 46,529
- Unanswerable questions: 43,498

Validation Set:
- Total examples: 9,936
- Answerable questions: 3,991
- Unanswerable questions: 5,945

## Training Procedure

### Training Hyperparameters

- Learning rate: 3e-05
- Batch size: 12
- Evaluation batch size: 8
- Number of epochs: 5
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-08)
- Learning rate scheduler: Linear
- Maximum sequence length: 384
- Document stride: 128
- Training device: CUDA-enabled GPU
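
The maximum sequence length and document stride control how a context longer than one input is split into overlapping windows. The sketch below illustrates that arithmetic in plain Python. It assumes the convention used by Hugging Face fast tokenizers, where the stride is the number of context tokens shared by consecutive windows, and a BERT pair layout of `[CLS] question [SEP] context [SEP]` (three special tokens). `sliding_windows` is a hypothetical helper written for illustration; this card does not include the preprocessing code that was actually used.

```python
def sliding_windows(num_context_tokens, question_len, max_length=384, doc_stride=128):
    """Return (start, end) context-token spans per window; end is exclusive."""
    # Room left for context tokens after the question and 3 special tokens
    budget = max_length - question_len - 3
    if budget <= doc_stride:
        raise ValueError("question too long for this max_length/doc_stride")

    # Each new window starts `step` tokens later, so windows overlap by doc_stride
    step = budget - doc_stride
    spans = []
    start = 0
    while True:
        end = min(start + budget, num_context_tokens)
        spans.append((start, end))
        if end == num_context_tokens:
            break
        start += step
    return spans


# A 1,000-token context with a 21-token question (illustrative numbers)
spans = sliding_windows(num_context_tokens=1000, question_len=21)
for start, end in spans:
    print(start, end)
```

With these illustrative numbers each window holds up to 360 context tokens and advances by 232, so neighbouring windows share 128 tokens and an answer near a window boundary appears whole in at least one window.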

### Training Results

Final evaluation metrics:
- Overall Exact Match: 60.49%
- Overall F1 Score: 60.54%
- Answerable Questions:
  - Exact Match: 36.41%
  - F1 Score: 36.53%
- Unanswerable Questions:
  - Exact Match: 76.65%
  - F1 Score: 76.65%
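
The overall scores can be cross-checked against the per-group scores and the validation-set counts reported above. The following sketch recomputes them as count-weighted averages. It also computes the Exact Match of a trivial system that always predicts "no answer", assuming the usual SQuAD v2.0 scoring convention in which an empty prediction is correct on every unanswerable question and wrong on every answerable one.

```python
# Validation-set counts and per-group scores as reported in this model card
n_answerable, n_unanswerable = 3991, 5945
em = {"answerable": 36.41, "unanswerable": 76.65}
f1 = {"answerable": 36.53, "unanswerable": 76.65}


def weighted(scores):
    """Count-weighted average of the two per-group scores."""
    total = n_answerable + n_unanswerable
    return (
        n_answerable * scores["answerable"]
        + n_unanswerable * scores["unanswerable"]
    ) / total


overall_em = weighted(em)
overall_f1 = weighted(f1)

# Exact Match of a system that always predicts "no answer" (assumed convention)
no_answer_baseline_em = 100 * n_unanswerable / (n_answerable + n_unanswerable)

print(f"weighted EM: {overall_em:.2f}")
print(f"weighted F1: {overall_f1:.2f}")
print(f"always-no-answer EM: {no_answer_baseline_em:.2f}")
```

The weighted averages agree with the reported overall scores to within rounding of the inputs. Under the stated assumption, the always-"no answer" baseline comes to roughly 59.8% Exact Match, which is useful context when reading the overall figure.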

### Framework Versions
- Transformers: 4.47.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "real-jiakai/bert-base-chinese-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

def get_answer(question, context, threshold=0.0):
    # Tokenize input with maximum sequence length of 384.
    # Offsets map each token back to its character span in the original text.
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        max_length=384,
        truncation=True,
        return_offsets_mapping=True
    )
    offsets = inputs.pop("offset_mapping")[0].tolist()
    # True only for context tokens (sequence id 1); False for [CLS], question, [SEP]
    context_mask = torch.tensor([seq_id == 1 for seq_id in inputs.sequence_ids(0)])

    with torch.no_grad():
        outputs = model(**inputs)
        # Clone so that the masking below does not modify the model outputs
        start_logits = outputs.start_logits[0].clone()
        end_logits = outputs.end_logits[0].clone()

        # Calculate null score (score for predicting no answer at [CLS])
        null_score = start_logits[0].item() + end_logits[0].item()

        # Find the best non-null answer inside the context only:
        # set logits at every non-context position to negative infinity
        start_logits[~context_mask] = float('-inf')
        end_logits[~context_mask] = float('-inf')

        start_idx = torch.argmax(start_logits).item()
        end_idx = torch.argmax(end_logits).item()

        # Ensure end_idx is not less than start_idx
        if end_idx < start_idx:
            end_idx = start_idx

        answer_score = start_logits[start_idx].item() + end_logits[end_idx].item()

        # If null score exceeds the answer score by more than threshold, return "no answer"
        if null_score - answer_score > threshold:
            return "Question cannot be answered based on the given context."

        # Otherwise, slice the answer out of the original context by character offsets
        # (joining WordPiece tokens would insert spaces between Chinese characters)
        answer = context[offsets[start_idx][0]:offsets[end_idx][1]]

        # Check if answer is empty
        if not answer.strip():
            return "Question cannot be answered based on the given context."

        return answer.strip()

questions = [
    "本届第十五届珠海航展的亮点和主要展示内容是什么?",
    "珠海杀人案发生地点?"
]

context = '第十五届中国国际航空航天博览会(珠海航展)于2024年11月12日至17日在珠海国际航展中心举行。本届航展吸引了来自47个国家和地区的超过890家企业参展,展示了涵盖"陆、海、空、天、电、网"全领域的高精尖展品。其中,备受瞩目的中国空军"八一"飞行表演队和"红鹰"飞行表演队,以及俄罗斯"勇士"飞行表演队同台献技,为观众呈现了精彩的飞行表演。此外,本届航展还首次开辟了无人机、无人船演示区,展示了多款前沿科技产品。'

for question in questions:
    answer = get_answer(question, context)
    print(f"问题: {question}")
    print(f"答案: {answer}")
    print("-" * 50)
```
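
The usage example picks the start and end positions independently and then repairs the span if the end falls before the start. A common alternative in SQuAD-style post-processing is to score candidate spans jointly, keeping only spans with `end >= start` and a bounded length, and then compare the best span with the null score. The sketch below shows that idea on plain Python lists so it runs without downloading the model. It is not the author's method; `best_span`, `predict`, `n_best`, and `max_answer_len` are hypothetical names and values chosen for illustration.

```python
def best_span(start_logits, end_logits, n_best=20, max_answer_len=30):
    """Return (start, end, score) of the best span with start <= end, or None.

    Position 0 is assumed to be [CLS] and is skipped.
    """
    # Rank candidate positions by logit, excluding index 0 ([CLS])
    positions = range(1, len(start_logits))
    top_starts = sorted(positions, key=lambda i: start_logits[i], reverse=True)[:n_best]
    top_ends = sorted(positions, key=lambda i: end_logits[i], reverse=True)[:n_best]

    best = None
    for s in top_starts:
        for e in top_ends:
            # Skip spans that end before they start or that are too long
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


def predict(start_logits, end_logits, threshold=0.0):
    """Return a (start, end) span, or None when "no answer" wins."""
    null_score = start_logits[0] + end_logits[0]
    span = best_span(start_logits, end_logits)
    if span is None or null_score - span[2] > threshold:
        return None
    return span[0], span[1]


# Toy logits: independent argmax gives start=4, end=2, which is not a valid span
start = [0.5, 0.1, 2.0, 0.3, 3.0, 0.2]
end = [0.2, 0.4, 3.5, 0.1, 1.0, 2.8]
print(predict(start, end))  # (4, 5)
```

On the toy logits, the joint search returns the span from token 4 to token 5, whereas clamping the end to the start would return the single token 4 with a lower combined score. Lowering `threshold` makes the function more willing to return "no answer".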

## Limitations and Bias

The model shows significant performance disparity between answerable and unanswerable questions, which might indicate:
1. Dataset quality issues
2. Potential translation artifacts in the Chinese version of SQuAD
3. Imbalanced handling of answerable vs. unanswerable questions

## Ethics & Responsible AI

Users should be aware that:
- The model may reflect biases present in the training data
- Performance varies significantly based on question type
- Results should be validated for critical applications
- The model should not be used as the sole decision-maker in critical systems