---
library_name: transformers
base_model: bert-base-chinese
tags:
- generated_from_trainer
datasets:
- real-jiakai/chinese-squadv2
model-index:
- name: chinese_squadv2
  results: []
---

# bert-base-chinese-finetuned-squadv2

This model is a fine-tuned version of [bert-base-chinese](https://huggingface.co/bert-base-chinese) on the [Chinese SQuAD v2.0 dataset](https://huggingface.co/datasets/real-jiakai/chinese-squadv2).

## Model Description

This model is designed for Chinese question answering tasks, specifically for extractive QA where the answer must be extracted from a given context paragraph. It can handle both answerable and unanswerable questions, following the SQuAD v2.0 format.

Key features:

- Based on the BERT-base Chinese architecture
- Supports both answerable and unanswerable questions
- Trained on Chinese question-answer pairs
- Optimized for extractive question answering

## Intended Uses & Limitations

### Intended Uses

- Chinese extractive question answering
- Reading comprehension tasks
- Information extraction from Chinese text
- Automated question answering systems

### Limitations

- Performance is significantly better on unanswerable questions (76.65% exact match) than on answerable questions (36.41% exact match)
- Limited to extractive QA (cannot generate new answers)
- May not perform well on domain-specific questions outside the training data
- Designed for modern Chinese text; may not work well with classical Chinese or dialectal variations

## Training and Evaluation Data

The model was trained on the Chinese SQuAD v2.0 dataset, which contains:

Training Set:

- Total examples: 90,027
- Answerable questions: 46,529
- Unanswerable questions: 43,498

Validation Set:

- Total examples: 9,936
- Answerable questions: 3,991
- Unanswerable questions: 5,945

## Training Procedure

### Training Hyperparameters

- Learning rate: 3e-05
- Batch size: 12
- Evaluation batch size: 8
- Number of epochs: 5
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-08)
- Learning rate scheduler: Linear
- Maximum sequence length: 384
- Document stride: 128
- Training device: CUDA-enabled GPU

### Training Results

Final evaluation metrics:

- Overall Exact Match: 60.49%
- Overall F1 Score: 60.54%
- Answerable Questions:
  - Exact Match: 36.41%
  - F1 Score: 36.53%
- Unanswerable Questions:
  - Exact Match: 76.65%
  - F1 Score: 76.65%

### Framework Versions

- Transformers: 4.47.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "real-jiakai/bert-base-chinese-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

def get_answer(question, context, threshold=0.0):
    # Tokenize input with maximum sequence length of 384
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        max_length=384,
        truncation=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        start_logits = outputs.start_logits[0]
        end_logits = outputs.end_logits[0]

    # Calculate null score (score for predicting no answer) from the [CLS] position
    null_score = start_logits[0].item() + end_logits[0].item()

    # Find the best non-null answer by excluding the [CLS] position:
    # set the logits at position 0 to negative infinity
    start_logits[0] = float('-inf')
    end_logits[0] = float('-inf')

    start_idx = torch.argmax(start_logits)
    end_idx = torch.argmax(end_logits)

    # Ensure end_idx is not less than start_idx
    if end_idx < start_idx:
        end_idx = start_idx

    answer_score = start_logits[start_idx].item() + end_logits[end_idx].item()

    # If the null score is higher (beyond the threshold), return "no answer"
    if null_score - answer_score > threshold:
        return "Question cannot be answered based on the given context."

    # Otherwise, return the extracted answer
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    answer = tokenizer.convert_tokens_to_string(tokens[start_idx:end_idx + 1])

    # Check if the answer is empty or contains only special tokens
    if not answer.strip() or answer.strip() in ['[CLS]', '[SEP]']:
        return "Question cannot be answered based on the given context."

    return answer.strip()

questions = [
    # "What are the highlights and main exhibits of the 15th Zhuhai Airshow?"
    "本届第十五届珠海航展的亮点和主要展示内容是什么?",
    # "Where did the Zhuhai murder case take place?" (not answerable from the context)
    "珠海杀人案发生地点?"
]

context = '第十五届中国国际航空航天博览会(珠海航展)于2024年11月12日至17日在珠海国际航展中心举行。本届航展吸引了来自47个国家和地区的超过890家企业参展,展示了涵盖"陆、海、空、天、电、网"全领域的高精尖展品。其中,备受瞩目的中国空军"八一"飞行表演队和"红鹰"飞行表演队,以及俄罗斯"勇士"飞行表演队同台献技,为观众呈现了精彩的飞行表演。此外,本届航展还首次开辟了无人机、无人船演示区,展示了多款前沿科技产品。'

for question in questions:
    answer = get_answer(question, context)
    print(f"问题: {question}")
    print(f"答案: {answer}")
    print("-" * 50)
```
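For quick experiments, the same checkpoint can also be used through the `transformers` question-answering pipeline, which handles span decoding internally. This is a minimal sketch: passing `handle_impossible_answer=True` lets the pipeline return an empty answer for unanswerable questions, though its internal null-score thresholding may differ slightly from the manual decoding above.

```python
from transformers import pipeline

# Question-answering pipeline wrapping the same fine-tuned checkpoint
qa = pipeline(
    "question-answering",
    model="real-jiakai/bert-base-chinese-finetuned-squadv2",
)

result = qa(
    question="本届第十五届珠海航展的亮点和主要展示内容是什么?",
    context=context,  # the same context string as in the example above
    handle_impossible_answer=True,  # allow an empty-string "no answer" prediction
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```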
## Limitations and Bias

The model shows a significant performance disparity between answerable and unanswerable questions, which might indicate:

1. Dataset quality issues
2. Potential translation artifacts in the Chinese version of SQuAD
3. Imbalanced handling of answerable vs. unanswerable questions

## Ethics & Responsible AI

Users should be aware that:

- The model may reflect biases present in the training data
- Performance varies significantly based on question type
- Results should be validated for critical applications (see the scoring sketch after this list)
- The model should not be used as the sole decision-maker in critical systems
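For such validation, predictions can be scored against held-out annotations with the standard SQuAD v2 metric. Below is a minimal sketch using the Hugging Face `evaluate` library; the example ID, answer text, and `answer_start` offset are made up for illustration.

```python
import evaluate

# Standard SQuAD v2 metric: exact match / F1 with no-answer support
squad_v2 = evaluate.load("squad_v2")

# One hypothetical prediction (e.g. produced by get_answer above) and its reference
predictions = [{
    "id": "example-001",
    "prediction_text": "珠海国际航展中心",
    "no_answer_probability": 0.0,  # use 1.0 for "no answer" predictions
}]
references = [{
    "id": "example-001",
    "answers": {"text": ["珠海国际航展中心"], "answer_start": [30]},
}]

results = squad_v2.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # 100.0 100.0 for this single exact match
```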