metadata
library_name: transformers
base_model: bert-base-chinese
tags:
- generated_from_trainer
datasets:
- real-jiakai/chinese-squadv2
model-index:
- name: chinese_squadv2
  results: []
bert-base-chinese-finetuned-squadv2
This model is a fine-tuned version of bert-base-chinese on the Chinese SQuAD v2.0 dataset.
Model Description
This model is designed for Chinese question answering tasks, specifically for extractive QA where the answer must be extracted from a given context paragraph. It can handle both answerable and unanswerable questions, following the SQuAD v2.0 format.
Key features:
- Based on BERT-base Chinese architecture
- Supports both answerable and unanswerable questions
- Trained on Chinese question-answer pairs
- Optimized for extractive question answering
Intended Uses & Limitations
Intended Uses
- Chinese extractive question answering
- Reading comprehension tasks
- Information extraction from Chinese text
- Automated question answering systems
Limitations
- Performance is much better on unanswerable questions (76.65% exact match) than on answerable questions (36.41% exact match)
- Limited to extractive QA (cannot generate new answers)
- May not perform well on domain-specific questions outside the training data
- Designed for modern Chinese text; may not work well with classical Chinese or dialectal variations
Training and Evaluation Data
The model was trained on the Chinese SQuAD v2.0 dataset, which contains:
Training Set:
- Total examples: 90,027
- Answerable questions: 46,529
- Unanswerable questions: 43,498
Validation Set:
- Total examples: 9,936
- Answerable questions: 3,991
- Unanswerable questions: 5,945
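The class balance differs between the two splits, which matters when reading the evaluation numbers. The short calculation below is only a sketch that recomputes the totals and the share of unanswerable questions from the counts listed above; the helper name `unanswerable_share` is illustrative and not part of any library.

```python
# Counts copied from the split statistics above: (answerable, unanswerable).
splits = {
    "train": (46529, 43498),
    "validation": (3991, 5945),
}


def unanswerable_share(answerable, unanswerable):
    """Fraction of questions in a split that have no answer in the context."""
    return unanswerable / (answerable + unanswerable)


for name, (answerable, unanswerable) in splits.items():
    total = answerable + unanswerable
    share = unanswerable_share(answerable, unanswerable)
    print(f"{name}: {total} examples, {share:.1%} unanswerable")
```

With these counts, roughly 48.3% of the training questions and 59.8% of the validation questions are unanswerable.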
Training Procedure
Training Hyperparameters
- Learning rate: 3e-05
- Batch size: 12
- Evaluation batch size: 8
- Number of epochs: 5
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-08)
- Learning rate scheduler: Linear
- Maximum sequence length: 384
- Document stride: 128
- Training device: CUDA-enabled GPU
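To illustrate how the maximum sequence length and document stride interact, the sketch below splits a long context into overlapping windows. It assumes that the stride is the number of context tokens shared by consecutive windows (as with the `stride` argument of Hugging Face tokenizers) and that each window spends three positions on special tokens ([CLS] and two [SEP]). The function name `context_windows` and the `n_special` parameter are hypothetical; this is not the training script's actual preprocessing code.

```python
def context_windows(n_context_tokens, n_question_tokens,
                    max_seq_length=384, doc_stride=128, n_special=3):
    """Return (start, end) context-token ranges, one per overlapping window.

    Assumption: doc_stride is the overlap between consecutive windows.
    """
    # Room left for context tokens once the question and special tokens are placed.
    budget = max_seq_length - n_question_tokens - n_special
    if budget <= doc_stride:
        raise ValueError("question too long for this max_seq_length/doc_stride")
    step = budget - doc_stride  # how far each new window advances

    windows = []
    start = 0
    while True:
        end = min(start + budget, n_context_tokens)
        windows.append((start, end))
        if end == n_context_tokens:
            break
        start += step
    return windows


# A 1000-token context with a 17-token question needs four windows.
print(context_windows(1000, 17))
```

Under these assumptions, a context that fits within the budget yields a single window, and longer contexts yield several windows whose predictions have to be merged afterwards.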
Training Results
Final evaluation metrics:
- Overall Exact Match: 60.49%
- Overall F1 Score: 60.54%
- Answerable Questions:
  - Exact Match: 36.41%
  - F1 Score: 36.53%
- Unanswerable Questions:
  - Exact Match: 76.65%
  - F1 Score: 76.65%
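The overall scores can be cross-checked as a weighted average of the two per-group scores, using the validation-set counts from the data section (3,991 answerable and 5,945 unanswerable questions). The sketch below does that arithmetic; `overall_score` is an illustrative helper. As a point of reference it also computes the exact match that a system answering "no answer" to every question would obtain on this split, assuming such a system scores 100% on unanswerable and 0% on answerable questions.

```python
# Validation-set counts as reported in this card.
N_ANSWERABLE, N_UNANSWERABLE = 3991, 5945


def overall_score(answerable_score, unanswerable_score):
    """Weighted average of per-group scores (in percent) over the validation set."""
    total = N_ANSWERABLE + N_UNANSWERABLE
    weighted = N_ANSWERABLE * answerable_score + N_UNANSWERABLE * unanswerable_score
    return weighted / total


overall_em = overall_score(36.41, 76.65)
overall_f1 = overall_score(36.53, 76.65)
# Reference point: always predicting "no answer".
always_abstain_em = overall_score(0.0, 100.0)

print(f"Overall EM: {overall_em:.2f}%")
print(f"Overall F1: {overall_f1:.2f}%")
print(f"Always-abstain EM: {always_abstain_em:.2f}%")
```

The recomputed values agree with the reported overall metrics to within rounding of the per-group inputs.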
Framework Versions
- Transformers: 4.47.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Usage
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch
# Load model and tokenizer
model_name = "real-jiakai/bert-base-chinese-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# Prepare the inputs
question = "your_question"
context = "your_context"
inputs = tokenizer(
    question,
    context,
    add_special_tokens=True,
    return_tensors="pt"
)
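# Get the answer (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
# The model returns an output object; the span scores are in start_logits and end_logits
start_index = torch.argmax(outputs.start_logits)
end_index = torch.argmax(outputs.end_logits)
# If both indices are 0, the span is the [CLS] token, which by the SQuAD v2.0
# convention marks the question as unanswerable
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(inputs["input_ids"][0][start_index:end_index + 1])
)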
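Taking the two argmax positions independently can produce an end index before the start index, and it does not treat "no answer" explicitly. The sketch below shows one common way to handle both, under these assumptions: position 0 ([CLS]) stands for "no answer", the best span is the valid (start, end) pair with the highest summed score, and the question is treated as unanswerable when the null score exceeds the best span score by more than a threshold. The function name `best_answer_span` and the defaults for `max_answer_len` and `null_threshold` are hypothetical, and for brevity the search is not restricted to context tokens.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30, null_threshold=0.0):
    """Return (start, end) token indices of the best span, or None for "no answer".

    start_logits and end_logits are plain lists of floats for one example.
    """
    # Score of the "no answer" prediction: both pointers on position 0 ([CLS]).
    null_score = start_logits[0] + end_logits[0]

    best_score, best_span = float("-inf"), None
    for start in range(1, len(start_logits)):
        # Only consider spans with start <= end and a bounded length.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)

    if best_span is None or null_score - best_score > null_threshold:
        return None
    return best_span


# Toy scores for a 5-token input: the best span covers tokens 2..3.
print(best_answer_span([0.1, 0.2, 5.0, 0.3, 0.1], [0.1, 0.1, 0.2, 4.0, 0.3]))
```

To use it with the model, pass the start and end scores for one example as plain lists (for PyTorch tensors, via `.tolist()`). A return value of `None` means the question should be reported as unanswerable; otherwise the indices can be used to slice `input_ids`.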
Limitations and Bias
The model shows significant performance disparity between answerable and unanswerable questions, which might indicate:
- Dataset quality issues
- Potential translation artifacts in the Chinese version of SQuAD
- Imbalanced handling of answerable vs. unanswerable questions
Ethics & Responsible AI
Users should be aware that:
- The model may reflect biases present in the training data
- Performance varies significantly based on question type
- Results should be validated for critical applications
- The model should not be used as the sole decision-maker in critical systems