---
library_name: transformers
base_model: bert-base-chinese
tags:
- generated_from_trainer
datasets:
- real-jiakai/chinese-squadv2
model-index:
- name: chinese_squadv2
  results: []
---

# bert-base-chinese-finetuned-squadv2

This model is a fine-tuned version of [bert-base-chinese](https://huggingface.co/bert-base-chinese) on the [Chinese SQuAD v2.0 dataset](https://huggingface.co/datasets/real-jiakai/chinese-squadv2).

## Model Description

This model is designed for Chinese question answering tasks, specifically for extractive QA where the answer must be extracted from a given context paragraph. It can handle both answerable and unanswerable questions, following the SQuAD v2.0 format.

Key features:

- Based on the BERT-base Chinese architecture
- Supports both answerable and unanswerable questions
- Trained on Chinese question-answer pairs
- Optimized for extractive question answering

## Intended Uses & Limitations

### Intended Uses

- Chinese extractive question answering
- Reading comprehension tasks
- Information extraction from Chinese text
- Automated question answering systems

### Limitations

- Performance is significantly better on unanswerable questions (76.65% exact match) than on answerable questions (36.41% exact match)
- Limited to extractive QA (cannot generate new answers)
- May not perform well on domain-specific questions outside the training data
- Designed for modern Chinese text; may not work well with classical Chinese or dialectal variations

## Training and Evaluation Data

The model was trained on the Chinese SQuAD v2.0 dataset, which contains:

Training Set:

- Total examples: 90,027
- Answerable questions: 46,529
- Unanswerable questions: 43,498

Validation Set:

- Total examples: 9,936
- Answerable questions: 3,991
- Unanswerable questions: 5,945

## Training Procedure

### Training Hyperparameters

- Learning rate: 3e-05
- Batch size: 12
- Evaluation batch size: 8
- Number of epochs: 5
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-08)
- Learning rate scheduler: Linear
- Maximum sequence length: 384
- Document stride: 128
- Training device: CUDA-enabled GPU

### Training Results

Final evaluation metrics:

- Overall Exact Match: 60.49%
- Overall F1 Score: 60.54%
- Answerable Questions:
  - Exact Match: 36.41%
  - F1 Score: 36.53%
- Unanswerable Questions:
  - Exact Match: 76.65%
  - F1 Score: 76.65%

### Framework Versions

- Transformers: 4.47.0.dev0
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Usage

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "real-jiakai/bert-base-chinese-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

def get_answer(question, context, threshold=0.0):
    # Tokenize input with maximum sequence length of 384
    inputs = tokenizer(
        question,
        context,
        return_tensors="pt",
        max_length=384,
        truncation=True
    )

    with torch.no_grad():
        outputs = model(**inputs)
        start_logits = outputs.start_logits[0]
        end_logits = outputs.end_logits[0]

    # Calculate null score (score for predicting no answer) from the [CLS] position
    null_score = start_logits[0].item() + end_logits[0].item()

    # Find the best non-null answer by excluding the [CLS] position:
    # set the logits at position 0 to negative infinity
    start_logits[0] = float('-inf')
    end_logits[0] = float('-inf')

    start_idx = torch.argmax(start_logits)
    end_idx = torch.argmax(end_logits)

    # Ensure end_idx is not less than start_idx
    if end_idx < start_idx:
        end_idx = start_idx

    answer_score = start_logits[start_idx].item() + end_logits[end_idx].item()

    # If the null score is higher (beyond the threshold), return "no answer"
    if null_score - answer_score > threshold:
        return "Question cannot be answered based on the given context."

    # Otherwise, return the extracted answer
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    answer = tokenizer.convert_tokens_to_string(tokens[start_idx:end_idx + 1])

    # Check if the answer is empty or contains only special tokens
    if not answer.strip() or answer.strip() in ['[CLS]', '[SEP]']:
        return "Question cannot be answered based on the given context."

    return answer.strip()

questions = [
    # "What are the highlights and main exhibits of the 15th Zhuhai Airshow?"
    "本届第十五届珠海航展的亮点和主要展示内容是什么?",
    # "Where did the Zhuhai murder case take place?" (not answerable from the context)
    "珠海杀人案发生地点?"
]

context = '第十五届中国国际航空航天博览会(珠海航展)于2024年11月12日至17日在珠海国际航展中心举行。本届航展吸引了来自47个国家和地区的超过890家企业参展,展示了涵盖"陆、海、空、天、电、网"全领域的高精尖展品。其中,备受瞩目的中国空军"八一"飞行表演队和"红鹰"飞行表演队,以及俄罗斯"勇士"飞行表演队同台献技,为观众呈现了精彩的飞行表演。此外,本届航展还首次开辟了无人机、无人船演示区,展示了多款前沿科技产品。'

for question in questions:
    answer = get_answer(question, context)
    print(f"问题: {question}")
    print(f"答案: {answer}")
    print("-" * 50)
```
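For quick experiments, the same checkpoint can also be used through the `transformers` question-answering pipeline, which handles span decoding internally. This is a minimal sketch: passing `handle_impossible_answer=True` lets the pipeline return an empty answer for unanswerable questions, though its internal null-score thresholding may differ slightly from the manual decoding above.

```python
from transformers import pipeline

# Question-answering pipeline wrapping the same fine-tuned checkpoint
qa = pipeline(
    "question-answering",
    model="real-jiakai/bert-base-chinese-finetuned-squadv2",
)

result = qa(
    question="本届第十五届珠海航展的亮点和主要展示内容是什么?",
    context=context,  # the same context string as in the example above
    handle_impossible_answer=True,  # allow an empty-string "no answer" prediction
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```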
## Limitations and Bias

The model shows a significant performance disparity between answerable and unanswerable questions, which might indicate:

1. Dataset quality issues
2. Potential translation artifacts in the Chinese version of SQuAD
3. Imbalanced handling of answerable vs. unanswerable questions

## Ethics & Responsible AI

Users should be aware that:

- The model may reflect biases present in the training data
- Performance varies significantly based on question type
- Results should be validated for critical applications (see the scoring sketch after this list)
- The model should not be used as the sole decision-maker in critical systems
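For such validation, predictions can be scored against held-out annotations with the standard SQuAD v2 metric. Below is a minimal sketch using the Hugging Face `evaluate` library; the example ID, answer text, and `answer_start` offset are made up for illustration.

```python
import evaluate

# Standard SQuAD v2 metric: exact match / F1 with no-answer support
squad_v2 = evaluate.load("squad_v2")

# One hypothetical prediction (e.g. produced by get_answer above) and its reference
predictions = [{
    "id": "example-001",
    "prediction_text": "珠海国际航展中心",
    "no_answer_probability": 0.0,  # use 1.0 for "no answer" predictions
}]
references = [{
    "id": "example-001",
    "answers": {"text": ["珠海国际航展中心"], "answer_start": [30]},
}]

results = squad_v2.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # 100.0 100.0 for this single exact match
```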