Model Card: BERT Fine-tuned on SWAG
Model Overview
This model is a fine-tuned version of Google's BERT-Base, Uncased. It was fine-tuned for a single epoch on what appears to be the SWAG dataset, a large-scale benchmark for grounded commonsense inference; the exact dataset configuration was not recorded. The model targets natural language understanding tasks, particularly multiple-choice scenarios that require commonsense reasoning about everyday situations.
Model Architecture
- Base Model: BERT-Base, Uncased
- Layers: 12 Transformer layers
- Parameters: 110M
- Pre-training: The base model was pre-trained on the English Wikipedia and BookCorpus datasets.
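The "110M" figure can be checked with simple arithmetic. The sketch below counts encoder parameters from the standard bert-base-uncased configuration (vocabulary 30,522, hidden size 768, feed-forward size 3,072, 512 positions); these values are assumptions taken from the public base-model configuration, since the card itself only lists the layer count and the rounded total.

```python
# Configuration values below are the standard bert-base-uncased settings
# (assumed here; the card itself only states 12 layers and ~110M parameters).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB = 2
HIDDEN = 768
INTERMEDIATE = 3072
NUM_LAYERS = 12


def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(size: int) -> int:
    """Parameters in LayerNorm: scale and shift vectors."""
    return 2 * size


def bert_encoder_params() -> int:
    """Total parameters in embeddings, 12 encoder layers, and the pooler."""
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
    per_layer = (
        4 * linear(HIDDEN, HIDDEN)      # query, key, value, attention output
        + layer_norm(HIDDEN)
        + linear(HIDDEN, INTERMEDIATE)  # feed-forward expansion
        + linear(INTERMEDIATE, HIDDEN)  # feed-forward projection
        + layer_norm(HIDDEN)
    )
    pooler = linear(HIDDEN, HIDDEN)
    return embeddings + NUM_LAYERS * per_layer + pooler


encoder = bert_encoder_params()
choice_head = linear(HIDDEN, 1)  # one score per candidate ending
print(f"encoder: {encoder:,}  multiple-choice head: {choice_head:,}")
# prints "encoder: 109,482,240  multiple-choice head: 769"
```

Under these assumptions the multiple-choice head adds only a single 768-to-1 projection, so fine-tuning leaves the parameter count essentially unchanged.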
Performance
The model achieved the following results on the evaluation set:
- Validation Loss: 0.5240
- Accuracy: 79.70%
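For context, SWAG items offer four candidate endings, so random guessing would score about 25%. The following is a minimal sketch of how multiple-choice accuracy is typically computed from per-choice logits; the function names and toy values are illustrative and are not taken from the original training script.

```python
from typing import Sequence


def predict_choice(logits: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    return max(range(len(logits)), key=lambda i: logits[i])


def accuracy(all_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose top-scoring ending matches the label."""
    correct = sum(predict_choice(row) == y for row, y in zip(all_logits, labels))
    return correct / len(labels)


# Toy logits for three examples with four candidate endings each (made-up values).
toy_logits = [
    [2.1, -0.3, 0.4, 0.0],
    [0.2, 0.1, 1.7, -1.0],
    [0.5, 0.9, -0.2, 0.3],
]
toy_labels = [0, 2, 3]
print(accuracy(toy_logits, toy_labels))
```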
Intended Use
Use Cases
This model is intended for tasks requiring natural language understanding, especially those involving commonsense reasoning. Potential use cases include:
- Multiple-choice question answering
- Contextual word embedding generation
- Commonsense inference tasks
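As an illustration of the multiple-choice use case, the sketch below scores candidate endings for a context with the Transformers `AutoModelForMultipleChoice` class. It assumes the repository `ashaduzzaman/bert-finetuned-swag` (the name shown at the end of this card) ships both tokenizer files and multiple-choice weights; the helper names `build_pairs` and `rank_endings` are hypothetical.

```python
from typing import List, Tuple


def build_pairs(context: str, endings: List[str]) -> Tuple[List[str], List[str]]:
    """Repeat the context once per ending so each pair is encoded together."""
    return [context] * len(endings), list(endings)


def rank_endings(context: str, endings: List[str],
                 model_id: str = "ashaduzzaman/bert-finetuned-swag") -> int:
    """Return the index of the ending the model scores highest.

    Assumes `model_id` provides a tokenizer and multiple-choice weights;
    both are downloaded on the first call.
    """
    # Imported lazily so build_pairs works without these packages installed.
    import torch
    from transformers import AutoModelForMultipleChoice, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMultipleChoice.from_pretrained(model_id)
    model.eval()

    first, second = build_pairs(context, endings)
    enc = tokenizer(first, second, return_tensors="pt", padding=True, truncation=True)
    # The model expects tensors shaped (batch, num_choices, seq_len).
    batch = {name: tensor.unsqueeze(0) for name, tensor in enc.items()}
    with torch.no_grad():
        logits = model(**batch).logits  # shape (1, num_choices)
    return int(logits.argmax(dim=-1).item())


# Example call (requires network access to download the model):
# best = rank_endings("A man sits down at a piano.",
#                     ["He begins to play.", "He eats the piano."])
```

Loading the model inside the function keeps the sketch short; a real application would load it once and reuse it across calls.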
Limitations
- Data Bias: As the dataset specifics are unknown, there might be biases in the training data that could affect the model’s predictions.
- Generalization: Performance may degrade on inputs that differ from the fine-tuning data, such as specialized or domain-specific text.
- Ethical Considerations: Users should be aware of potential ethical concerns when applying this model to sensitive or critical tasks. Misinterpretation of commonsense reasoning could lead to flawed or biased outcomes.
Training and Evaluation Data
Dataset
The model was fine-tuned on a dataset intended for grounded commonsense inference, likely the SWAG dataset. The specifics of the dataset, including size, distribution, and preprocessing methods, were not provided.
Training Procedure
Hyperparameters
The model was trained using the following hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Optimizer: Adam (betas: (0.9, 0.999), epsilon: 1e-08)
- Learning Rate Scheduler: Linear
- Number of Epochs: 1
- Seed: 42
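To make the "Linear" scheduler concrete, here is a minimal sketch of a linear warmup-and-decay schedule. It assumes zero warmup steps (the card does not mention warmup) and uses the 4597 optimizer steps reported in the training results; the function name `linear_lr` is illustrative.

```python
BASE_LR = 5e-05      # learning rate from the card
TOTAL_STEPS = 4597   # optimizer steps reported in the training results
WARMUP_STEPS = 0     # assumption: the card does not mention warmup


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate after `step` optimizer updates under linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

With no warmup, the rate starts at 5e-05 and falls in a straight line to zero at the final step.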
Training Results
The training and evaluation results are summarized below:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.6971 | 1.0 | 4597 | 0.5240 | 0.7970 |
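The step count also gives a rough bound on the training set size. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch was kept; none of these are stated in the card. The SWAG training split size of 73,546 is likewise an assumption based on the commonly cited figure, not something the card reports.

```python
import math

STEPS_PER_EPOCH = 4597  # from the results table
BATCH_SIZE = 16         # train batch size from the hyperparameters


def example_count_range(steps: int, batch_size: int) -> range:
    """Dataset sizes N for which ceil(N / batch_size) == steps."""
    return range((steps - 1) * batch_size + 1, steps * batch_size + 1)


sizes = example_count_range(STEPS_PER_EPOCH, BATCH_SIZE)
print(sizes.start, sizes[-1])  # prints "73537 73552"

# Assumption: 73,546 is the commonly cited size of SWAG's training split.
ASSUMED_SWAG_TRAIN = 73_546
print(ASSUMED_SWAG_TRAIN in sizes, math.ceil(ASSUMED_SWAG_TRAIN / BATCH_SIZE))
```

If those assumptions hold, the reported 4597 steps are consistent with one pass over the full SWAG training split.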
Framework Versions
The following software versions were used during training:
- Transformers: 4.42.4
- PyTorch: 2.4.0+cu121
- Datasets: 2.21.0
- Tokenizers: 0.19.1
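When reproducing the results, it can help to compare the local environment against these versions. The following is a small sketch using the standard-library `importlib.metadata`; the PyPI distribution names (for example `torch` for PyTorch) are assumptions, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.42.4",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def check_versions(pinned: dict) -> dict:
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} match={ok}")
```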
Ethical Considerations
When deploying this model, users should be cautious of potential biases and limitations inherent in the dataset and the model’s training process. Ensuring that the model is used in a manner that is fair, unbiased, and ethical is crucial, particularly in sensitive applications.
Contact Information
For further information or questions, please contact the maintainers of this model or refer to the associated documentation and code repository.
Model tree for ashaduzzaman/bert-finetuned-swag
Base model
google-bert/bert-base-uncased