---
license: mit
tags:
- generated_from_trainer
model-index:
- name: multiCorp_5e-05_0404
  results: []
---
# multiCorp_5e-05_0404
This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) for token classification on a combined multi-corpus dataset (the corpora are not documented in this card).
It achieves the following results on the evaluation set:

- eval_loss: 0.0657
- eval_precision: 0.6398
- eval_recall: 0.6267
- eval_f1: 0.6332
- eval_accuracy: 0.9847
- eval_runtime: 39.7302
- eval_samples_per_second: 32.544
- eval_steps_per_second: 2.039
- epoch: 3.41
- step: 1100
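These numbers are the metrics dictionary produced by the Trainer's evaluation loop; a minimal sketch of reproducing them, assuming the `trainer` and `tokenized_data` objects from the training code below:

```python
# Re-run evaluation on the validation split; trainer and tokenized_data
# are defined in the training code later in this card.
metrics = trainer.evaluate(eval_dataset=tokenized_data["validation"])
print(metrics["eval_loss"], metrics["eval_f1"])
```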

## Multi-corpus training

The model is initialized as a token-classification head over PubMedBERT with 41 labels:

```python
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    num_labels=41,
    id2label=id2label,
    label2id=label2id,
)
```
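The `id2label` and `label2id` mappings are not shown in the original card; a minimal sketch of how they are typically built, using a hypothetical BIO tag list (the real inventory has 41 tags):

```python
# Hypothetical label list for illustration only; substitute the actual
# 41-tag inventory of the training corpora.
label_list = ["O", "B-Chemical", "I-Chemical", "B-Disease", "I-Disease"]
id2label = {i: tag for i, tag in enumerate(label_list)}
label2id = {tag: i for i, tag in enumerate(label_list)}
```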

The training arguments enable W&B logging, step-based evaluation, and checkpointing every 25 steps (`runname` is defined elsewhere in the training script):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    report_to="wandb",                  # enable logging to W&B
    output_dir=runname,                 # output directory / repo name for the Hugging Face Hub
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    evaluation_strategy="steps",        # evaluate every eval_steps, not once per epoch
    max_steps=2000,
    logging_steps=25,                   # log every 25 steps
    eval_steps=25,                      # evaluate every 25 steps
    save_steps=25,                      # save a checkpoint every 25 steps
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,            # lower eval_loss is better
    push_to_hub=True,
    run_name=runname,                   # name of the W&B run
)
```

The trainer ties together the model, data, metrics, and an early-stopping callback:

```python
from transformers import Trainer, EarlyStoppingCallback

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    # Stop if eval_loss fails to improve for 6 consecutive evaluations (150 steps).
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6)],
)
trainer.train()
```
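The `tokenizer`, `data_collator`, and `compute_metrics` objects are not shown in the original card. Below is a minimal sketch of the standard token-classification setup they presumably follow; the seqeval-based metric is an assumption, chosen because it yields exactly the precision/recall/F1/accuracy columns logged below:

```python
# Sketch of the helper objects assumed by the Trainer call above.
import numpy as np
import evaluate
from transformers import AutoTokenizer, DataCollatorForTokenClassification

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
)
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
seqeval = evaluate.load("seqeval")

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    # Drop padding/subword positions, which carry the label id -100.
    true_labels = [
        [id2label[l] for l in row if l != -100] for row in labels
    ]
    true_predictions = [
        [id2label[p] for p, l in zip(p_row, l_row) if l != -100]
        for p_row, l_row in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```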

### Training log

At the time this card was generated, training had reached step 1101 of 2000 (epoch 3.41/7, roughly 0.30 it/s, 1:00:33 elapsed):

| Step | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|-----:|--------------:|----------------:|----------:|-------:|---:|---------:|
| 25 | 0.836100 | 0.201612 | 0.000000 | 0.000000 | 0.000000 | 0.973546 |
| 50 | 0.149500 | 0.154239 | 0.233246 | 0.124420 | 0.162277 | 0.972420 |
| 75 | 0.136300 | 0.138105 | 0.145299 | 0.094708 | 0.114671 | 0.972385 |
| 100 | 0.129900 | 0.123477 | 0.425243 | 0.203343 | 0.275126 | 0.975886 |
| 125 | 0.103100 | 0.118570 | 0.297553 | 0.321727 | 0.309168 | 0.974136 |
| 150 | 0.117300 | 0.113230 | 0.393373 | 0.214949 | 0.277995 | 0.977039 |
| 175 | 0.117500 | 0.106183 | 0.320082 | 0.291551 | 0.305151 | 0.975930 |
| 200 | 0.093800 | 0.102443 | 0.353604 | 0.291551 | 0.319593 | 0.975297 |
| 225 | 0.091900 | 0.105976 | 0.446684 | 0.318942 | 0.372156 | 0.977127 |
| 250 | 0.088700 | 0.093393 | 0.439173 | 0.335190 | 0.380200 | 0.977734 |
| 275 | 0.113300 | 0.097715 | 0.522222 | 0.218199 | 0.307793 | 0.977637 |
| 300 | 0.092900 | 0.085730 | 0.473552 | 0.349118 | 0.401924 | 0.979405 |
| 325 | 0.085700 | 0.091731 | 0.380009 | 0.409471 | 0.394190 | 0.976960 |
| 350 | 0.081700 | 0.086656 | 0.554161 | 0.389508 | 0.457470 | 0.980162 |
| 375 | 0.062400 | 0.083441 | 0.538000 | 0.374652 | 0.441708 | 0.980769 |
| 400 | 0.077500 | 0.085072 | 0.486742 | 0.477252 | 0.481950 | 0.978869 |
| 425 | 0.073000 | 0.078521 | 0.516658 | 0.467967 | 0.491108 | 0.981103 |
| 450 | 0.081000 | 0.077073 | 0.552381 | 0.430826 | 0.484090 | 0.981288 |
| 475 | 0.075100 | 0.078478 | 0.483887 | 0.446147 | 0.464251 | 0.980408 |
| 500 | 0.062800 | 0.073298 | 0.550633 | 0.484680 | 0.515556 | 0.982247 |
| 525 | 0.060600 | 0.069571 | 0.542723 | 0.536676 | 0.539683 | 0.982608 |
| 550 | 0.063900 | 0.071559 | 0.539832 | 0.506500 | 0.522635 | 0.981983 |
| 575 | 0.060700 | 0.068333 | 0.564646 | 0.519034 | 0.540881 | 0.982546 |
| 600 | 0.062900 | 0.072810 | 0.602013 | 0.416435 | 0.492316 | 0.981886 |
| 625 | 0.051300 | 0.071469 | 0.550901 | 0.525070 | 0.537675 | 0.982335 |
| 650 | 0.059500 | 0.067657 | 0.553466 | 0.478180 | 0.513076 | 0.982528 |
| 675 | 0.047500 | 0.067443 | 0.594739 | 0.566852 | 0.580461 | 0.983663 |
| 700 | 0.052100 | 0.065269 | 0.564447 | 0.546890 | 0.555529 | 0.983039 |
| 725 | 0.041500 | 0.067790 | 0.593516 | 0.552461 | 0.572253 | 0.983672 |
| 750 | 0.046300 | 0.067922 | 0.609038 | 0.538069 | 0.571358 | 0.983461 |
| 775 | 0.054300 | 0.064636 | 0.646725 | 0.582173 | 0.612753 | 0.984499 |
| 800 | 0.049500 | 0.067722 | 0.650905 | 0.517642 | 0.576674 | 0.983830 |
| 825 | 0.043100 | 0.069327 | 0.630043 | 0.471216 | 0.539177 | 0.982880 |
| 850 | 0.048000 | 0.063814 | 0.631025 | 0.528784 | 0.575398 | 0.984068 |
| 875 | 0.042500 | 0.064527 | 0.644913 | 0.582637 | 0.612195 | 0.984543 |
| 900 | 0.043500 | 0.065475 | 0.608295 | 0.490251 | 0.542931 | 0.983522 |
| 925 | 0.039200 | 0.066043 | 0.635938 | 0.566852 | 0.599411 | 0.984323 |
| 950 | 0.046800 | 0.062491 | 0.646930 | 0.547818 | 0.593263 | 0.984719 |
| 975 | 0.043700 | 0.061204 | 0.634625 | 0.585422 | 0.609032 | 0.984543 |
| 1000 | 0.032000 | 0.066377 | 0.643390 | 0.560353 | 0.599007 | 0.984349 |
| 1025 | 0.038100 | 0.064764 | 0.666482 | 0.559424 | 0.608279 | 0.984745 |
| 1050 | 0.035300 | 0.065642 | 0.635359 | 0.587279 | 0.610374 | 0.984464 |
| 1075 | 0.032800 | 0.064835 | 0.657262 | 0.584030 | 0.618486 | 0.984587 |
| 1100 | 0.031700 | 0.065726 | 0.639810 | 0.626741 | 0.633208 | 0.984710 |

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent optimizer/scheduler construction is sketched after the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 2000
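Assuming the Trainer defaults of Transformers 4.27 (AdamW with linear decay and no warmup), these settings correspond to roughly the following; this is an illustrative sketch, not code from the original run:

```python
# Equivalent optimizer/scheduler for the hyperparameters above, assuming
# Trainer defaults (AdamW, linear decay, num_warmup_steps=0).
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=2000
)
```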

### Framework versions

- Transformers 4.27.4
- Pytorch 2.0.0+cu118
- Datasets 2.11.0
- Tokenizers 0.13.2