---
base_model: mrm8488/longformer-base-4096-finetuned-squadv2
tags:
- generated_from_trainer
license: apache-2.0
datasets:
- Kkordik/NovelQSI
language:
- en
model-index:
- name: Kkordik/test_longformer_4096_qsi
  results:
  - task:
      type: question-answering
    dataset:
      type: Kkordik/NovelQSI
      name: NovelQSI
      split: test
    metrics:
    - type: exact_match
      value: 20.346
      verified: false
    - type: f1
      value: 26.58
      verified: false
---

# longformer_4096_qsi

This model is a fine-tuned version of [mrm8488/longformer-base-4096-finetuned-squadv2](https://huggingface.co/mrm8488/longformer-base-4096-finetuned-squadv2) on the tiny [NovelQSI](https://huggingface.co/datasets/Kkordik/NovelQSI) dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9598

## Model description

This is a test model for my research project. Its task is quote speaker identification: given a quote from a novel, the model determines which character said it, framed as extractive question answering over the novel text. It achieves somewhat better results on the `test` split of the NovelQSI dataset than the base longformer-base-4096-finetuned-squadv2 model on the same split. A minimal usage sketch is included at the end of this card.

**Base model results:**
```
{
  "exact_match": {
    "confidence_interval": [8.754452551305853, 14.718614718614718],
    "score": 12.121212121212121,
    "standard_error": 1.8579217243778676
  },
  "f1": {
    "confidence_interval": [18.469101076147584, 28.28409063313956],
    "score": 22.799422799422796,
    "standard_error": 2.896728175757627
  },
  "latency_in_seconds": 0.7730605573419919,
  "samples_per_second": 1.2935597224598967,
  "total_time_in_seconds": 178.5769887460001
}
```

**Achieved results:**
```
{
  "exact_match": {
    "confidence_interval": [16.017316017316016, 24.242424242424242],
    "score": 20.346320346320347,
    "standard_error": 2.9434375492784994
  },
  "f1": {
    "confidence_interval": [23.123469058324783, 31.823648733317036],
    "score": 26.580086580086572,
    "standard_error": 2.593030474995015
  },
  "latency_in_seconds": 0.8093855569913422,
  "samples_per_second": 1.235505120349827,
  "total_time_in_seconds": 186.96806366500005
}
```

The results show that the technique has promise.

## Training and evaluation data

The training code is available in the GitHub repository of my research project: https://github.com/Kkordik/NovelQSI. The model was trained and evaluated in notebooks, so the results are easy to reproduce; a sketch of the evaluation call is also included at the end of this card.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 93   | 3.0886          |
| No log        | 1.99  | 186  | 3.3755          |
| No log        | 2.99  | 279  | 2.9598          |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0
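
## How to use

The model can be loaded as a standard `transformers` question-answering pipeline. The sketch below is illustrative only: the model id comes from the model index above, and the question phrasing and context are made-up examples rather than the exact format used in NovelQSI (see the research repo linked above for the real preprocessing).

```python
# Minimal inference sketch. The question/context format is an assumption,
# not the exact prompt format used in the NovelQSI dataset.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="Kkordik/test_longformer_4096_qsi",
)

answer = qa(
    question='Who said the quote: "I am not afraid of storms"?',
    context=(
        "A long novel excerpt of up to 4096 tokens goes here. ... "
        '"I am not afraid of storms, for I am learning how to sail my '
        'ship," said Amy.'
    ),
)
print(answer)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'Amy'}
```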
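
## Reproducing the evaluation

The result dictionaries above match the output shape of the `evaluate` library's question-answering evaluator with `strategy="bootstrap"` (per-metric score, confidence interval, and standard error, plus latency figures). The following is a sketch under that assumption; it further assumes that NovelQSI exposes SQuAD-style columns (`id`, `question`, `context`, `answers`), and the `n_resamples` value is arbitrary since the card does not state it.

```python
# Sketch of the bootstrap evaluation with the Hugging Face `evaluate` library.
# Assumption: NovelQSI uses SQuAD-style columns; pass the *_column arguments
# to `compute` if the actual schema differs.
from datasets import load_dataset
from evaluate import evaluator

data = load_dataset("Kkordik/NovelQSI", split="test")

task_evaluator = evaluator("question-answering")
results = task_evaluator.compute(
    model_or_pipeline="Kkordik/test_longformer_4096_qsi",
    data=data,
    metric="squad",        # reports exact_match and f1, as in the card
    strategy="bootstrap",  # adds confidence_interval and standard_error
    n_resamples=100,       # assumed value; not stated in the card
)
print(results)
```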