---
datasets:
- midas/krapivin
- midas/inspec
- midas/kptimes
- midas/duc2001
language:
- en
widget:
- text: "Relevance has traditionally been linked with feature subset selection, but formalization of this link has not been attempted. In this paper, we propose two axioms for feature subset selection (sufficiency axiom and necessity axiom), based on which this link is formalized: The expected feature subset is the one which maximizes relevance. Finding the expected feature subset turns out to be NP-hard. We then devise a heuristic algorithm to find the expected subset which has a polynomial time complexity. The experimental results show that the algorithm finds good enough subset of features which, when presented to C4.5, results in better prediction accuracy."
- text: "In this paper, we investigate cross-domain limitations of keyphrase generation using the models for abstractive text summarization. We present an evaluation of BART fine-tuned for keyphrase generation across three types of texts, namely scientific texts from computer science and biomedical domains and news texts. We explore the role of transfer learning between different domains to improve the model performance on small text corpora."
---

# BART fine-tuned for keyphrase generation

This is the bart-base (Lewis et al., 2019) model fine-tuned for the keyphrase generation task (Glazkova & Morozov, 2023) on fragments of the following corpora:

* Krapivin (Krapivin et al., 2009)
* Inspec (Hulth, 2003)
* KPTimes (Gallina, 2019)
* DUC-2001 (Wan, 2008)
* PubMed (Schutz, 2008)
* NamedKeys (Gero & Ho, 2019).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("beogradjanka/bart_finetuned_keyphrase_extraction")
model = AutoModelForSeq2SeqLM.from_pretrained("beogradjanka/bart_finetuned_keyphrase_extraction")

text = (
    "In this paper, we investigate cross-domain limitations of keyphrase generation "
    "using the models for abstractive text summarization. "
    "We present an evaluation of BART fine-tuned for keyphrase generation across three types of texts, "
    "namely scientific texts from computer science and biomedical domains and news texts. "
    "We explore the role of transfer learning between different domains "
    "to improve the model performance on small text corpora."
)

# Tokenize by calling the tokenizer directly
# (prepare_seq2seq_batch is deprecated in recent transformers releases)
tokenized_text = tokenizer([text], return_tensors='pt', truncation=True)

# Generate the keyphrase sequence and decode it back to text
generated_ids = model.generate(**tokenized_text)
keyphrases = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(keyphrases)
```

#### Training Hyperparameters

The following hyperparameters were used during training:

* learning_rate: 4e-5
* train_batch_size: 8
* optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
* num_epochs: 6

**BibTeX:**

```
@article{glazkova2023cross,
  title={Cross-Domain Robustness of Transformer-based Keyphrase Generation},
  author={Glazkova, Anna and Morozov, Dmitry},
  journal={arXiv preprint arXiv:2312.10700},
  year={2023}
}
```
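
For readers who want to set up a comparable fine-tuning run, the training hyperparameters listed in this card can be collected as keyword arguments. This is a minimal sketch and not the authors' training script. The key names follow the `transformers` `TrainingArguments` parameters; the use of `Seq2SeqTrainingArguments`, the mapping of `train_batch_size` to `per_device_train_batch_size`, the `HYPERPARAMS` and `build_training_args` names, and the `output_dir` value are all assumptions made for illustration.

```python
# Hyperparameters reported in this model card, keyed by the matching
# `transformers.TrainingArguments` parameter names (the mapping is an assumption).
HYPERPARAMS = {
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 8,  # card lists this as train_batch_size
    "num_train_epochs": 6,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
}


def build_training_args(output_dir="bart-keyphrase-checkpoints"):
    """Build Seq2SeqTrainingArguments from HYPERPARAMS.

    `output_dir` is a hypothetical path. The import is deferred so the
    dictionary above can be inspected without `transformers` installed.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


if __name__ == "__main__":
    # Print the settings without constructing the arguments object
    for name, value in HYPERPARAMS.items():
        print(f"{name}: {value}")
```

The resulting object could be passed to a `Seq2SeqTrainer` together with the tokenized corpora. Data preparation and any settings not listed in this card are left open here.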