arxiv:2411.17863

LongKey: Keyphrase Extraction for Long Documents

Published on Nov 26
Submitted by jeohalves on Nov 29

Abstract

In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase extraction addresses this challenge by identifying representative terms within texts. However, most existing methods focus on short documents (up to 512 tokens), leaving a gap in processing long-context documents. In this paper, we introduce LongKey, a novel framework for extracting keyphrases from lengthy documents, which uses an encoder-based language model to capture extended text intricacies. LongKey uses a max-pooling embedder to enhance keyphrase candidate representation. Validated on the comprehensive LDKP datasets and six diverse, unseen datasets, LongKey consistently outperforms existing unsupervised and language model-based keyphrase extraction methods. Our findings demonstrate LongKey's versatility and superior performance, marking an advancement in keyphrase extraction for varied text lengths and domains.
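The abstract says only that LongKey uses a max-pooling embedder to build keyphrase candidate representations. The Python sketch below illustrates the general idea under stated assumptions. It assumes each occurrence of a candidate is represented by max-pooling its token embeddings, and that the candidate is then max-pooled across all of its occurrences in the document. The function names, the toy 2-dimensional embeddings, and the linear `weights` scorer (a stand-in for a learned ranking head) are all hypothetical and are not taken from the LongKey code.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum over a non-empty sequence of equal-length vectors."""
    if not vectors:
        raise ValueError("max_pool needs at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [max(values) for values in zip(*vectors)]


def embed_candidates(
    token_embeddings: Sequence[Vector],
    occurrences: Dict[str, List[Tuple[int, int]]],
) -> Dict[str, Vector]:
    """Return one vector per keyphrase candidate (hypothetical scheme).

    Each occurrence is a half-open token span [start, end). It is reduced to
    a single vector by max-pooling its tokens, and the candidate vector is the
    max-pool over all of its occurrences in the document.
    """
    candidate_vectors: Dict[str, Vector] = {}
    for phrase, spans in occurrences.items():
        occurrence_vectors = [max_pool(token_embeddings[s:e]) for s, e in spans]
        candidate_vectors[phrase] = max_pool(occurrence_vectors)
    return candidate_vectors


def rank_candidates(
    candidate_vectors: Dict[str, Vector], weights: Vector, top_k: int
) -> List[str]:
    """Score each candidate with a hypothetical linear scorer; keep the top_k."""
    scores = {
        phrase: sum(w * x for w, x in zip(weights, vec))
        for phrase, vec in candidate_vectors.items()
    }
    return sorted(scores, key=lambda p: scores[p], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy 2-d "encoder outputs" for a five-token document.
    tokens = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.3], [0.2, 0.7], [0.5, 0.6]]
    occ = {"keyphrase extraction": [(0, 2), (3, 5)], "long documents": [(2, 3)]}
    vecs = embed_candidates(tokens, occ)
    print(vecs["keyphrase extraction"])  # [0.5, 0.9]
    print(rank_candidates(vecs, [1.0, 1.0], top_k=2))
    # ['keyphrase extraction', 'long documents']
```

One plausible motivation for pooling with a maximum rather than a mean is that the strongest feature from any single occurrence survives, so a phrase that is highly salient in one passage of a long document is not diluted by its weaker mentions elsewhere. This is an interpretation, not a claim from the paper.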

Community

Paper author and paper submitter:

Excited to share that our preprint, LongKey: Keyphrase Extraction for Long Documents, is now on arXiv! 🎉

📢 Accepted for IEEE BigData 2024!
💻 Code: https://github.com/jeohalves/longkey


"LongKey is introduced, a novel framework for extracting keyphrases from lengthy documents, which uses an encoder-based language model to capture extended text intricacies and a max-pooling embedder to enhance keyphrase candidate representation."

