Toward Salient Key Phrase for Candidate Topic Detection
DOI:
https://doi.org/10.30572/2018/KJE/160213
Keywords:
keyphrase extraction, NLP, information retrieval, topic keyphrase
Abstract
With the exponential growth of digital information, efficient methods for automatic keyphrase extraction have become increasingly important. Key phrase candidate topic detection (KPCTD) aims to automatically identify key phrases, i.e., phrases that capture the central meaning of a text document, and to associate them with their corresponding topics. We have developed a method that combines statistical and contextual approaches (position and distance criteria in addition to semantic information). This mix of features allows precise and effective extraction of relevant information. Furthermore, to sift the extracted key phrases into the condensed thematic (topic) key phrases that appear under the abstract part of a document, we examine several strategies: approximate matching with key sentences at the beginning of the text, identification of cluster foci, and prioritization of frequent phrases. In extensive experiments on two datasets, SemEval2017 and Inspec, the proposed PhraeRank approach outperforms previously reported results. For the top 5 keyphrases, it achieves a precision of 51.23% and a recall of 28.26% on the SemEval2017 dataset, and a precision of 47.89% and a recall of 25.34% on the Inspec dataset. Additionally, the BLEU score is 0.62 on the SemEval2017 dataset and 0.58 on the Inspec dataset, a significant improvement over existing methods. These results highlight the algorithm's ability to extract relevant information from text documents.
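As an illustration of how top-5 precision and recall figures of this kind are typically computed, the following is a minimal sketch in Python. It assumes exact matching after lowercasing and whitespace normalization; the abstract does not state the matching scheme (exact, stemmed, or partial), so this choice, the function name `precision_recall_at_k`, and the sample phrases are all hypothetical.

```python
def precision_recall_at_k(predicted, gold, k=5):
    """Exact-match precision and recall over the top-k predicted keyphrases.

    `predicted` is a ranked list of phrases (best first); `gold` is the
    list of reference keyphrases for the same document.
    """
    # Assumed normalization: lowercase and collapse runs of whitespace.
    def normalize(phrase):
        return " ".join(phrase.lower().split())

    # Keep the first k distinct predictions, preserving rank order.
    top_k = []
    for phrase in predicted:
        norm = normalize(phrase)
        if norm not in top_k:
            top_k.append(norm)
        if len(top_k) == k:
            break

    gold_set = {normalize(p) for p in gold}
    hits = sum(1 for p in top_k if p in gold_set)

    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example data, not taken from either dataset.
predicted = [
    "keyphrase extraction",
    "topic detection",
    "semantic information",
    "digital information",
    "text document",
]
gold = ["keyphrase extraction", "information retrieval", "topic detection", "NLP"]

precision, recall = precision_recall_at_k(predicted, gold, k=5)
print(f"P@5 = {precision:.2f}, R@5 = {recall:.2f}")
```

Dataset-level scores are then usually obtained by averaging these per-document values; whether the reported numbers are averaged per document or pooled over all documents is not specified in the abstract.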
License
Copyright (c) 2025 Yasser S. Jude, Wafaa Al-Hameed

This work is licensed under a Creative Commons Attribution 4.0 International License.