Toward Salient Key Phrase for Candidate Topic Detection

Yasser S. Jude; Wafaa Al-Hameed

doi:10.30572/2018/KJE/160213

Authors

Yasser S. Jude, Software Department, College of Information Technology, University of Babylon, Babylon, Iraq. ORCID: https://orcid.org/0009-0006-7807-1548
Wafaa Al-Hameed, Software Department, College of Information Technology, University of Babylon, Babylon, Iraq

Keywords:

keyphrase extraction, NLP, information retrieval, topic keyphrase

Abstract

With the exponential growth of digital information, efficient methods for automatic keyphrase extraction have become increasingly important. Key phrase candidate topic detection (KPCTD) aims to automatically identify key phrases, i.e., phrases that capture the central meaning of a text document, and to associate them with their corresponding topics. We propose a method that combines statistical and contextual approaches (position and distance criteria in addition to semantic information), so that a mix of complementary features supports precise and effective extraction of relevant information. Furthermore, to sift the extracted key phrases into condensed thematic (topic) key phrases of the kind written under the ABSTRACT part of a document, we examine several strategies: approximate matching with key sentences at the beginning of the text, identification of cluster foci, and prioritization of frequent phrases. In experiments on two datasets, SemEval2017 and Inspec, the proposed PhraseRank approach outperforms previously reported results. For the top 5 keyphrases, it achieves a precision of 51.23% and a recall of 28.26% on the SemEval2017 dataset, and a precision of 47.89% and a recall of 25.34% on the Inspec dataset. Additionally, the BLEU score is 0.62 on the SemEval2017 dataset and 0.58 on the Inspec dataset, demonstrating a significant improvement over existing methods. These results highlight the algorithm's ability to extract relevant information from text documents.
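The abstract does not give the scoring formula of the proposed approach, so the following Python sketch is only an illustration of two ideas it mentions: ranking candidate phrases by a mix of frequency and position, and evaluating the top 5 phrases with precision and recall. The function names `score_candidates` and `precision_recall_at_k`, the `1 / position` weighting, and the toy data are all assumptions made for this example, not the authors' method or data.

```python
from collections import defaultdict


def score_candidates(candidates):
    """Rank phrases by summing 1 / position over their occurrences.

    `candidates` is a list of (phrase, position) pairs, where position is the
    1-based index of an occurrence in the document. Frequent phrases and
    phrases that appear early both receive higher scores. This weighting is
    a hypothetical stand-in, not the scoring used in the paper.
    """
    scores = defaultdict(float)
    for phrase, position in candidates:
        scores[phrase.lower()] += 1.0 / position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


def precision_recall_at_k(ranked, gold, k=5):
    """Exact-match precision and recall over the top-k ranked phrases."""
    top_k = ranked[:k]
    gold_set = {g.lower() for g in gold}
    hits = sum(1 for p in top_k if p in gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy input (hypothetical): candidate phrases with their positions in a text.
candidates = [
    ("keyphrase extraction", 1),
    ("digital information", 2),
    ("text document", 4),
    ("keyphrase extraction", 9),
    ("semantic information", 12),
]
gold = ["keyphrase extraction", "semantic information", "topic detection"]

ranked = score_candidates(candidates)
precision, recall = precision_recall_at_k(ranked, gold, k=5)
print(ranked)
print(f"P@5 = {precision:.2f}, R@5 = {recall:.2f}")
```

Exact string matching is the simplest choice here; published evaluations on SemEval2017 and Inspec often apply stemming before matching, which this sketch leaves out.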
