Ontological Optimization for Latent Semantic Indexing of Arabic Corpus
Publication date: 2018
Source: Procedia Computer Science, Volume 142
Author(s): Aya M. Al-Zoghby, Khaled Shaalan
Dimensionality reduction is a critical problem in the information retrieval process. High dimensionality directly affects search performance in terms of Recall and Precision. Dimensionality reduction enables the search to be semantically based rather than lexically based, because the dimensions are defined in terms of semantic concepts instead of traditional terms or keywords. Latent Semantic Indexing (LSI) is a mathematical extension of the classical Vector Space Model (VSM). LSI is used to discover the latent semantics of the search space by extracting concepts from the original terms in that space. LSI relies on the Singular Value Decomposition (SVD) to reduce the term space to a lower-dimensional LSI space. In this paper, we propose a methodology that further optimizes LSI dimension reduction through two reduction levels. The first reduction level is based on an ontological conceptualization process: the Universal Wordnet ontology (UWN) is used to build an ontology-based concept space in place of the term space. As a second reduction level, the SVD is applied to the extracted concept space to obtain an optimal LSI conceptualization. The experimental results indicate an improvement in the search results in terms of both Precision and Recall, as the proposed methodology addresses the Synonymy and Polysemy problems effectively.
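The two reduction levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-to-concept dictionary `TERM_TO_CONCEPT` is a hypothetical stand-in for a UWN lookup, the English toy vocabulary and the tiny term-document matrix are invented for illustration, and the function names `build_concept_matrix` and `lsi_reduce` are assumptions. The only library call used is NumPy's `numpy.linalg.svd`.

```python
import numpy as np

# Hypothetical term -> concept map standing in for a UWN lookup (assumption,
# not real UWN data). Terms missing from the map are kept as their own concept.
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "bank": "finance",
    "loan": "finance",
    "river": "water",
}


def build_concept_matrix(terms, term_doc):
    """First reduction level: sum the rows of all terms sharing a concept."""
    concepts = sorted({TERM_TO_CONCEPT.get(t, t) for t in terms})
    index = {c: i for i, c in enumerate(concepts)}
    concept_doc = np.zeros((len(concepts), term_doc.shape[1]))
    for row, term in enumerate(terms):
        concept_doc[index[TERM_TO_CONCEPT.get(term, term)]] += term_doc[row]
    return concepts, concept_doc


def lsi_reduce(matrix, k):
    """Second reduction level: rank-k truncated SVD of the concept space."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Each document becomes a k-dimensional vector (one row per document).
    doc_vectors = (s[:k, None] * vt[:k]).T
    return u[:, :k], s[:k], doc_vectors


# Toy term-document counts: rows are terms, columns are four documents.
terms = ["car", "automobile", "truck", "bank", "loan", "river"]
term_doc = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

concepts, concept_doc = build_concept_matrix(terms, term_doc)
u_k, s_k, doc_vectors = lsi_reduce(concept_doc, k=2)
```

In this toy data, the first two documents share no term ("car" versus "automobile"), so they are orthogonal in the term space, yet they receive identical vectors once both terms are mapped to the same concept. That is the synonymy effect the first level targets; the SVD step then compresses the six-term, three-concept space to two latent dimensions.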