Word Sense Disambiguation for Arabic Exploiting Arabic WordNet and Word Embedding
Publication date: 2018
Source: Procedia Computer Science, Volume 142
Author(s): Ali Alkhatlan, Jugal Kalita, Ahmed Alhaddad
Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word given its context. The problem has been studied in depth for English. Work on Arabic, however, has been limited, despite the fact that there are half a billion native Arabic speakers. In this work, we present multiple approaches to WSD in Arabic that build on recent successes in learning word embeddings with methods such as GloVe and Word2vec. The primary shortcoming of word embeddings is that each word's meaning is represented by a single vector, although many words are polysemous. Our main contribution is to computationally obtain an embedding for each sense, using Arabic WordNet (AWN), in order to address the WSD problem. We also compute word semantic similarity, taking multiple Arabic stemming algorithms into account. Finally, we make available a large pre-processed corpus that is ready to be used for further experiments, together with a WSD test set based on AWN, seeking to fill gaps in Arabic NLP (ANLP) relative to English.
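The abstract does not say how the per-sense embeddings are computed, so the following is only a minimal sketch of one common way to realize the idea, not the authors' method. It assumes that each sense is represented by the average of the word vectors of the words associated with that sense in a WordNet-style inventory, and that the sense whose vector has the highest cosine similarity to the averaged context vector is selected. The names `mean_vector`, `cosine`, `disambiguate`, `TOY_EMBEDDINGS`, and `TOY_SENSES` are hypothetical, the three-dimensional vectors are made-up toy values, and English glosses stand in for stemmed Arabic tokens.

```python
import math

def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def disambiguate(context_words, sense_inventory, embeddings):
    """Return the sense id whose averaged vector is closest to the context vector."""
    context_vec = mean_vector(context_words, embeddings)
    if context_vec is None:
        return None
    best_sense, best_score = None, float("-inf")
    for sense_id, sense_words in sense_inventory.items():
        sense_vec = mean_vector(sense_words, embeddings)
        if sense_vec is None:
            continue  # no usable words for this sense
        score = cosine(context_vec, sense_vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense

# Toy data (assumption): invented vectors, not trained GloVe or Word2vec output.
TOY_EMBEDDINGS = {
    "eye": [1.0, 0.0, 0.1], "vision": [0.9, 0.1, 0.0], "face": [0.8, 0.2, 0.1],
    "water": [0.0, 1.0, 0.1], "spring": [0.1, 0.9, 0.0], "well": [0.0, 0.8, 0.2],
    "doctor": [0.9, 0.0, 0.2], "drink": [0.1, 1.0, 0.0],
}

# Toy inventory (assumption): two senses of the Arabic word "ayn",
# each listed with related words as a WordNet synset might provide.
TOY_SENSES = {
    "ayn_organ": ["eye", "vision", "face"],
    "ayn_water_source": ["water", "spring", "well"],
}

if __name__ == "__main__":
    print(disambiguate(["doctor", "vision"], TOY_SENSES, TOY_EMBEDDINGS))
    print(disambiguate(["drink", "water"], TOY_SENSES, TOY_EMBEDDINGS))
```

In a real setting the toy dictionaries would be replaced by embeddings trained on a pre-processed Arabic corpus and by synsets drawn from AWN, with the same stemmer applied to both the context and the inventory so that tokens match.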