Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets
Publication date: January 2019
Source: Engineering Applications of Artificial Intelligence, Volume 77
Author(s): Amit K. Shukla, Pranab K. Muhuri
The major bottleneck in microarray gene expression analysis is the lack of techniques for coping with uncertain gene functionality and inherently complex gene interactions. With the advent of uncertain big gene expression datasets, unstructured, unorganized, noisy, and incomplete data are encountered more and more often. Moreover, big data is naturally associated with uncertainties that originate from multiple sources. In this paper, such uncertainties in the gene expression dataset are modeled using interval type-2 fuzzy sets (IT2 FSs). For this, the spread of the footprint of uncertainty (FOU), which accounts for all possible noise in the gene expression dataset, is modeled both symmetrically and asymmetrically. A medical science big dataset of microarray gene expression data from cancer patients is considered for the experiments. The effects of uncertainty modeling using IT2 FSs on big data clustering are analyzed empirically. Fuzzy clustering approaches allow genes to belong to multiple clusters and thus account for the participation of genes in cellular processes, subcell variations, and cellular metabolism. The effect of the induced uncertainty on big data clustering is studied using various cluster validity measures. The clustering results obtained with the IT2 fuzzy uncertainty modeling are compared with those obtained with type-1 fuzzy set based uncertainty modeling. It is demonstrated that the proposed IT2 FS based approach gives better clustering results for uncertain gene expression datasets and is scalable to large gene expression datasets. A sensitivity analysis of the clustering results is performed for four different IT2 membership function shapes: triangular, trapezoidal, semi-elliptic, and Gaussian.
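To make the idea of a symmetric versus asymmetric FOU spread concrete, the following is a minimal sketch of an interval type-2 triangular membership function. It is an illustration under stated assumptions, not the authors' implementation: the construction (widening the base of a type-1 triangle for the upper membership function and narrowing it for the lower one) and the names `triangular`, `it2_triangular`, `spread_left`, and `spread_right` are hypothetical choices made here, and the abstract does not specify how the paper parameterizes the spread.

```python
def triangular(x, left, peak, right):
    """Type-1 triangular membership grade of x (0 outside [left, right])."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def it2_triangular(x, left, peak, right, spread_left, spread_right):
    """Return (lower, upper) membership grades of x for an IT2 triangle.

    Assumption for this sketch: the upper membership function widens the
    base of the type-1 triangle by the spreads, and the lower membership
    function narrows it by the same amounts. The region between the two
    is the footprint of uncertainty (FOU). Equal spreads give a symmetric
    FOU; unequal spreads give an asymmetric one.
    """
    if spread_left < 0 or spread_right < 0:
        raise ValueError("spreads must be non-negative")
    if not (left + spread_left < peak < right - spread_right):
        raise ValueError("spreads too large for the given triangle")
    upper = triangular(x, left - spread_left, peak, right + spread_right)
    lower = triangular(x, left + spread_left, peak, right - spread_right)
    return lower, upper


# Symmetric FOU: the same spread on both sides of the triangle.
sym_lower, sym_upper = it2_triangular(0.5, 0.0, 1.0, 2.0, 0.2, 0.2)

# Asymmetric FOU: a larger spread on the right side than on the left.
asym_lower, asym_upper = it2_triangular(1.5, 0.0, 1.0, 2.0, 0.1, 0.4)
```

Each expression value is then described by an interval of membership grades `[lower, upper]` rather than a single grade, and the width of that interval reflects how much noise the model admits at that value. With zero spreads the interval collapses to a point and the function reduces to an ordinary type-1 triangle.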