Wednesday, February 26, 2020

Efficiency of Clustering Algorithms in Mining Biological Databases: Research Paper

For example, hierarchical algorithms typically work by either splitting or merging the groups being analyzed to build a hierarchy of clusters based on the similarity of the sequences. Partitioning algorithms, on the other hand, divide the data sets being analyzed into groups based on the distances between their members (Fayyad, 2003, 346). The choice of clustering algorithm should, however, be based primarily on the nature of the sequences or clusters to be analyzed, the acceptable error, and the available computational resources. This matters because each category of clustering algorithm has its own strengths and limitations and is therefore suited to different tasks. Biological databases, such as those used in mining protein or gene sequences, are best analyzed with clustering algorithms because clustering supports detailed exploratory analysis of the sequences. This paper critically analyzes the efficiency of clustering algorithms in mining biological databases such as gene sequences.

Applications of clustering algorithms in analyzing gene sequences

During the statistical analysis of biological databases, the choice of clustering algorithm often depends on the nature of the data sets as well as the intended application of the results. In biological data mining, most of the sequences increasingly being analyzed with clustering algorithms are genomic and protein sequences. According to Werner (2008, 52), recent advances in bioinformatics have led to the increasing use of clustering algorithms in the analysis of both protein and gene sequences. In the study of gene expression, clustering is one of the major exploratory techniques used to analyze microarray slides containing hundreds of thousands of genes.²
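The two families of algorithms named at the start of this paper, hierarchical (merging) and partitioning (distance-based), can be contrasted in a minimal pure-Python sketch. This is an illustration under stated assumptions, not the method of any cited source: the hierarchical side uses bottom-up merging with average linkage, the partitioning side uses k-means (a common partitioning algorithm chosen here for illustration) with a deterministic farthest-first initialization, and the six two-feature points are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Hierarchical (bottom-up) clustering: start with every point in its own
    cluster and repeatedly merge the two closest clusters (average linkage)."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # the hierarchy, recorded as (cluster_a, cluster_b, distance)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                total = sum(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters), merges

def kmeans(points, k, max_iter=100):
    """Partitioning clustering: assign each point to its nearest centroid,
    then move each centroid to the mean of its members, until stable."""
    # Assumed farthest-first initialisation; random or k-means++ seeding
    # is more common in practice.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-feature data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
clusters, merges = agglomerative(points, 2)
labels, centroids = kmeans(points, 2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(labels)    # [0, 0, 0, 1, 1, 1]
```

The hierarchical version keeps the full merge history, which is what makes a dendrogram possible, at the cost of comparing every pair of clusters at each step; k-means only keeps the final partition but needs the number of groups up front. This trade-off reflects the point above that the choice should depend on the data and on the available computational resources.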
In such cases, clustering is used to group similar genes together, which enables biologists to identify relationships between particular genes and reduces the amount of information that needs to be analyzed. Genes clustered together are usually co-regulated or share similar functions. Additionally, when time-series clustering methods are used, genes that behave similarly at given time points may be grouped together, indicating possible co-regulation. Clustering algorithms can also be used efficiently to analyze gene samples on the basis of similar expression patterns. Although expression patterns usually involve complex phenotypes, cluster analysis is one of the most effective techniques for identifying arrays with similar or different phenotypic characteristics. This application is particularly important in medical research, where it allows scientists to distinguish pathologies on the basis of gene expression patterns rather than conventional histological methods. In unsupervised cluster analysis of gene expression arrays, the major assumption is usually that genes involved in the same biological process should be clustered together, whether the data are condition-dependent or time series. Another important area of genetic database mining in which clustering algorithms are widely used is the analysis of gene profiles. In this regard, clustering algorithms are potentially important in the analysis of disease subclasses as well as in the detection of genes
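The idea that genes with similar expression patterns over time are grouped as candidates for co-regulation can be sketched in a few lines of Python. This is a minimal illustration, not a method from the cited sources: it assumes Pearson correlation as the similarity measure and single-linkage clustering cut at a correlation threshold, and the gene names, expression values, and the `min_corr=0.9` cutoff are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)

def coexpression_clusters(profiles, min_corr=0.9):
    """Single-linkage clustering on correlation distance (1 - r), cut at
    1 - min_corr: genes end up together if they are linked by a chain of
    pairs whose profiles correlate at min_corr or more."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest over gene names

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_corr:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical expression values for five genes over five time points.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 4.0],
    "geneB": [1.1, 2.1, 3.9, 7.5, 4.2],
    "geneC": [8.0, 6.0, 3.0, 1.0, 2.0],
    "geneD": [7.5, 6.2, 2.8, 1.2, 2.1],
    "geneE": [3.0, 3.5, 3.0, 3.5, 3.0],
}
print(coexpression_clusters(profiles))
# [['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]
```

Correlation is used rather than Euclidean distance because it compares the shape of two profiles over time and ignores differences in overall expression level, which suits the co-regulation question. Transposing the data (clustering arrays instead of genes) gives the sample-level grouping by expression pattern described above.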
