Editorial - (2022) Volume 13, Issue 1
Received: 10-Jan-2022, Manuscript No. JDMGP-22-15135; Editor assigned: 12-Jan-2022, Pre QC No. JDMGP-22-15135; Reviewed: 24-Jan-2022, QC No. JDMGP-22-15135; Revised: 27-Jan-2022, Manuscript No. JDMGP-22-15135; Published: 31-Jan-2022, DOI: 10.4172/2153-0602.22.13.e135
Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other than to objects in other groups. It is a major task of exploratory data analysis and a common technique for statistical data analysis, used in many areas such as pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis is not a specific algorithm but a general problem to be solved. It can be approached with a variety of algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. Common notions of a cluster include groups with small distances between members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate algorithm and parameter settings, such as the distance function to use, a density threshold, or the expected number of clusters, depend on the individual dataset and the intended use of the results. Cluster analysis is thus not an automatic task but an iterative process of knowledge discovery, or interactive multi-objective optimization, that involves trial and error. It is often necessary to modify the data pre-processing and model parameters until the results achieve the desired properties.
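To make the trial-and-error character of parameter selection concrete, the following is a minimal sketch of k-means (Lloyd's algorithm), chosen here only as one familiar example of a distance-based cluster model. The function names (`kmeans`, `assign`, `within_cluster_ss`), the explicit `initial_centroids` argument, and the toy data are assumptions made for illustration and do not come from any particular library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm; the number of clusters is len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the old centroid if a cluster lost all its points.
                updated.append(centroids[c])
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, assign(points, centroids)


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

# Try several values for the expected number of clusters and compare the fit.
for k in (1, 2, 4):
    centroids, labels = kmeans(points, points[:k])
    print(k, within_cluster_ss(points, centroids, labels))
```

In this sketch both the number of clusters and the starting centroids are inputs, and a different choice of either can give a different partition of the same data, which is why the settings usually have to be revisited against the intended use of the results.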
Besides the term clustering, there are several terms with similar meanings, such as automatic classification, numerical taxonomy, botryology, typological analysis, and community detection. The subtle differences often lie in how the results are used: in data mining the resulting groups are the matter of interest, whereas in automatic classification the resulting discriminative power is of interest. Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced into psychology by Joseph Zubin in 1938 and Robert Tryon in 1939, and was used by Cattell from 1943 onwards for trait theory classification in personality psychology. The notion of a “cluster” cannot be precisely defined, which is one of the reasons why there are so many clustering algorithms. There is one common denominator: a group of data objects. However, different researchers employ different cluster models, and different algorithms can be given for each of these models. The notion of a cluster, as found by different algorithms, varies significantly in its properties.
In recent years, considerable effort has been made to improve the performance of existing algorithms; CLARANS and BIRCH are among the results. With the need to process larger and larger datasets (also known as big data), the willingness to trade the semantic meaning of the generated clusters for performance has increased. This has led to the development of pre-clustering techniques such as canopy clustering, which can process huge datasets efficiently, but the resulting “clusters” are merely a rough pre-partitioning of the dataset, whose partitions are then analysed with existing slower techniques such as k-means clustering. For high-dimensional data, many existing methods fail because of the curse of dimensionality, which renders particular distance functions problematic in high-dimensional spaces. This has led to new clustering algorithms for high-dimensional data that focus on subspace clustering, where only some attributes are used and the cluster model includes the attributes relevant to the cluster, and on correlation clustering, which also looks for arbitrarily rotated (“correlated”) subspace clusters that can be modelled by giving a correlation of their attributes. Examples of such clustering algorithms are CLIQUE and SUBCLU.
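The canopy pre-clustering idea mentioned above can be sketched roughly as follows. This is a simplified illustration, assuming the commonly described two-threshold scheme with a loose threshold `t1` and a tight threshold `t2`; the function name `canopy_clustering`, its parameters, and the sample points are hypothetical, and `math.dist` (Python 3.8 or later) stands in for whatever cheap distance measure a real application would use.

```python
import math


def canopy_clustering(points, t1, t2, distance=math.dist):
    """Group point indices into overlapping canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list); t1 must exceed t2.
    """
    if not t1 > t2:
        raise ValueError("the loose threshold t1 must exceed the tight threshold t2")

    remaining = list(range(len(points)))  # indices still eligible as centres
    canopies = []
    while remaining:
        center = remaining.pop(0)  # take the next candidate as a canopy centre
        canopy = [center]
        still_remaining = []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                canopy.append(i)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not tightly close: stays a candidate
        remaining = still_remaining
        canopies.append(canopy)
    return canopies


# Hypothetical sample: two tight groups and one point lying between them.
sample = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (5, 5)]
canopies = canopy_clustering(sample, t1=8.0, t2=2.0)
print(canopies)
```

A point can appear in more than one canopy, which is what makes the result a rough pre-partitioning rather than a final clustering; a slower method such as k-means would then be applied within each canopy.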
Ideas from density-based clustering techniques, in particular the DBSCAN/OPTICS family of algorithms, have been adapted to subspace clustering (HiSC, hierarchical subspace clustering, and DiSH) and to correlation clustering (HiCO, hierarchical correlation clustering; 4C, which uses “correlation connectivity”; and ERiC, which explores hierarchical density-based correlation clusters). Several different clustering systems based on mutual information have been proposed. One uses Marina Meilă’s variation of information metric; another provides hierarchical clustering. Using genetic algorithms, a wide range of fit functions can be optimized, including mutual information. In addition, belief propagation, a recent development in computer science and statistical physics, has led to new types of clustering algorithms.
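As an illustration of a mutual-information-based measure, the sketch below computes the variation of information between two clusterings of the same objects, the metric associated with Marina Meilă. It uses the identity VI = H(A|B) + H(B|A), with natural logarithms. The function name `variation_of_information` and the label lists are assumptions for this example only.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two labelings.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings give to object i.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both labelings must be non-empty and of equal length")

    count_a = Counter(labels_a)               # cluster sizes in clustering A
    count_b = Counter(labels_b)               # cluster sizes in clustering B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise intersections

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # Contribution to H(B|A) plus contribution to H(A|B).
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


# The same partition under different label names versus a crossed partition.
same = variation_of_information([0, 0, 1, 1], ["x", "x", "y", "y"])
crossed = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The measure depends only on how the objects are partitioned, not on the label names, so relabelled copies of one clustering are at distance zero, while larger values indicate that the two clusterings share less information.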
Citation: Shao L (2022) Recent Developments on Cluster Analysis. J Data Mining Genomics Proteomics. 13:e135.
Copyright: © 2022 Shao L. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.