Recent Developments on Cluster Analysis

Luo Shao

doi:10.4172/2153-0602.22.13.e135


Editorial - (2022) Volume 13, Issue 1

Luo Shao^*

Department of Biological Science, Fujian Medical University, Fuzhou, China

^*Correspondence: Luo Shao, Department of Biological Science, Fujian Medical University, Fuzhou, China, Email:

Received: 10-Jan-2022, Manuscript No. JDMGP-22-15135; Editor assigned: 12-Jan-2022, Pre QC No. JDMGP-22-15135; Reviewed: 24-Jan-2022, QC No. JDMGP-22-15135; Revised: 27-Jan-2022, Manuscript No. JDMGP-22-15135; Published: 31-Jan-2022, DOI: 10.4172/2153-0602.22.13.e135

Description

Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other than to objects in other groups (clusters). It is a main task of exploratory data analysis and a common technique in statistical data analysis, used in many fields such as pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis itself is not a specific algorithm but a general task to be solved. It can be achieved by a variety of algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently. Common notions of a cluster include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including the distance function to use, a density threshold, or the expected number of clusters) depend on the individual dataset and the intended use of the results. Cluster analysis is thus not an automatic task but an iterative process of knowledge discovery, or interactive multi-objective optimization, that involves trial and error. It is often necessary to modify the data pre-processing and the model parameters until the result achieves the desired properties.
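To make the role of parameter settings concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is only one of many possible cluster models, not a method prescribed by this editorial. The function names, the toy points, and the deterministic initialization (the first k points serve as starting centroids) are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's algorithm.

    Assumption for illustration: the first k points are used as the
    initial centroids, so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Hypothetical toy data: two visually obvious groups, interleaved.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]

labels_k2, _ = kmeans(points, k=2)
labels_k3, _ = kmeans(points, k=3)
print("k=2:", labels_k2)
print("k=3:", labels_k3)
```

Running the same data with k=2 and k=3 gives different partitions, which illustrates why the expected number of clusters usually has to be tuned by trial and error against the intended use of the results.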

In addition to the term clustering, there are several terms with similar meanings, including automatic classification, numerical taxonomy, botryology, typology, and community detection. The subtle differences often lie in how the results are used: in data mining the resulting groups are the matter of interest, whereas in automatic classification the resulting discriminative power is of interest. Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Joseph Zubin in 1938 and Robert Tryon in 1939, and was used by Cattell beginning in 1943 for classification. There is no exact definition of the term “cluster”, which is one of the reasons why there are so many clustering algorithms. The one common denominator is a group of data objects. However, different researchers employ different cluster models, and different algorithms can be given for each of these models. The notion of a cluster, as found by different algorithms, varies significantly in its properties.

In recent years, considerable effort has gone into improving the performance of existing algorithms; CLARANS and BIRCH are among the results. With the need to process larger and larger datasets (also known as big data), the willingness to trade the semantic meaning of the generated clusters for performance has been increasing. This has led to the development of pre-clustering techniques such as canopy clustering, which can process very large datasets efficiently. The resulting “clusters”, however, are merely a rough pre-partitioning of the dataset, and the partitions are then analysed with existing slower techniques such as k-means clustering. For high-dimensional data, many existing methods fail because of the curse of dimensionality, which makes particular distance functions problematic in high-dimensional spaces. This has led to new clustering algorithms for high-dimensional data. Subspace clustering uses only some of the attributes, and the cluster model includes the attributes relevant to each cluster. Correlation clustering also looks for arbitrarily rotated (“correlated”) subspace clusters, which can be modelled by giving a correlation of their attributes. Examples of such clustering algorithms are CLIQUE and SUBCLU.
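The pre-clustering idea can be sketched as follows. This is a simplified illustration of canopy clustering with a loose threshold `t1` and a tight threshold `t2`; the function names, the choice of Manhattan distance as the cheap distance, the toy points, and the threshold values are all assumptions made for the example, and real implementations differ in how they pick canopy centers.

```python
def manhattan(a, b):
    """A cheap distance used only to form rough canopies."""
    return sum(abs(x - y) for x, y in zip(a, b))

def canopy_clustering(points, t1, t2, distance=manhattan):
    """Return a list of (center_index, member_indices) canopies.

    t1 is the loose threshold and t2 the tight threshold (t1 > t2).
    Assumption for illustration: the next canopy center is always the
    first point still in the candidate list, so the output is deterministic.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be larger than t2 (tight)")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)
        canopy = [center]
        still_remaining = []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                canopy.append(i)           # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not tightly close: may join or start others
        remaining = still_remaining
        canopies.append((center, canopy))
    return canopies

# Hypothetical toy data: two tight groups and one point in between.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (3.0, 3.0),
]
for center, members in canopy_clustering(points, t1=4.5, t2=1.0):
    print("center", center, "members", members)
```

Canopies may overlap (the in-between point belongs to more than one), which is why they serve only as a rough pre-partitioning: a slower method such as k-means is then run within or across canopies, with expensive distance computations limited to points that share a canopy.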

Ideas from density-based clustering techniques, in particular the DBSCAN/OPTICS family of algorithms, have been adapted to subspace clustering (HiSC, hierarchical subspace clustering, and DiSH) and to correlation clustering (HiCO, hierarchical correlation clustering; 4C, which uses correlation connectivity; and ERiC, which explores hierarchical density-based correlation clusters). Several different clustering systems based on mutual information have been proposed. One of them uses Marina Meila’s variation of information metric; another provides hierarchical clustering. With genetic algorithms, a wide range of different fitness functions can be optimized, including mutual information. In addition, belief propagation, a recent development in computer science and statistical physics, has led to new types of clustering algorithms.
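As an illustration of a mutual-information-based measure, the variation of information between two clusterings X and Y of the same objects can be written as VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), where H is the entropy of the cluster-size distribution and I is the mutual information. The sketch below computes it in plain Python with natural logarithms; the function names and the label lists are hypothetical and chosen only for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (natural log) between two label lists."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )

def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 * I(a; b); zero when the partitions coincide."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same objects")
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)

# Hypothetical clusterings of four objects.
same_up_to_renaming = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, unrelated)
```

The measure depends only on how objects are grouped, not on the label names, so two clusterings that differ only by renaming their clusters are at distance zero (up to floating-point rounding).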

Citation: Shao L (2022) Recent Developments on Cluster Analysis. J Data Mining Genomics Proteomics. 13:e135.

Copyright: © 2022 Shao L. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Journal of Data Mining in Genomics & Proteomics
