Category: Cluster analysis algorithms

CURE algorithm
CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases. Compared with k-means clustering it is more robust to outliers and able to identify clusters having non-spherical shapes and wide variances in size.
Expectation–maximization algorithm
In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
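As an illustration of the alternating structure, here is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture, assuming NumPy; the function name em_gmm_1d, the initialization, and the fixed iteration count are illustrative assumptions, not part of any particular reference implementation.

import numpy as np

def em_gmm_1d(x, iters=50):
    # Illustrative EM for a two-component 1-D Gaussian mixture (hypothetical helper).
    # Parameters: mixing weight w, component means mu, component variances var.
    w, mu, var = 0.5, np.array([x.min(), x.max()]), np.array([x.var(), x.var()])
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        p0 = (1 - w) * np.exp(-(x - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(2 * np.pi * var[0])
        p1 = w * np.exp(-(x - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(2 * np.pi * var[1])
        r = p1 / (p0 + p1)
        # M-step: re-estimate parameters from the responsibilities
        w = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        var = np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                        np.average((x - mu[1]) ** 2, weights=r)])
    return w, mu, var

x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)])
print(em_gmm_1d(x))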
Spectral clustering
In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
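A rough sketch of that pipeline, assuming NumPy and scikit-learn: build a Gaussian similarity matrix, embed the points with the eigenvectors of the smallest eigenvalues of the (here unnormalized) graph Laplacian, then cluster the embedding with k-means. The function name and the gamma parameter are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def spectral_sketch(X, k=2, gamma=1.0):
    # Gaussian (RBF) similarity matrix between all pairs of points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * d2)
    # Unnormalized graph Laplacian L = D - W
    L = np.diag(W.sum(1)) - W
    # Embed each point using the eigenvectors of the k smallest eigenvalues
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]
    # Cluster the low-dimensional embedding with k-means
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)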
Automatic clustering algorithms
Automatic clustering algorithms are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other cluster analysis techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outliers.
Chinese whispers (clustering method)
Chinese whispers is a clustering method used in network science, named after the well-known whispering game. Clustering methods are used to identify communities of nodes or links in a given network.
Affinity propagation
In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points. Unlike clustering algorithms such as k-means or k-medoids, affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm.
Data stream clustering
In computer science, data stream clustering is defined as the clustering of data that arrive continuously, such as telephone records, multimedia data, and financial transactions. Data stream clustering is usually studied as a streaming algorithm, with the goal of constructing a good clustering of the stream using a small amount of memory and time.
Hoshen–Kopelman algorithm
The Hoshen–Kopelman algorithm is a simple and efficient algorithm for labeling clusters on a grid, where the grid is a regular network of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known union-finding algorithm.
Hierarchical clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: agglomerative ("bottom-up") approaches, in which each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and divisive ("top-down") approaches, in which all observations start in one cluster and splits are performed recursively as one moves down the hierarchy.
BFR algorithm
The BFR algorithm, named after its inventors Bradley, Fayyad and Reina, is a variant of the k-means algorithm that is designed to cluster data in a high-dimensional Euclidean space. It makes a very strong assumption about the shape of clusters: they must be normally distributed about a centroid.
K-means clustering
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or centroid), which serves as a prototype of the cluster.
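A compact sketch of Lloyd's algorithm, the standard heuristic for this objective, assuming NumPy; the random initialization and stopping rule are simplifications, and the function name lloyd_kmeans is hypothetical.

import numpy as np

def lloyd_kmeans(X, k, iters=100, seed=0):
    # Plain Lloyd's algorithm: alternate assignment and mean-update steps.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each observation to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned observations
        new_centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers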
K-medoids
The k-medoids problem is a clustering problem similar to k-means. The name was coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM algorithm. Both the k-means and k-medoids algorithms are partitional (breaking the dataset up into groups) and attempt to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster.
BIRCH
BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large data sets. With modifications it can also be used to accelerate k-means clustering and Gaussian mixture modelling with the expectation–maximization algorithm.
Information bottleneck method
The information bottleneck method is a technique in information theory introduced by Naftali Tishby, Fernando C. Pereira, and William Bialek. It is designed for finding the best tradeoff between accuracy and complexity (compression) when summarizing (e.g. clustering) a random variable X, given a joint probability distribution p(X, Y) between X and an observed relevant variable Y.
Single-linkage clustering
In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining the two clusters that contain the closest pair of elements not yet belonging to the same cluster.
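A brief usage sketch with SciPy's hierarchical clustering routines, where method='single' selects this closest-pair merge criterion; the cut into three flat clusters and the random toy data are arbitrary illustrations.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)
Z = linkage(X, method='single')                    # merges driven by the closest pair between clusters
labels = fcluster(Z, t=3, criterion='maxclust')    # cut the dendrogram into 3 flat clusters
print(labels)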
Jenks natural breaks optimization
The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class's average deviation from the class mean, while maximizing each class's deviation from the means of the other classes.
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based, non-parametric clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions.
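A short usage sketch with scikit-learn's DBSCAN implementation on toy data; the eps and min_samples values here are arbitrary choices for this example, not general recommendations.

import numpy as np
from sklearn.cluster import DBSCAN

X = np.vstack([np.random.normal(0, 0.3, (50, 2)),
               np.random.normal(4, 0.3, (50, 2)),
               np.random.uniform(-2, 6, (10, 2))])   # a few scattered noise points
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(set(labels))   # noise points are labeled -1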
Quantum clustering
Quantum Clustering (QC) is a class of data-clustering algorithms that use conceptual and mathematical tools from quantum mechanics. QC belongs to the family of density-based clustering algorithms, where clusters are defined by regions of higher density of data points.
OPTICS algorithm
Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN's, but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density.
Pitman–Yor process
In probability theory, a Pitman–Yor process, denoted PY(d, θ, G0), is a stochastic process whose sample path is a probability distribution. A random sample from this process is an infinite discrete probability distribution, consisting of an infinite number of atoms drawn from G0, with weights drawn from a two-parameter Poisson–Dirichlet distribution.
Self-organizing map
A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data.
Nearest-neighbor chain algorithm
In the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input and create a hierarchy of clusters by repeatedly merging pairs of smaller clusters to form larger clusters.
Mean shift
Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing.
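A minimal sketch of the mode-seeking iteration for a single query point with a Gaussian kernel, assuming NumPy; the bandwidth, tolerance, and function name are illustrative assumptions.

import numpy as np

def mean_shift_mode(x0, X, bandwidth=1.0, iters=50, tol=1e-5):
    # Move a query point x0 toward the local density maximum of the point set X.
    x = x0.astype(float)
    for _ in range(iters):
        # Gaussian kernel weights of all points relative to the current position
        w = np.exp(-((X - x) ** 2).sum(1) / (2 * bandwidth ** 2))
        # Weighted mean of the neighborhood: the "mean shift" update
        shifted = (w[:, None] * X).sum(0) / w.sum()
        if np.linalg.norm(shifted - x) < tol:
            break
        x = shifted
    return x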
Constrained clustering
In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates either a set of must-link constraints, cannot-link constraints, or both, with a clustering algorithm.
Complete-linkage clustering
Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up in the same cluster.
Canopy clustering algorithm
The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000. It is often used as a preprocessing step for the k-means algorithm or hierarchical clustering.
Cobweb (clustering)
COBWEB is an incremental system for hierarchical conceptual clustering. COBWEB was invented by Professor Douglas H. Fisher, currently at Vanderbilt University. COBWEB incrementally organizes observations into a classification tree.
Linde–Buzo–Gray algorithm
The Linde–Buzo–Gray algorithm (introduced by Yoseph Linde, Andrés Buzo and Robert M. Gray in 1980) is a vector quantization algorithm used to derive a good codebook. It is similar to the k-means method in data clustering.
WPGMA
WPGMA (Weighted Pair Group Method with Arithmetic Mean) is a simple agglomerative (bottom-up) hierarchical clustering method, generally attributed to Sokal and Michener. The WPGMA method is similar to its unweighted variant, the UPGMA method.
K-SVD
In applied mathematics, K-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. K-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary and updating the atoms in the dictionary to better fit the data.
Cluster-weighted modeling
In data mining, cluster-weighted modeling (CWM) is an algorithm-based approach to non-linear prediction of outputs (dependent variables) from inputs (independent variables) based on density estimation using a set of models (clusters), each of which is notionally appropriate in a sub-region of the input space.
K q-flats
In data mining and machine learning, the k q-flats algorithm is an iterative method which aims to partition observations into k clusters where each cluster is close to a q-flat, where q is a given integer. It is a generalization of the k-means algorithm, in which each cluster is close to a single point (a 0-flat).
K-means++
In data mining, k-means++ is an algorithm for choosing the initial values (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii as an approximation algorithm for the NP-hard k-means problem, a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm.
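A small sketch of the squared-distance seeding rule, assuming NumPy: the first seed is drawn uniformly, and each subsequent seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. The helper name kmeanspp_seeds is hypothetical.

import numpy as np

def kmeanspp_seeds(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # First seed: a uniformly random observation
    seeds = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen seed
        d2 = np.min(((X[:, None, :] - np.array(seeds)[None, :, :]) ** 2).sum(-1), axis=1)
        # Sample the next seed with probability proportional to that squared distance
        seeds.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(seeds)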
Low-energy adaptive clustering hierarchy
Low-energy adaptive clustering hierarchy ("LEACH") is a TDMA-based MAC protocol which is integrated with clustering and a simple routing protocol in wireless sensor networks (WSNs). The goal of LEACH is to lower the energy consumption required to create and maintain clusters, in order to improve the lifetime of a wireless sensor network.
Neighbor joining
In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm requires knowledge of the distance between each pair of taxa (e.g., species or sequences) to construct the tree.
SimRank
SimRank is a general similarity measure, based on a simple and intuitive graph-theoretic model. SimRank is applicable in any domain with object-to-object relationships, and it measures similarity of the structural context in which objects occur, based on their relationships with other objects.
K-medians clustering
In statistics, k-medians clustering is a cluster analysis algorithm. It is a variation of k-means clustering where, instead of calculating the mean for each cluster to determine its centroid, one instead calculates the median.
UPGMA
UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. The method is generally attributed to Sokal and Michener. The UPGMA method is similar to its weighted variant, the WPGMA method.
Fuzzy clustering
Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster. Clustering or cluster analysis involves assigning data points to clusters such that items in the same cluster are as similar as possible, while items in different clusters are as dissimilar as possible.
FLAME clustering
Fuzzy clustering by Local Approximation of MEmberships (FLAME) is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects.
Ward's method
In statistics, Ward's method is a criterion applied in hierarchical cluster analysis. Ward's minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr.
SUBCLU
SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger. It is a subspace clustering algorithm that builds on the density-based clustering algorithm DBSCAN.