Tidewater glacier cycle: Difference between revisions

From formulasearchengine
en>Amizra
en>Monkbot
 
In [[multivariate statistics]] and the [[cluster analysis|clustering]] of data, '''spectral clustering'''<ref>U. von Luxburg, "A tutorial on spectral clustering", ''Statistics and Computing'', Vol. 17, Issue 4, pp. 395–416 (2007), [http://papercore.org/vonLuxburg2007 Papercore summary]</ref> techniques use the [[Spectrum of a matrix|spectrum]] ([[eigenvalues]]) of the [[similarity matrix]] of the data to perform [[dimensionality reduction]] before clustering in fewer dimensions. The similarity matrix is provided as input and gives a quantitative assessment of the relative similarity of each pair of points in the dataset.[[File:K-means v.s. Spectral Clustering.png|thumb|A figure showing the relative strengths of ''k''-means and spectral clustering.<ref>{{Citation
| Author = Martin, Charles
| url = http://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
| date = October 9, 2012}}</ref>]]
 
== Algorithms ==
 
Given a set of data points A, the [[similarity matrix]] may be defined as a matrix <math>S</math>, where <math>S_{ij}</math> represents a measure of the similarity between points <math>i, j\in A</math>.
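As an illustration, the sketch below builds a dense similarity matrix with a Gaussian (RBF) kernel, <math>S_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)</math>, which is one common choice. The kernel, the width parameter <code>sigma</code>, the zeroed diagonal, and the function name <code>gaussian_similarity</code> are assumptions made here for illustration, not part of the definition above.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Dense similarity matrix S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Illustrative choice: ignore self-similarity by zeroing the diagonal
    np.fill_diagonal(S, 0.0)
    return S

# Two well-separated pairs of points
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
S = gaussian_similarity(X, sigma=0.5)
print(np.round(S, 3))
```

Nearby points get similarities close to 1 and distant points get values near 0; <code>sigma</code> controls where that transition happens.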
 
One spectral clustering technique is the '''[[Segmentation_based_object_categorization#Normalized_cuts|normalized cuts algorithm]]''' or ''Shi–Malik algorithm'' introduced by Jianbo Shi and Jitendra Malik,<ref>Jianbo Shi and Jitendra Malik, [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf "Normalized Cuts and Image Segmentation"], IEEE Transactions on PAMI, Vol. 22, No. 8, Aug 2000.</ref> commonly used for [[segmentation (image processing)|image segmentation]]. It partitions points into two sets <math>(B_1,B_2)</math> based on the [[eigenvector]] <math>v</math> corresponding to the second-smallest [[eigenvalue]] of the normalized [[Laplacian matrix]]
 
:<math>L = I - D^{-1/2}SD^{-1/2} \, </math>
 
of <math>S</math>, where <math>D</math> is the diagonal matrix
 
:<math>D_{ii} = \sum_j S_{ij}.</math>
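A minimal NumPy sketch of the two formulas above, assuming a dense symmetric similarity matrix whose row sums are all positive: it forms <math>L</math> and returns the eigenvector for the second-smallest eigenvalue. The helper names and the 4-point toy matrix (two tightly linked pairs joined by weak edges) are hypothetical.

```python
import numpy as np

def normalized_laplacian(S):
    """L = I - D^{-1/2} S D^{-1/2} with D_ii = sum_j S_ij."""
    S = np.asarray(S, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))   # assumes every row sum is positive
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def second_smallest_eigenpair(L):
    """Eigenvalue/eigenvector pair for the second-smallest eigenvalue of symmetric L."""
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Two tightly linked pairs, {0, 1} and {2, 3}, joined by weak edges
S = np.array([[0.0, 1.0, 0.01, 0.0],
              [1.0, 0.0, 0.0, 0.01],
              [0.01, 0.0, 0.0, 1.0],
              [0.0, 0.01, 1.0, 0.0]])
L = normalized_laplacian(S)
lam2, v = second_smallest_eigenpair(L)
print(lam2, np.sign(v))
```

Because <code>numpy.linalg.eigh</code> returns eigenvalues in ascending order, column 1 of the eigenvector matrix belongs to the second-smallest eigenvalue; on this toy matrix the signs of <code>v</code> separate points {0, 1} from {2, 3}.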
 
This partitioning may be done in various ways, such as by taking the median <math>m</math> of the components in <math>v</math>, and placing all points whose component in <math>v</math> is greater than <math>m</math> in <math>B_1</math>, and the rest in <math>B_2</math>. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
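The median split and its repeated use for hierarchical clustering can be sketched as follows. This is an illustrative sketch, not the Shi–Malik reference implementation: the fixed recursion depth, the helper names, and the toy matrix (four pairs arranged in two looser groups) are assumptions, and every subset being split is assumed to keep positive row sums.

```python
import numpy as np

def fiedler_vector(S):
    """Eigenvector of the second-smallest eigenvalue of I - D^{-1/2} S D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))   # assumes positive row sums
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    return eigvecs[:, 1]

def median_split(S):
    """Points whose component exceeds the median m go to B1, the rest to B2."""
    v = fiedler_vector(S)
    m = np.median(v)
    return np.flatnonzero(v > m), np.flatnonzero(v <= m)

def hierarchical_split(S, depth, indices=None):
    """Repeatedly bipartition subsets; returns the leaf index sets."""
    if indices is None:
        indices = np.arange(len(S))
    if depth == 0 or len(indices) < 2:
        return [indices]
    b1, b2 = median_split(S[np.ix_(indices, indices)])
    return (hierarchical_split(S, depth - 1, indices[b1])
            + hierarchical_split(S, depth - 1, indices[b2]))

# Toy data: pairs {0,1}, {2,3}, {4,5}, {6,7}; pairs 0-1 and pairs 2-3
# (by pair id) form two looser groups.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pair = np.array([0, 0, 1, 1, 2, 2, 3, 3])
S = np.where(pair[:, None] == pair[None, :], 1.0,
             np.where(group[:, None] == group[None, :], 0.1, 0.001))
np.fill_diagonal(S, 0.0)

for leaf in hierarchical_split(S, depth=2):
    print(leaf)
```

With <code>depth=1</code> the split separates the two groups; with <code>depth=2</code> each group is split again into its two pairs.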
 
A related algorithm is the '''[[Meila–Shi algorithm]]''',<ref>Marina Meilă & Jianbo Shi, "[http://www.citeulike.org/user/mpotamias/article/498897 Learning Segmentation by Random Walks]", Neural Information Processing Systems 13 (NIPS 2000), 2001, pp. 873–879.</ref> which takes the [[eigenvector]]s corresponding to the ''k'' largest [[eigenvalue]]s of the matrix <math>P = D^{-1}S</math> for some ''k'', and then invokes another algorithm (e.g. [[k-means clustering]]) to cluster points by their respective ''k'' components in these eigenvectors.
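A sketch of this pipeline, assuming a dense symmetric <math>S</math> with positive row sums. Because <math>P = D^{-1}S</math> is not symmetric, the sketch diagonalizes the similar symmetric matrix <math>D^{-1/2}SD^{-1/2}</math> and maps each of its eigenvectors <math>u</math> back to an eigenvector <math>v = D^{-1/2}u</math> of <math>P</math> with the same eigenvalue. The small Lloyd's ''k''-means with farthest-first seeding stands in for whatever clustering step is used in practice; the function names and the toy block matrix are hypothetical.

```python
import numpy as np

def meila_shi_embedding(S, k):
    """Row i holds point i's components in the eigenvectors of P = D^{-1} S
    belonging to the k largest eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))          # assumes positive row sums
    # M = D^{-1/2} S D^{-1/2} is symmetric, and M u = lam u  <=>  P (D^{-1/2} u) = lam (D^{-1/2} u)
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, U = np.linalg.eigh(M)                            # ascending eigenvalues
    return d_inv_sqrt[:, None] * U[:, -k:]              # columns for the k largest

def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        # Keep the old center if a cluster ever becomes empty
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy similarity: three blocks of three points, weakly connected to each other
block = np.repeat(np.arange(3), 3)
S = np.where(block[:, None] == block[None, :], 1.0, 0.01)
np.fill_diagonal(S, 0.0)

Y = meila_shi_embedding(S, k=3)
labels = kmeans(Y, k=3)
print(labels)
```

Since every row of <math>P</math> sums to 1, the eigenvector for the largest eigenvalue (1) is constant; the clustering information comes from the remaining columns.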
 
A more scalable variant of spectral clustering is the '''[[spectral neighborhood (SPAN) algorithm]]''',<ref>Liangcai Shu, Aiyou Chen, Ming Xiong, Weiyi Meng, "[http://www.cs.binghamton.edu/~meng/pub.d/ICDE11_conf_full_065_update.pdf Efficient Spectral Neighborhood Blocking for Entity Resolution]", IEEE International Conference on Data Engineering (ICDE), pp. 1067–1078, Hannover, Germany, April 2011.</ref> which performs spectral clustering without explicitly computing the similarity matrix and therefore scales to larger datasets than the standard spectral clustering algorithm.
 
Spectral clustering is closely related to [[nonlinear dimensionality reduction]], and dimensionality reduction techniques such as locally linear embedding can be used to reduce errors from noise or outliers.<ref>{{Citation
| author = Arias-Castro, E. and Chen, G. and Lerman, G.
| title = Spectral clustering based on local linear approximations
| journal = Electronic Journal of Statistics | volume = 5 | pages = 1537–1587
| year = 2011}}</ref>
 
== Relationship with ''k''-means ==
The kernel ''k''-means problem is an extension of the ''k''-means problem where the input data points are mapped non-linearly into a higher-dimensional feature space via a kernel function <math>k(x_i,x_j) = \phi^T(x_i)\phi(x_j)</math>. The weighted kernel ''k''-means problem further extends this problem by defining a weight <math>w_r</math> for each cluster as the reciprocal of the number of elements in the cluster,
:<math>
\max_{C_1,\dots,C_k} \sum_{r=1}^k w_r \sum_{x_i,x_j \in C_r} k(x_i,x_j).
</math>
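To make the objective concrete, the sketch below evaluates it for a given assignment using a hypothetical helper, <code>weighted_kernel_objective</code>. It uses a linear kernel (<math>\phi(x) = x</math>) purely for illustration, which lets the result be checked against the ordinary within-cluster sum of squares: with <math>m_r</math> the mean of <math>\phi(x_i)</math> over <math>C_r</math>, <math>\sum_{x_i \in C_r} \|\phi(x_i) - m_r\|^2 = \sum_{x_i \in C_r} k(x_i,x_i) - \frac{1}{|C_r|}\sum_{x_i,x_j \in C_r} k(x_i,x_j)</math>, so maximizing the objective is the same as minimizing the kernel ''k''-means distortion.

```python
import numpy as np

def weighted_kernel_objective(K, labels):
    """Sum over clusters r of w_r * sum_{i,j in C_r} K_ij, with w_r = 1/|C_r|."""
    total = 0.0
    for r in np.unique(labels):
        idx = np.flatnonzero(labels == r)
        total += K[np.ix_(idx, idx)].sum() / len(idx)
    return total

X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
K = X @ X.T          # linear kernel: phi(x) = x, so k(x_i, x_j) = x_i . x_j

obj = weighted_kernel_objective(K, labels)

# Within-cluster sum of squared distances to the cluster means, computed directly
sse = sum(np.sum((X[labels == r] - X[labels == r].mean(axis=0)) ** 2)
          for r in np.unique(labels))

print(obj, np.trace(K) - sse)   # the two values agree
```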
Let <math>F</math> be the <math>n \times n</math> matrix of normalizing coefficients, with <math>F_{ij} = w_r</math> if <math>i,j \in C_r</math> and zero otherwise, and let <math>K</math> be the kernel matrix of all points, <math>K_{ij} = k(x_i,x_j)</math>. The weighted kernel ''k''-means problem with ''n'' points and ''k'' clusters is then given by
:<math>
\max_{F} \operatorname{ trace } \left(KF\right)
</math>
such that,
:<math>
F = G_{n\times k}G_{n\times k}^T
</math>
:<math>
G^TG = I
</math>
with <math>\operatorname{rank}(G) = k</math>. In addition, <math>F</math> satisfies the identity constraints
:<math>
F\mathbb{I} = \mathbb{I}
</math>
:<math>
F^T\mathbb{I} = \mathbb{I},
</math>
where <math>\mathbb{I}</math> represents a vector of ones.
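The matrices in this formulation can be built explicitly from a cluster assignment. The sketch below uses the natural choice <math>G_{ir} = 1/\sqrt{|C_r|}</math> if point ''i'' lies in <math>C_r</math> and zero otherwise (an assumption; the text only states the constraints <math>G</math> must satisfy), then checks numerically that <math>G^TG = I</math>, that <math>G</math> has rank ''k'', that <math>F = GG^T</math> satisfies both identity constraints, and that <math>\operatorname{trace}(KF)</math> equals the weighted kernel ''k''-means objective. The helper name, the Gaussian kernel, and the random data are illustrative.

```python
import numpy as np

def assignment_matrix(labels, k):
    """n x k matrix G with G_ir = 1/sqrt(|C_r|) if point i is in cluster r, else 0."""
    G = np.zeros((len(labels), k))
    for r in range(k):
        members = labels == r
        G[members, r] = 1.0 / np.sqrt(members.sum())
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 2))
labels = np.array([0, 1, 1, 2, 2, 2, 0])
k = 3

K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # Gaussian kernel
G = assignment_matrix(labels, k)
F = G @ G.T                         # F_ij = w_r = 1/|C_r| when i, j are both in C_r
ones = np.ones(len(labels))         # the vector of ones

print(np.allclose(G.T @ G, np.eye(k)), np.linalg.matrix_rank(G) == k)   # True True
print(np.allclose(F @ ones, ones), np.allclose(F.T @ ones, ones))       # True True

# trace(KF) reproduces sum_r w_r * sum_{i,j in C_r} k(x_i, x_j)
direct = sum(K[np.ix_(labels == r, labels == r)].sum() / np.sum(labels == r)
             for r in range(k))
print(np.isclose(np.trace(K @ F), direct))                              # True
```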
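Since <math>\operatorname{trace}(KF) = \operatorname{trace}(KGG^T) = \operatorname{trace}(G^TKG)</math>, this problem can be recast as
:<math>
\max_G \operatorname{trace}\left(G^TKG\right).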
</math>
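The recast objective rests on the cyclic property of the trace, <math>\operatorname{trace}(KF) = \operatorname{trace}(KGG^T) = \operatorname{trace}(G^TKG)</math>. A quick numerical check of that identity, for an arbitrary assignment and a linear kernel chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(7, 3))
K = X @ X.T                                  # linear kernel matrix (symmetric PSD)
labels = np.array([0, 1, 1, 2, 2, 2, 0])
k = 3

G = np.zeros((len(labels), k))
for r in range(k):
    G[labels == r, r] = 1.0 / np.sqrt(np.sum(labels == r))   # G_ir = 1/sqrt(|C_r|)
F = G @ G.T

lhs = np.trace(K @ F)          # objective written in terms of F
rhs = np.trace(G.T @ K @ G)    # the same objective written in terms of G
print(np.isclose(lhs, rhs))    # True
```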
When the identity constraints on <math>F</math> are relaxed, this problem is equivalent to the spectral clustering problem: keeping only <math>G^TG = I</math>, the maximum of <math>\operatorname{trace}(KF) = \operatorname{trace}(G^TKG)</math> is attained when the columns of <math>G</math> are eigenvectors of <math>K</math> belonging to its ''k'' largest eigenvalues. In particular, the weighted kernel ''k''-means problem can be reformulated as a spectral clustering (graph partitioning) problem and vice versa. The outputs of the spectral algorithms are eigenvectors, which do not satisfy the identity requirements for indicator variables defined by <math>F</math>; hence, post-processing of the eigenvectors is required for the equivalence between the problems.<ref name="dhillon2004kernel">{{cite conference
| author = Dhillon, I.S. and Guan, Y. and Kulis, B.
| year = 2004
| title = Kernel ''k''-means: spectral clustering and normalized cuts
| booktitle = Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
| pages = 551–556
}}</ref>
Transforming the spectral clustering problem into a weighted kernel ''k''-means problem allows graph-cut objectives to be optimized without computing eigenvectors, which can substantially reduce the computational burden.<ref>{{cite journal|last=Dhillon|first=Inderjit|coauthors=Yuqiang Guan, Brian Kulis|title=Weighted Graph Cuts without Eigenvectors: A Multilevel Approach|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=November 2007|volume=29|issue=11|pages=1–14}}</ref>
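The relaxation and the post-processing step can be illustrated with a small sketch. With the constraints on <math>F</math> dropped and only <math>G^TG = I</math> kept, the maximum of <math>\operatorname{trace}(G^TKG)</math> equals the sum of the ''k'' largest eigenvalues of <math>K</math>, attained by the corresponding eigenvectors, so no discrete assignment can exceed it. The rows of the eigenvector matrix are then rounded back to a hard assignment; here that rounding uses a plain ''k''-means on the rows, which is only one of several possible post-processing choices. The data, the Gaussian kernel width, and the helper name are assumptions for illustration.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Three well-separated groups of three points
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 0.0], [10.1, 0.0], [10.0, 0.1],
              [0.0, 10.0], [0.1, 10.0], [0.0, 10.1]])
true_labels = np.repeat(np.arange(3), 3)
k = 3
K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1) / 2.0)

# Relaxed problem: eigenvectors of the k largest eigenvalues (eigh sorts ascending)
eigvals, eigvecs = np.linalg.eigh(K)
G_hat = eigvecs[:, -k:]
relaxed_value = eigvals[-k:].sum()

# Discrete problem: G built from the true assignment
G = np.zeros((len(X), k))
for r in range(k):
    G[true_labels == r, r] = 1.0 / np.sqrt(np.sum(true_labels == r))
discrete_value = np.trace(G.T @ K @ G)

# Post-processing: round the rows of G_hat back to a hard assignment
labels = kmeans(G_hat, k)
print(relaxed_value >= discrete_value, labels)
```

The columns of <code>G_hat</code> are orthonormal but are not scaled indicator vectors, which is why the rounding step is needed before the result can be read as a clustering.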
 
== See also ==
* [[Affinity propagation]]
* [[Kernel principal component analysis]]
* [[Cluster analysis]]
* [[Spectral graph theory]]
 
== References ==
<references />
 
[[Category:Data clustering algorithms]]
[[Category:Algebraic graph theory]]

Latest revision as of 01:39, 27 July 2014
