FASTICA

fastICA is an R, Splus (5.x, 6.x) and C implementation of the FastICA algorithms developed by Aapo Hyvärinen et al. at the Neural Networks Research Centre, Laboratory of Computer and Information Science, Helsinki University of Technology. Independent Component Analysis (ICA) is a method of decomposing a multi-dimensional dataset into a set of statistically independent non-gaussian variables. The method is based upon a generative model in which the measured signals are linear mixtures of unknown latent variables, or sources. These sources are assumed to be statistically independent and non-gaussian. ICA attempts to unmix the measured signals and recover the sources.
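The generative model can be made concrete with a small simulation. The sketch below is written in plain Python for illustration only and is not code from the fastICA package; the mixing matrix `A`, the choice of source distributions and all function names are assumptions. It builds observed signals x = A s from two independent non-gaussian sources and shows that unmixing is simply multiplication by W = A^-1 when A is known.

```python
import random

def mix(samples, M):
    """Apply x = M s to every 2-D sample s."""
    return [[M[0][0] * s[0] + M[0][1] * s[1],
             M[1][0] * s[0] + M[1][1] * s[1]] for s in samples]

def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

rng = random.Random(0)
n = 1000

# Two independent, non-gaussian latent sources:
# a uniform variable and a Laplace (double-exponential) variable.
sources = [[rng.uniform(-1.0, 1.0),
            rng.expovariate(1.0) * rng.choice((-1.0, 1.0))]
           for _ in range(n)]

A = [[1.0, 0.5],
     [0.3, 1.0]]            # hypothetical mixing matrix (unknown in practice)
observed = mix(sources, A)  # the measured signals x = A s

# With A known, W = A^-1 unmixes the signals exactly.
W = inverse_2x2(A)
recovered = mix(observed, W)

max_error = max(abs(r[j] - s[j])
                for r, s in zip(recovered, sources) for j in range(2))
print("largest reconstruction error:", max_error)
```

In a real analysis only `observed` is available, so ICA has to estimate W from the data alone, and the sources can be recovered only up to sign, scale and ordering.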

The method relies upon the fact that linear mixtures of independent variables tend to be more gaussian in distribution than the variables themselves (by the Central Limit Theorem). Thus, in order to recover the independent sources, we should maximise some measure of non-gaussianity. The FastICA algorithm (as its name suggests) is designed to provide a computationally quick method of estimating the unobserved independent components. The algorithm iteratively maximises an approximation to the negentropy of the projected data. Negentropy is based on the information-theoretic quantity of (differential) entropy, which measures the "randomness" of an observed variable. Since gaussian variables have the largest entropy among all random variables of equal variance, entropy can be used to define a measure of non-gaussianity, namely negentropy: the difference between the entropy of a gaussian variable with the same variance and the entropy of the variable itself. In practice this quantity can be time-consuming to calculate, which led to the development of the fast and robust approximations implemented in the FastICA algorithm. The algorithm has been implemented by the original authors in MATLAB.
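To illustrate the idea, here is a minimal Python sketch of a negentropy approximation and a one-unit fixed-point iteration. It is not the package's R or C code. It assumes the commonly used contrast G(u) = log cosh(u), with g = tanh as its derivative, and two-dimensional data so that whitening can be done in closed form. All function names, the mixing coefficients and the sample sizes are hypothetical.

```python
import math
import random

def standardize(y):
    """Shift and scale a sample to zero mean and unit variance."""
    m = sum(y) / len(y)
    sd = math.sqrt(sum((v - m) ** 2 for v in y) / len(y))
    return [(v - m) / sd for v in y]

def log_cosh(u):
    """Numerically stable log(cosh(u))."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def gaussian_expectation(G, half_width=8.0, steps=4000):
    """E[G(nu)] for nu ~ N(0, 1), computed with the trapezoidal rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * G(u) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)

E_G_GAUSS = gaussian_expectation(log_cosh)

def negentropy_approx(y):
    """(E[G(y)] - E[G(nu)])^2, i.e. negentropy up to a positive constant."""
    z = standardize(y)
    return (sum(log_cosh(v) for v in z) / len(z) - E_G_GAUSS) ** 2

def whiten(X):
    """Centre 2-D data and transform it to identity covariance."""
    n = len(X)
    m0 = sum(x[0] for x in X) / n
    m1 = sum(x[1] for x in X) / n
    C = [(x[0] - m0, x[1] - m1) for x in X]
    a = sum(c[0] * c[0] for c in C) / n
    b = sum(c[0] * c[1] for c in C) / n
    d = sum(c[1] * c[1] for c in C) / n
    t = 0.5 * math.atan2(2.0 * b, a - d)  # rotation diagonalising the covariance
    cs, sn = math.cos(t), math.sin(t)
    l1 = a * cs * cs + 2.0 * b * cs * sn + d * sn * sn  # variance along axis 1
    l2 = a * sn * sn - 2.0 * b * cs * sn + d * cs * cs  # variance along axis 2
    return [((cs * c[0] + sn * c[1]) / math.sqrt(l1),
             (-sn * c[0] + cs * c[1]) / math.sqrt(l2)) for c in C]

def one_unit(Z, w, max_iter=200, tol=1e-6):
    """Iterate w <- E[z g(w'z)] - E[g'(w'z)] w with g = tanh, renormalising each time."""
    n = len(Z)
    for _ in range(max_iter):
        g = [math.tanh(w[0] * z[0] + w[1] * z[1]) for z in Z]
        mean_dg = sum(1.0 - gi * gi for gi in g) / n
        new = [sum(z[k] * gi for z, gi in zip(Z, g)) / n - mean_dg * w[k]
               for k in range(2)]
        norm = math.hypot(new[0], new[1])
        new = [new[0] / norm, new[1] / norm]
        # The sign of w is arbitrary, so compare directions via |w_new . w_old|.
        converged = abs(new[0] * w[0] + new[1] * w[1]) > 1.0 - tol
        w = new
        if converged:
            break
    return w

def abs_corr(p, q):
    """Absolute sample correlation between two equally long samples."""
    p, q = standardize(p), standardize(q)
    return abs(sum(u * v for u, v in zip(p, q)) / len(p))

rng = random.Random(1)
n = 5000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]                           # sub-gaussian source
s2 = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]   # Laplace source
gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]

X = [(1.0 * u + 0.5 * v, 0.3 * u + 1.0 * v) for u, v in zip(s1, s2)]  # hypothetical mixing
Z = whiten(X)
w = one_unit(Z, [1.0, 0.0])
y = [w[0] * z[0] + w[1] * z[1] for z in Z]  # estimated independent component

print("negentropy approx, gaussian sample:", negentropy_approx(gauss))
print("negentropy approx, Laplace source :", negentropy_approx(s2))
print("|corr| of estimate with sources   :", abs_corr(y, s1), abs_corr(y, s2))
```

The estimated component should correlate strongly with one of the two sources and only weakly with the other; which source is found, and with which sign, depends on the starting vector.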

In the absence of a generative model for the data, the algorithm can be used to find Projection Pursuit directions. Projection Pursuit is a technique for finding 'interesting' directions/projections in multi-dimensional datasets. These projections are useful for visualising the dataset and in density estimation and regression. Interesting directions are those along which the projected data look least gaussian, and these are exactly the directions the FastICA algorithm searches for.
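A toy version of this search can be sketched as follows. The Python code below is only an illustration and does not use the FastICA iteration: it scans directions in the plane by brute force and scores each projection with the absolute excess kurtosis, a simple stand-in for the negentropy approximation. The data set, the hidden angle and the function names are all made up for the example.

```python
import math
import random

def excess_kurtosis(y):
    """Sample excess kurtosis (zero for a gaussian distribution)."""
    n = len(y)
    m = sum(y) / n
    c = [v - m for v in y]
    var = sum(v * v for v in c) / n
    return sum(v ** 4 for v in c) / n / (var * var) - 3.0

def most_interesting_direction(X, n_angles=180):
    """Scan unit vectors in the plane; return (angle, index) of the least gaussian projection."""
    best_angle, best_index = 0.0, -1.0
    for i in range(n_angles):
        t = math.pi * i / n_angles
        c, s = math.cos(t), math.sin(t)
        index = abs(excess_kurtosis([c * x0 + s * x1 for x0, x1 in X]))
        if index > best_index:
            best_angle, best_index = t, index
    return best_angle, best_index

rng = random.Random(2)
true_angle = 0.6  # hypothetical direction along which the structure is hidden
ca, sa = math.cos(true_angle), math.sin(true_angle)

X = []
for _ in range(2000):
    u = rng.choice((-2.0, 2.0)) + rng.gauss(0.0, 0.5)  # two clusters along the hidden direction
    v = rng.gauss(0.0, 2.0)                            # gaussian noise across it
    X.append((ca * u - sa * v, sa * u + ca * v))

angle, index = most_interesting_direction(X)
print("hidden angle:", true_angle, "found angle:", angle, "index:", index)
```

A brute-force scan is only feasible in very low dimensions. In higher dimensions an iterative optimiser such as FastICA is needed to find such directions.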

The fastICA package contains both R/Splus and C code to implement the FastICA algorithm. The R/Splus code is included for clarity, whereas the C code allows the method to be run much faster. When the package is compiled, the code is linked to optimized BLAS routines if they are present on your machine. If they are not, unoptimized BLAS routines are compiled separately; the C code is then still faster than the R code, but not as fast as it could be. Most of the C code included in the package was written by Chris Heaton, who is a summer research student at the Department of Statistics, University of Oxford.

The R package is available from CRAN.

A standalone C version of the code is also available: fastiCa.tgz. The code is essentially the same as that used in the R package described above, but it uses the ranlib RNG library. Please read the README file included in the directory for instructions on compilation. I have successfully compiled this code on my Linux machine, but that's about it so far. The code is distributed under the GPL (for details see the file COPYING).