CCA

class hyppo.independence.CCA

Canonical Correlation Analysis (CCA) test statistic and p-value.

This test can be thought of as inferring information from cross-covariance matrices [1]. It has been argued that virtually all parametric tests of significance can be treated as special cases of CCA [2]. The method was first introduced by Harold Hotelling in 1936 [3].

The statistic can be derived as follows [4]:

Let \(x\) and \(y\) be \((n, p)\) samples of random variables \(X\) and \(Y\). After centering \(x\) and \(y\), the sample covariance matrix is \(\hat{\Sigma}_{xy} = x^T y\); the variance matrices \(\hat{\Sigma}_{xx}\) and \(\hat{\Sigma}_{yy}\) for \(x\) and \(y\) are defined similarly. The CCA test statistic is then found by calculating the vectors \(a \in \mathbb{R}^p\) and \(b \in \mathbb{R}^q\) that maximize

\[\mathrm{CCA}_n (x, y) = \max_{a \in \mathbb{R}^p, b \in \mathbb{R}^q} \frac{a^T \hat{\Sigma}_{xy} b} {\sqrt{a^T \hat{\Sigma}_{xx} a} \sqrt{b^T \hat{\Sigma}_{yy} b}}\]
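The maximization above has a closed-form solution: after whitening with the Cholesky factors of \(\hat{\Sigma}_{xx}\) and \(\hat{\Sigma}_{yy}\), the maximum equals the largest singular value of the whitened cross-covariance matrix. The following is a minimal NumPy sketch of that textbook computation, assuming both centered inputs have full column rank. The function name `first_canonical_correlation` is hypothetical, and the sketch is not necessarily how `CCA.statistic` computes its value internally.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between 2-D arrays x (n, p) and y (n, q).

    Illustrative helper (not part of hyppo). Assumes the centered inputs
    have full column rank so the Cholesky factorizations exist.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center each column, as in the derivation above.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Sample (cross-)covariance matrices, up to a common scale factor
    # that cancels in the ratio being maximized.
    sxx = xc.T @ xc
    syy = yc.T @ yc
    sxy = xc.T @ yc

    # Whiten: m = Lx^{-1} Sxy Ly^{-T}, where Sxx = Lx Lx^T and Syy = Ly Ly^T.
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    m = np.linalg.solve(lx, sxy)
    m = np.linalg.solve(ly, m.T).T

    # The largest singular value of m is the maximum of the ratio.
    singular_values = np.linalg.svd(m, compute_uv=False)
    return float(singular_values[0])


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 2))
print(first_canonical_correlation(x, y))
```

Substituting \(u = L_x^T a\) and \(v = L_y^T b\) turns the denominator into \(\lVert u \rVert \lVert v \rVert\), which is why the problem reduces to a singular value computation.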

The returned p-value is calculated with a permutation test, using hyppo.tools.perm_test.
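To illustrate the general idea of a permutation test, here is a minimal sketch in plain NumPy. It is not the implementation of `hyppo.tools.perm_test`: the names `permutation_pvalue` and `abs_pearson`, the `seed` argument, and the add-one correction in the p-value are assumptions made for this example only.

```python
import numpy as np


def abs_pearson(x, y):
    """Stand-in statistic for the demo: absolute Pearson correlation."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


def permutation_pvalue(x, y, statistic, reps=1000, seed=None):
    """Estimate a p-value by shuffling the rows of y (hypothetical helper).

    Shuffling y breaks any pairing between x and y, so the statistics of
    the shuffled data approximate the null distribution under independence.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)

    null = np.empty(reps)
    for i in range(reps):
        perm = rng.permutation(y.shape[0])
        null[i] = statistic(x, y[perm])

    # Add-one correction (an assumption of this sketch) keeps the
    # estimate strictly positive.
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
y = x + 0.1 * rng.normal(size=(50, 1))  # strongly dependent on x
stat, pvalue = permutation_pvalue(x, y, abs_pearson, reps=200, seed=1)
print(stat, pvalue)
```

A larger `reps` gives a finer-grained estimate of the null distribution at the cost of more statistic evaluations.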

Methods Summary

CCA.statistic(x, y)

Helper function that calculates the CCA test statistic.

CCA.test(x, y[, reps, workers])

Calculates the CCA test statistic and p-value.


CCA.statistic(x, y)

Helper function that calculates the CCA test statistic.

Parameters

x,y (ndarray) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p) where n is the number of samples and p is the number of dimensions.

Returns

stat (float) -- The computed CCA statistic.

CCA.test(x, y, reps=1000, workers=1)

Calculates the CCA test statistic and p-value.

Parameters
  • x,y (ndarray) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p) where n is the number of samples and p is the number of dimensions.

  • reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.

  • workers (int, default: 1) -- The number of cores over which to parallelize the p-value computation. Supply -1 to use all cores available to the process.

Returns

  • stat (float) -- The computed CCA statistic.

  • pvalue (float) -- The computed CCA p-value.

Examples

>>> import numpy as np
>>> from hyppo.independence import CCA
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue = CCA().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'
