DiscrimTwoSample

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)

A class that compares the discriminability of two datasets. The two-sample test measures whether the discriminability of one dataset differs from that of another. More details are given in [1].

Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,

\[
\begin{aligned}
H_0 &: D_{x_1} = D_{x_2} \\
H_A &: D_{x_1} > D_{x_2}
\end{aligned}
\]

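Alternatively, the test can be run against \(H_A: D_{x_1} < D_{x_2}\) or against the two-sided \(H_A: D_{x_1} \neq D_{x_2}\). The alternative is chosen with the alt parameter of test, whose default is the two-sided "neq".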
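The sample discriminability \(\hat D\) used in these hypotheses can be illustrated with a minimal sketch of one way to estimate it: for every pair of distinct measurements that share a sample id, compute the fraction of measurements with a different id that lie farther away, then average over all such pairs. The function name sample_discriminability, the Euclidean metric, and counting ties as one half are assumptions of this sketch, not a description of hyppo's internals. With those choices it gives 0.5 for x1 and 1.0 for x2 on the arrays from the example at the end of this page.

```python
import numpy as np


def sample_discriminability(x, y):
    """Sketch of a sample discriminability estimate (hypothetical helper).

    x : (n, p) data matrix; y : (n,) vector of sample ids.
    Assumptions: Euclidean distances, and a tie between a within-id and a
    between-id distance counts as one half.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))

    scores = []
    for i in range(len(y)):
        same = y == y[i]
        between = dist[i, ~same]  # distances to measurements of other ids
        same[i] = False
        within = dist[i, same]    # distances to repeats of the same id
        for d in within:
            # Fraction of between-id distances that are NOT smaller than d.
            closer = (between < d).sum() + 0.5 * (between == d).sum()
            scores.append(1.0 - closer / between.size)
    # Average over all within-id pairs.
    return float(np.mean(scores))
```

Because every within-id pair contributes one score, ids with more repeats carry proportionally more weight in the average.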

Parameters
  • is_dist (bool, default: False) -- Whether x1 and x2 are distance matrices or not.

  • remove_isolates (bool, default: True) -- Whether to remove measurements whose sample id occurs only once.
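As a minimal sketch of what remove_isolates=True is meant to do, the hypothetical helper drop_isolates keeps only measurements whose sample id occurs at least twice, since a lone measurement has no repeat to compare against. When the input is a distance matrix (is_dist=True), both the row and the column of each dropped measurement have to go. This is an illustration under those assumptions, not hyppo's own code.

```python
import numpy as np


def drop_isolates(x, y, is_dist=False):
    """Hypothetical helper: drop measurements whose id appears only once.

    x : (n, p) data matrix, or (n, n) distance matrix if is_dist is True.
    y : (n,) vector of sample ids.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    ids, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, ids[counts > 1])  # True where the id repeats
    if is_dist:
        # A distance matrix loses both the row and the column.
        return x[np.ix_(keep, keep)], y[keep]
    return x[keep], y[keep]
```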

Methods Summary

DiscrimTwoSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

DiscrimTwoSample.test(x1, x2, y[, reps, ...])

Calculates the test statistic and p-value for a two sample test for discriminability.


DiscrimTwoSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

Parameters

  • x (ndarray) -- Input data matrix of shape (n, p), where n is the number of samples and p is the number of dimensions. Alternatively, x can be a distance matrix of shape (n, n).

  • y (ndarray) -- A vector of length n containing the sample ids, such that y[i] is the label for x[i, :].

Returns

stat (float) -- The computed two sample discriminability statistic.

DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1)

Calculates the test statistic and p-value for a two sample test for discriminability.

Parameters
  • x1, x2 (ndarray) -- Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x1 and x2 can be distance matrices, where the shapes must both be (n, n); in this case, is_dist must be set to True.

  • y (ndarray) -- A vector containing the sample ids for the n samples. It should be matched to the inputs such that y[i] is the label for x1[i, :] and x2[i, :].

  • reps (int, optional (default: 1000)) -- The number of permutation replications used to estimate the null distribution from which the p-value is calculated.

  • alt ({"greater", "less", "neq"} (default: "neq")) -- The alternative hypothesis for the test: the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or differs in discriminability (alt = "neq").

  • workers (int, optional (default: -1)) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
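The alt parameter decides which tail of the null distribution counts as evidence against \(H_0\). The generic sketch below shows one common way to turn reps permuted statistics into a p-value for each choice. The statistic (assumed here to be d1 - d2), the add-one correction, and the helper name permutation_pvalue are assumptions for illustration; hyppo's own permutation scheme and p-value formula may differ.

```python
import numpy as np


def permutation_pvalue(stat, null, alt="neq"):
    """Generic permutation p-value (hypothetical helper).

    stat : observed statistic, e.g. d1 - d2 (an assumed choice)
    null : 1-D array of the statistic recomputed under reps permutations
    alt  : "greater", "less", or "neq"
    """
    null = np.asarray(null, dtype=float)
    reps = null.size
    if alt == "greater":
        count = (null >= stat).sum()
    elif alt == "less":
        count = (null <= stat).sum()
    elif alt == "neq":
        # Two-sided: compare magnitudes, assuming the null is centered at 0.
        count = (np.abs(null) >= abs(stat)).sum()
    else:
        raise ValueError("alt must be 'greater', 'less', or 'neq'")
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (reps + 1)
```

The add-one correction means the smallest attainable p-value is 1 / (reps + 1), so larger reps resolve smaller p-values.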

Returns

  • d1 (float) -- The computed discriminability score for x1.

  • d2 (float) -- The computed discriminability score for x2.

  • pvalue (float) -- The computed two sample test p-value.

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
