Estimators

The Estimators module contains a variety of statistical estimators that can be applied to multivariate datasets.

conaction.estimators.angular_disimilarity(X: ndarray) float64

Computes the multilinear angular dissimilarity. When given an m x 2 data matrix, it is equivalent to the angular distance.

This function computes

\[\text{angular dissimilarity} \triangleq \frac{\theta}{\pi}\]

where \(\theta\) is the arccosine of the reflective correlation coefficient.

Parameters:

X (array-like) – m x n data matrix

Returns:

Angular dissimilarity.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> angular_disimilarity(data)
0.00981604173368436
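
The definition above can be illustrated with a minimal NumPy sketch. It assumes that the reflective correlation is the uncentered n-ary coefficient documented later in this module and that expectations are estimated by sample means. The function names ending in `_sketch` are hypothetical and are not part of conaction.

```python
import numpy as np


def reflective_correlation_sketch(X):
    """Uncentered n-ary correlation of the columns of an m x n matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Sample mean of the row-wise products of all n columns.
    numerator = np.mean(np.prod(X, axis=1))
    # Product over columns of the n-th root of the n-th absolute moment.
    denominator = np.prod(np.mean(np.abs(X) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


def angular_dissimilarity_sketch(X):
    """theta / pi, where theta is the arccosine of the reflective correlation."""
    # Clip to guard against round-off pushing the value outside [-1, 1].
    r = np.clip(reflective_correlation_sketch(X), -1.0, 1.0)
    return np.arccos(r) / np.pi


same = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
opposite = np.array([[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]])
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
```

For two columns this reduces to the angle between the column vectors divided by π, so identical columns score close to 0, orthogonal columns 0.5, and opposite columns close to 1.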
conaction.estimators.circular_correlation(X: ndarray) float64

This function calculates the n-ary circular correlation coefficient. When given an m x 2 data matrix, it is equivalent to the circular correlation coefficient.

This function estimates

\[R_c \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \sin \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\sin \left( X_j - \mathbb{E}[X_j] \right)|^n \right]}}\]
Parameters:

X (array-like) – m x n data matrix

Returns:

r – Circular correlation coefficient.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> circular_correlation(data)
0.9999999999999999
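
A minimal sketch of the estimator displayed above, assuming angles in radians and sample means in place of expectations. Note that it centers each column on its arithmetic mean, exactly as the formula is written, rather than on a circular mean. The name `circular_correlation_sketch` is hypothetical.

```python
import numpy as np


def circular_correlation_sketch(X):
    """n-ary circular correlation of the columns of an m x n matrix of angles."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Sine of each angle's deviation from its column mean.
    S = np.sin(X - X.mean(axis=0))
    numerator = np.mean(np.prod(S, axis=1))
    denominator = np.prod(np.mean(np.abs(S) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


angles = np.array([0.1, 0.5, 1.2, 2.0])
same = np.column_stack([angles, angles])
mirrored = np.column_stack([angles, -angles])
```

Two identical columns give a coefficient of about 1, and a column paired with its negation gives about -1.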
conaction.estimators.correlation_ratio(X: ndarray, y: ndarray) float64

Warning

Not implemented yet.

This function calculates the multilinear correlation ratio of a collection of response variables given their classes. The classic Fisher’s correlation ratio is a special case.

Parameters:
  • X (array-like) – m x n data matrix

  • y (array-like) – m-dimensional vector of class labels

Returns:

Correlation ratio score.

Return type:

np.float64

Raises:

NotImplementedError

conaction.estimators.grade_entropy(X, normalize=True)

Computes a grade entropy for a strict product order on the row space points.

This function computes

\[H_g = \frac{-\sum_{i=1}^{k} p (g_i) \ln (p (g_i))}{\ln{m}}\]

where \(p\) is a probability distribution over the grades \(g\) of the point \(x_i\) among the indexed set of points \(i \in \{1, \cdots, m\}\) according to a strict product order relation.

Parameters:

X (array-like) – An m x n data matrix.

Returns:

entropy – Grade entropy of product order.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> grade_entropy(data)
1.0
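
The normalized entropy in the formula above can be sketched as follows. The sketch assumes the grade of every point has already been computed and is passed in as a vector of labels. How the library derives grades from the product order is not shown here, and the function name is hypothetical.

```python
import numpy as np


def normalized_grade_entropy_sketch(grades):
    """Shannon entropy of the grade frequencies, divided by ln(m)."""
    grades = np.asarray(grades)
    m = grades.size  # number of points; must be greater than 1
    _, counts = np.unique(grades, return_counts=True)
    p = counts / m  # empirical probability of each distinct grade
    return float(-np.sum(p * np.log(p)) / np.log(m))


chain = np.arange(10)           # every point has its own grade
flat = np.zeros(10, dtype=int)  # every point shares one grade
halves = np.array([0, 0, 1, 1])
```

When all m points have distinct grades, as in a totally ordered chain, the entropy reaches its maximum of 1. When all points share a single grade it is 0.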
conaction.estimators.kendall_tau(X: ndarray, method='A', n_jobs=1) float64

Multivariate Kendall’s tau.

Parameters:
  • X (array-like) – An m x n data matrix.

  • method ({'A', 'B', 'C'}, optional) –

    The method used to account for tied points.

    The following methods are available (default is ‘A’):

    • ’A’: Original Kendall’s Tau.

    • ’B’: \(\tau_B = \frac{m_c - m_d}{\sqrt[n]{\prod_{j=1}^{n} (m_0 - m_j)}}\)

    • ’C’: \(\tau_C = \frac{2 (m_c - m_d)}{m^2 \frac{\max(m,n) - 1}{\max(m,n)}}\)

Returns:

Multivariate Kendall’s tau score

Return type:

np.float64

Raises:

NotImplementedError – Method B is not implemented yet.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> kendall_tau(data)
1.0
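
For orientation, the classic two-variable case of method 'A' can be sketched by counting concordant and discordant pairs. This covers only the bivariate definition. How the library extends concordance to n columns is not specified above, and the function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def kendall_tau_a_sketch(x, y):
    """Classic Kendall tau-a for two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.size
    concordant = 0
    discordant = 0
    for i, j in combinations(range(m), 2):
        # The sign product is +1 for a concordant pair, -1 for a
        # discordant pair, and 0 when the pair is tied in x or y.
        s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    m0 = m * (m - 1) / 2  # total number of pairs
    return (concordant - discordant) / m0
```

Tied pairs count as neither concordant nor discordant, which is why methods 'B' and 'C' adjust the denominator.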
conaction.estimators.median_correlation(X: ndarray, transform=<function <lambda>>) float64

Median (multilinear) correlation.

The function estimates

\[R_{\mathcal{M}} \left[ X_1, \cdots, X_n \right] = \frac{\mathcal{M} \left[ \prod_{j=1}^{n} \left( X_j - \mathcal{M}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathcal{M}\left[ |X_j - \mathcal{M}[X_j]|^n \right]}}\]
Parameters:
  • X (array-like[np.float64]) – An m x n data matrix.

  • transform (function) – A data transform before computing coefficient.

Returns:

r – The calculated median correlation coefficient.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> median_correlation(data)
0.9999999999999982
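
A minimal sketch of the estimator above, with every expectation replaced by a sample median. It assumes the absolute moment in the denominator is taken over each column j, and it does not model the `transform` parameter. The function name is hypothetical.

```python
import numpy as np


def median_correlation_sketch(X):
    """Median-based n-ary correlation of the columns of an m x n matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Deviation of every entry from its column median.
    D = X - np.median(X, axis=0)
    numerator = np.median(np.prod(D, axis=1))
    denominator = np.prod(np.median(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
same = np.column_stack([x, x])
opposite = np.column_stack([x, -x])
```

Identical columns give 1 and a column paired with its negation gives -1, as with the mean-based coefficients.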
conaction.estimators.misiak_correlation(x: ndarray, y: ndarray, X: ndarray) float64

Misiak’s n-inner correlation coefficient based on the n-inner product space presented in Misiak and Ryz 2000.

Parameters:
  • x (array-like) – 1-D data vector

  • y (array-like) – 1-D data vector

  • X (array-like) – m x n data matrix

Returns:

Misiak correlation score.

Return type:

np.float64

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> x = np.random.normal(size=10)
>>> y = np.random.normal(size=10)
>>> X = np.random.normal(size=100).reshape(10,10)
>>> misiak_correlation(x,y,X)
-0.11209570083901074
conaction.estimators.nightingale_correlation(X: ndarray, p=1, alphas=None) float64

Calculates the Nightingale correlation, which is the Nightingale covariance normalized onto the interval [0, 1].

Parameters:

X (array-like) – m x n data matrix

Returns:

Nightingale correlation

Return type:

np.float64

See also

nightingale_deviation

Nightingale’s deviation of order p.

nightingale_covariance

Nightingale’s covariance of order p.

conaction.estimators.nightingale_covariance(X: ndarray, p=1) float64

This function calculates the Nightingale covariance, which is the multisemimetric between a collection of random variables and their expectations. The multisemimetric is induced by a multiseminorm, a generalization of the notion of a seminorm.

Parameters:

X (array-like) – m x n data matrix

Returns:

Nightingale covariance.

Return type:

np.float64

See also

numpy.std

Standard deviation.

nightingale_deviation

Nightingale’s deviation of order p.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> nightingale_covariance(data)
7381024072265624.0
conaction.estimators.nightingale_deviation(x: ndarray, p=2) float64

Calculates the Nightingale deviation of order p. When p = 2, it is the same as the standard deviation.

This function estimates

\[\text{Dev}_p \left[ X \right] \triangleq \sqrt[p]{\mathbb{E}\left[ |X - \mathbb{E}[X]|^p \right]}\]
Parameters:

x (array-like) – Instances of a variable.

Returns:

result

Return type:

np.float64

See also

numpy.std

Standard deviation.

Examples

>>> import numpy as np
>>> data = np.arange(10)
>>> nightingale_deviation(data)
2.8722813232690143
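
The formula above can be sketched directly in NumPy, assuming the expectation is estimated by the sample mean. The name `nightingale_deviation_sketch` is hypothetical.

```python
import numpy as np


def nightingale_deviation_sketch(x, p=2):
    """p-th root of the mean p-th absolute deviation from the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()) ** p) ** (1.0 / p)


data = np.arange(10)
order_two = nightingale_deviation_sketch(data, p=2)  # matches np.std(data)
order_one = nightingale_deviation_sketch(data, p=1)  # mean absolute deviation
```

With p = 2 the result agrees with `numpy.std` at its default `ddof=0`, and with p = 1 it is the mean absolute deviation.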
conaction.estimators.partial_agnesian(X: ndarray, t=None, k=0)

Computes the partial Agnesian of order k on a data matrix. If a vector of parameters is not provided, then a parameter step size of unity is assumed.

Parameters:
  • X (array-like (2D)) – m x n data matrix

  • t (array-like (1D)) – Vector parameters corresponding to the rows of X.

  • k (Non-negative int) – Non-negative order of the partial Agnesian operator.

Returns:

Sequence of partial Agnesian scores.

Return type:

array-like[float]

Examples

>>> import numpy as np
>>> X = np.arange(5*3).reshape(5,3)
>>> partial_agnesian(X,k=1)
array([27, 27, 27, 27])
>>> partial_agnesian(X, k=2)
array([0, 0, 0])
>>> t = np.linspace(0, 10, 5)
>>> partial_agnesian(X, k=1, t=t)
array([1.728, 1.728, 1.728, 1.728])
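
Judging only from the example outputs above, the partial Agnesian of order k appears to be the row-wise product, across columns, of each column's k-th finite difference with respect to the parameter. The sketch below encodes that reading for a uniform parameter step. It is an inference from the examples, not the library's documented definition, and the name `partial_agnesian_sketch` and its `step` argument are hypothetical.

```python
import numpy as np


def partial_agnesian_sketch(X, k=0, step=1.0):
    """Row-wise product of k-th finite differences of each column."""
    X = np.asarray(X, dtype=float)
    # k-th forward difference down the rows, scaled by the uniform
    # parameter step; k = 0 leaves the data unchanged.
    derivatives = np.diff(X, n=k, axis=0) / step ** k
    return np.prod(derivatives, axis=1)


X = np.arange(5 * 3).reshape(5, 3)
first = partial_agnesian_sketch(X, k=1)             # unit step
second = partial_agnesian_sketch(X, k=2)            # unit step
scaled = partial_agnesian_sketch(X, k=1, step=2.5)  # step of np.linspace(0, 10, 5)
```

Under this reading each output has m - k entries, which agrees with the lengths shown in the examples.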
conaction.estimators.pearson_correlation(X: ndarray) float64

This function calculates the n-ary Pearson’s r correlation coefficient. When given an m x 2 data matrix, it is equivalent to Pearson’s r correlation coefficient.

This function estimates

\[R_p \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |X_j - \mathbb{E}[X_j]|^n \right]}}\]
Parameters:

X (array-like) – An m x n data matrix.

Returns:

r – The calculated Pearson r correlation coefficient.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> pearson_correlation(data)
0.9999999999999978
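
A minimal sketch of the estimator above, assuming sample means for the expectations and reading the absolute moment in the denominator as taken over each column j. The name `pearson_correlation_sketch` is hypothetical.

```python
import numpy as np


def pearson_correlation_sketch(X):
    """n-ary Pearson correlation of the columns of an m x n matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Deviation of every entry from its column mean.
    D = X - X.mean(axis=0)
    numerator = np.mean(np.prod(D, axis=1))
    denominator = np.prod(np.mean(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


rng = np.random.default_rng(0)
X2 = rng.normal(size=(50, 2))
r_sketch = pearson_correlation_sketch(X2)
r_numpy = np.corrcoef(X2, rowvar=False)[0, 1]
```

For an m x 2 matrix the numerator is the covariance and the denominator is the product of the two standard deviations, so `r_sketch` agrees with `numpy.corrcoef`.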
conaction.estimators.pnorm(x: ndarray, p=2) float64

Computes the p-norm of a given vector.

Parameters:
  • x (1D array-like) – An m-dimensional vector.

  • p (float) – Order of the norm.

Returns:

P-norm of input vector.

Return type:

np.float64

conaction.estimators.product_percentiles(X: array) array

Compute joint percentiles under a product order.

Parameters:

X (np.array[float]) – Data matrix.

Returns:

Joint percentiles.

Return type:

float

conaction.estimators.product_rank(X, monotone=False)

Assign product order rank to each point.

Parameters:
  • X (np.array) – Data matrix.

  • monotone (bool) – Whether to rank monotonically or antimonotonically.

conaction.estimators.pseudograde_entropy(X: ndarray, n_jobs=1) float64

Computes a pseudograde entropy for a strict product order on the row space points.

This function computes

\[H_g = \frac{-\sum_{i=1}^{k} p (g_i) \ln (p (g_i))}{\ln{m}}\]

where \(p\) is a probability distribution over the pseudogrades \(g\) of the point \(x_i\) among the indexed set of points \(i \in \{1, \cdots, m\}\) according to a strict product order relation.

Parameters:

X (array-like) – An m x n data matrix.

Returns:

entropy – Pseudograde entropy of product order.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> pseudograde_entropy(data)
1.0
conaction.estimators.reach_percentiles(g: DiGraph) Dict[int, float]

Compute reach percentiles from a digraph.

Parameters:

g (nx.DiGraph) – NetworkX DiGraph.

Return type:

dict[int,float]

conaction.estimators.reach_rank(g: DiGraph) Dict[int, int]

Reachable rank of nodes in a digraph.

Parameters:

g (nx.DiGraph) – NetworkX DiGraph.

Returns:

Reach rank for each node.

Return type:

dict[int,int]

conaction.estimators.reflective_correlation(X: ndarray) float64

Calculates the multilinear reflective correlation coefficient. When given an m x 2 data matrix, it is equivalent to the reflective correlation coefficient.

This function estimates

\[R_r \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} X_j \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |X_j|^n \right]}}\]
Parameters:

X (array-like) – The m x n data matrix.

Returns:

r – Reflective correlation coefficient score.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> reflective_correlation(data)
0.9995245464170066
conaction.estimators.signum_correlation(X: ndarray) float64

Signum correlation coefficient.

This function estimates

\[R_{\text{sign}} \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \text{sign} \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\text{sign} \left( X_j - \mathbb{E}[X_j] \right)|^n \right]}}\]
Parameters:

X (array-like) – m x n data matrix

Returns:

Signum correlation score.

Return type:

float

See also

scipy.stats.kendalltau

Kendall’s \(\tau\)

Notes

At first glance this coefficient resembles Kendall’s \(\tau\) because both take products of signs, but the two are distinct. Kendall’s \(\tau\) averages the number of concordant pairs of points minus the number of discordant pairs, whereas this coefficient takes the signs of each variable’s deviations from its mean.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> signum_correlation(data)
0.9999999999999998
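
The estimator above can be sketched by applying the sign function to the centered columns. Sample means stand in for the expectations, and the name `signum_correlation_sketch` is hypothetical.

```python
import numpy as np


def signum_correlation_sketch(X):
    """n-ary correlation of the signs of the mean-centered columns."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # +1, 0 or -1 depending on which side of its column mean a value lies.
    S = np.sign(X - X.mean(axis=0))
    numerator = np.mean(np.prod(S, axis=1))
    denominator = np.prod(np.mean(np.abs(S) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


agree = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [7.0, 7.0]])
mixed = np.array([[1.0, 3.0], [2.0, 1.0], [4.0, 5.0], [7.0, 2.0]])
```

Only the side of the mean on which each value falls matters here, not the ordering of pairs of points, which is the distinction from Kendall’s τ drawn in the notes.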
conaction.estimators.spearman_correlation(X: ndarray, method='average') float64

This function calculates the n-ary Spearman correlation coefficient. When given an m x 2 data matrix, it is equivalent to Spearman’s rho correlation coefficient.

This function estimates

\[R_s \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \left( \text{rank} \left( X_j \right) - \mathbb{E}[\text{rank} \left( X_j \right)] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\text{rank} \left( X_j \right) - \mathbb{E}[\text{rank} \left( X_j \right)]|^n \right]}}\]
Parameters:
  • X (array-like) – m x n data matrix.

  • method ({'average', 'min', 'max', 'dense', 'ordinal'}, optional) –

    The method used to assign ranks to tied elements.

    The following methods are available (default is ‘average’):

    • ’average’: The average of the ranks that would have been assigned to all the tied values is assigned to each value.

    • ’min’: The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)

    • ’max’: The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

    • ’dense’: Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.

    • ’ordinal’: All values are given a distinct rank, corresponding to the order in which the values occur in X.

Returns:

Spearman’s correlation coefficient.

Return type:

np.float64

See also

scipy.stats.rankdata

Notes

The available data ranking options are directly from scipy.stats.rankdata.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> spearman_correlation(data)
0.9999999999999991
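
A minimal sketch of the rank-based estimator above. To stay NumPy-only it ranks each column with a stable argsort, which corresponds to the 'ordinal' method, and so it matches the default 'average' method only when a column has no ties. Both function names are hypothetical.

```python
import numpy as np


def ordinal_ranks_sketch(x):
    """Ranks 1..m by sorted position; ties are broken by order of appearance."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks


def spearman_correlation_sketch(X):
    """n-ary Pearson-style correlation computed on column ranks."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    R = np.column_stack([ordinal_ranks_sketch(X[:, j]) for j in range(n)])
    D = R - R.mean(axis=0)
    numerator = np.mean(np.prod(D, axis=1))
    denominator = np.prod(np.mean(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return numerator / denominator


x = np.array([0.5, 2.0, 3.5, 9.0])
increasing = np.column_stack([x, x ** 3])  # monotone but nonlinear
decreasing = np.column_stack([x, -x])
```

Because only ranks enter the computation, any strictly increasing relationship between two columns gives 1, even when it is nonlinear.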
conaction.estimators.taylor_correlation(X: ndarray) float64

Taylor’s multi-way correlation coefficient.

Taylor 2020 defines this function to be

\[\frac{1}{\sqrt{d}} \sqrt{\frac{1}{d-1} \sum_{i=1}^{d} ( \lambda_i - \bar{\lambda})^2 }\]

where \(d\) is the number of variables, \(\lambda_1, \cdots, \lambda_d\) are the eigenvalues of the correlation matrix for a given set of variables, and \(\bar{\lambda}\) is the mean of those eigenvalues.

Parameters:

X (array-like) – m x n data matrix

Returns:

Taylor correlation score

Return type:

np.float64

Notes

Taylor’s multi-way correlation coefficient is a rescaling of the Bessel-corrected standard deviation of the eigenvalues of the correlation matrix of the set of variables.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> taylor_correlation(data)
0.9486832980505138
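
The displayed formula can be sketched from the eigenvalues of the sample correlation matrix. The sketch applies the Bessel correction literally, as written in the formula, so it is not guaranteed to reproduce the example value above, which depends on the library's actual implementation. The name `taylor_correlation_sketch` is hypothetical.

```python
import numpy as np


def taylor_correlation_sketch(X):
    """Rescaled sample standard deviation of correlation-matrix eigenvalues."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]  # number of variables (columns)
    R = np.corrcoef(X, rowvar=False)
    # The correlation matrix is symmetric, so eigvalsh applies.
    eigenvalues = np.linalg.eigvalsh(R)
    return np.std(eigenvalues, ddof=1) / np.sqrt(d)


t = np.arange(6.0)
collinear = np.column_stack([t, 2 * t + 1, -3 * t])
uncorrelated = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
```

Perfectly collinear columns concentrate all the variance in one eigenvalue and score about 1. Uncorrelated columns have equal eigenvalues and score 0.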
conaction.estimators.trencevski_malceski_correlation(X: ndarray, Y: ndarray) float64

Generalized n-inner product correlation coefficient.

Computes a correlation coefficient based on Trencevski and Malceski 2006.

Parameters:
  • X (array-like) – m x n data matrix

  • Y (array-like) – m x n data matrix

Returns:

Correlation score.

Return type:

np.float64

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> Y = np.random.normal(size=1000).reshape(100,10)
>>> X = np.random.normal(size=1000).reshape(100,10)
>>> trencevski_malceski_correlation(X,Y)
3.1886981411745035e-08
conaction.estimators.wang_zheng_correlation(X: ndarray) float64

Correlation coefficient due to Wang & Zheng 2014.

This correlation coefficient is equivalent to

\[R_{wz} \triangleq 1 - \det (R_{n \times n})\]

where \(R_{n \times n}\) is the correlation matrix computed on a collection of n variables. In other words, this correlation coefficient is the complement of the determinant of the correlation matrix.

Parameters:

X (array-like) – m x n data matrix

Returns:

result – Unsigned correlation coefficient.

Return type:

np.float64

Notes

The complement of this statistic is the unsigned incorrelation coefficient.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> wang_zheng_correlation(data)
1.0
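
The definition above translates almost directly into NumPy. This is a minimal sketch that assumes the correlation matrix is the sample Pearson correlation matrix of the columns. The name `wang_zheng_correlation_sketch` is hypothetical.

```python
import numpy as np


def wang_zheng_correlation_sketch(X):
    """One minus the determinant of the column correlation matrix."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    return 1.0 - np.linalg.det(R)


rng = np.random.default_rng(0)
X2 = rng.normal(size=(50, 2))
r = np.corrcoef(X2, rowvar=False)[0, 1]
score = wang_zheng_correlation_sketch(X2)
```

For two variables the determinant is 1 - r², so the coefficient reduces to r². This is why it is unsigned.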
conaction.estimators.weak_inner_correlation()
Raises:

NotImplementedError
