Estimators

The Estimators module contains a variety of statistical estimators that can be applied to multivariate datasets.

conaction.estimators.angular_disimilarity(X: ndarray) → float64

Computes the multilinear angular dissimilarity. When given an m x 2 data matrix, it is equivalent to the angular distance.

This function computes

\[\text{angular dissimilarity} \triangleq \frac{\theta}{\pi}\]

where \(\theta\) is the result of computing the arccosine on the reflective correlation coefficient.

Parameters:: X (array-like) – m x n data matrix
Returns:: angular dissimilarity
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> angular_disimilarity(data)
0.00981604173368436
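
As an illustration of the definition above for the m x 2 case, here is a minimal NumPy sketch. The helper names `reflective_correlation_2col` and `angular_dissimilarity_2col` are hypothetical and are not part of conaction; the library's own implementation may differ.

```python
import numpy as np

def reflective_correlation_2col(X):
    """Uncentered (reflective) correlation of the two columns of an m x 2 matrix."""
    X = np.asarray(X, dtype=float)
    x, y = X[:, 0], X[:, 1]
    return np.mean(x * y) / np.sqrt(np.mean(x**2) * np.mean(y**2))

def angular_dissimilarity_2col(X):
    """theta / pi, where theta is the arccosine of the reflective correlation."""
    # Clip to [-1, 1] so floating-point round-off cannot break arccos.
    r = np.clip(reflective_correlation_2col(X), -1.0, 1.0)
    return float(np.arccos(r) / np.pi)

same = np.column_stack([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])         # parallel columns
opposite = np.column_stack([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])  # antiparallel columns
orthogonal = np.column_stack([[1.0, 0.0], [0.0, 1.0]])             # orthogonal columns

for name, X in [("same", same), ("opposite", opposite), ("orthogonal", orthogonal)]:
    print(name, angular_dissimilarity_2col(X))
```

Parallel columns give a value near 0, antiparallel columns a value near 1, and orthogonal columns 0.5.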

conaction.estimators.circular_correlation(X: ndarray) → float64

This function calculates the n-ary circular correlation coefficient. When given an m x 2 data matrix, it is equivalent to the circular correlation coefficient.

This function estimates

\[R_c \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \sin \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\sin \left( X_j - \mathbb{E}[X_j] \right)|^n \right]}}\]

Parameters:: X (array-like) – m x n data matrix
Returns:: r – Circular correlation coefficient.
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> circular_correlation(data)
0.9999999999999999
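
The estimator formula above can be sketched directly in NumPy by replacing each expectation with a sample mean. The function name `circular_correlation_sketch` is hypothetical, and the sketch follows the formula as written (centering on the arithmetic mean of each column); it is not the library's code.

```python
import numpy as np

def circular_correlation_sketch(X):
    """Sample version of the n-ary circular correlation formula."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # sin(X_j - E[X_j]) for every column j
    S = np.sin(X - X.mean(axis=0))
    numerator = np.mean(np.prod(S, axis=1))
    # product over columns of the n-th root of E[|sin(...)|^n]
    denominator = np.prod(np.mean(np.abs(S) ** n, axis=0) ** (1.0 / n))
    return float(numerator / denominator)

angles = np.linspace(0.0, 1.0, 20)
print(circular_correlation_sketch(np.column_stack([angles, angles])))
print(circular_correlation_sketch(np.column_stack([angles, -angles])))
```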

conaction.estimators.correlation_ratio(X: ndarray, y: ndarray) → float64

Warning

Not implemented yet.

This function calculates the multilinear correlation ratio of a collection of response variables given their classes. The classic Fisher’s correlation ratio is a special case.

Parameters:

X (array-like) – m x n data matrix
y (array-like) – m-dimensional vector of class labels

Returns:

Correlation ratio score.

Return type:

np.float64

Raises:

NotImplementedError –

conaction.estimators.grade_entropy(X, normalize=True)

Computes a grade entropy for a strict product order on the row space points.

This function computes

\[H_g = \frac{-\sum_{i=1}^{k} p (g_i) \ln (p (g_i))}{\ln{m}}\]

where \(p\) is a probability distribution over the grades \(g\) of the point \(x_i\) among the indexed set of points \(i \in \{1, \cdots, m\}\) according to a strict product order relation.

Parameters:: X (array-like) – An m x n data matrix.
Returns:: entropy – Grade entropy of product order.
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> grade_entropy(data)
1.0
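
The normalization in the formula above can be illustrated on its own. The sketch below assumes the grades have already been assigned to the m points (how conaction derives grades from the product order is not shown here), and `normalized_entropy_of_grades` is a hypothetical helper name.

```python
import numpy as np

def normalized_entropy_of_grades(grades):
    """Shannon entropy of the grade distribution, divided by ln(m). Requires m > 1."""
    grades = np.asarray(grades)
    m = grades.size
    # p(g_i): relative frequency of each distinct grade among the m points
    _, counts = np.unique(grades, return_counts=True)
    p = counts / m
    return float(-np.sum(p * np.log(p)) / np.log(m))

print(normalized_entropy_of_grades(np.arange(10)))   # every point has its own grade
print(normalized_entropy_of_grades([0, 0, 0, 0]))    # all points share one grade
```

If every point receives its own grade, the distribution is uniform over m grades and the normalized entropy is 1; if all points share a grade, it is 0.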

conaction.estimators.kendall_tau(X: ndarray, method='A', n_jobs=1) → float64

Multivariate Kendall’s tau.

Parameters:

X (array-like) – An m x n data matrix.
method ({'A', 'B', 'C'}, optional) –

The method used to account for tied points.
The following methods are available (default is ‘A’):
- ’A’: Original Kendall’s Tau.
- ’B’: \(\tau_B = \frac{m_c - m_d}{\sqrt[n]{\prod_{j=1}^{n} (m_0 - m_j)}}\)
- ’C’: \(\tau_C = \frac{2 (m_c - m_d)}{m^2 \frac{(\max(m,n) - 1)}{\max(m,n)}}\)

Returns:

Multivariate Kendall’s tau score

Return type:

np.float64

Raises:

NotImplementedError – Method B is not implemented yet.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> kendall_tau(data)
1.0

conaction.estimators.median_correlation(X: ndarray, transform=<function <lambda>>) → float64

Median (multilinear) correlation.

The function estimates

\[R_{\mathcal{M}} \left[ X_1, \cdots, X_n \right] = \frac{\mathcal{M} \left[ \prod_{j=1}^{n} \left( X_j - \mathcal{M}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathcal{M}\left[ |X_j - \mathcal{M}[X_j]|^n \right]}}\]

Parameters:

X (array-like[np.float64]) – An m x n data matrix.
transform (function) – A data transform before computing coefficient.

Returns:

r – The calculated median correlation coefficient.

Return type:

np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> median_correlation(data)
0.9999999999999982
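
A minimal sketch of the median-based formula, with the sample median standing in for \(\mathcal{M}\). The name `median_correlation_sketch` is hypothetical, and the `transform` argument of the real function is omitted here.

```python
import numpy as np

def median_correlation_sketch(X):
    """Median analogue of the n-ary correlation: medians replace expectations."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Deviations of each column from its own median
    D = X - np.median(X, axis=0)
    numerator = np.median(np.prod(D, axis=1))
    denominator = np.prod(np.median(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return float(numerator / denominator)

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
print(median_correlation_sketch(np.column_stack([x, x])))
print(median_correlation_sketch(np.column_stack([x, -x])))
```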

conaction.estimators.misiak_correlation(x: ndarray, y: ndarray, X: ndarray) → float64

Misiak’s n-inner correlation coefficient based on the n-inner product space presented in Misiak and Ryz 2000.

Parameters:

x (array-like) – 1-D data vector
y (array-like) – 1-D data vector
X (array-like) – m x n data matrix

Returns:

Misiak correlation score.

Return type:

np.float64

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> x = np.random.normal(size=10)
>>> y = np.random.normal(size=10)
>>> X = np.random.normal(size=100).reshape(10,10)
>>> misiak_correlation(x,y,X)
-0.11209570083901074

conaction.estimators.nightingale_correlation(X: ndarray, p=1, alphas=None) → float64

Calculates the Nightingale correlation, which is the Nightingale covariance normalized onto the interval [0,1].

Parameters:: X (array-like) – m x n data matrix
Returns:: Nightingale correlation
Return type:: np.float64

See also

nightingale_deviation: Nightingale’s deviation of order p.
nightingale_covariance: Nightingale’s covariance of order p.

conaction.estimators.nightingale_covariance(X: ndarray, p=1) → float64

This function calculates the Nightingale covariance, which is the multisemimetric between a collection of random variables and their expectations. The multisemimetric is induced by a multiseminorm, which is a generalization of the notion of a seminorm.

Parameters:: X (array-like) – m x n data matrix
Returns:: Nightingale covariance.
Return type:: np.float64

See also

numpy.std: Standard deviation.
nightingale_deviation: Nightingale’s deviation of order p.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> nightingale_covariance(data)
7381024072265624.0

conaction.estimators.nightingale_deviation(x: ndarray, p=2) → float64

Calculates the Nightingale deviation of order p. When p = 2, it is the same as the standard deviation.

This function estimates

\[\text{Dev}_p \left[ X \right] \triangleq \sqrt[p]{\mathbb{E}\left[ |X - \mathbb{E}[X]|^p \right]}\]

Parameters:: x (array-like) – Instances of a variable.
Returns:: result
Return type:: np.float64

See also

numpy.std: Standard deviation.

Examples

>>> import numpy as np
>>> data = np.arange(10)
>>> nightingale_deviation(data)
2.8722813232690143
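
The definition of \(\text{Dev}_p\) translates almost literally into NumPy. The sketch below uses a hypothetical helper name, `deviation_of_order_p`, and checks the p = 2 case against `numpy.std`.

```python
import numpy as np

def deviation_of_order_p(x, p=2):
    """p-th root of the mean absolute p-th power deviation from the mean."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x - x.mean()) ** p) ** (1.0 / p))

data = np.arange(10)
print(deviation_of_order_p(data, p=2), np.std(data))  # the two values agree
print(deviation_of_order_p(data, p=1))                # mean absolute deviation
```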

conaction.estimators.partial_agnesian(X: ndarray, t=None, k=0)

Computes the partial Agnesian of order k on a data matrix. If a vector of parameters is not provided, then a parameter step size of unity is assumed.

Parameters:

X (array-like (2D)) – m x n data matrix
t (array-like (1D)) – Vector of parameters corresponding to the rows of X.
k (Non-negative int) – Non-negative order of the partial Agnesian operator.

Returns:

Sequence of partial Agnesian scores.

Return type:

array-like[float]

Examples

>>> import numpy as np
>>> X = np.arange(5*3).reshape(5,3)
>>> partial_agnesian(X,k=1)
array([27, 27, 27, 27])
>>> partial_agnesian(X, k=2)
array([0, 0, 0])
>>> t = np.linspace(0, 10, 5)
>>> partial_agnesian(X, k=1, t=t)
array([1.728, 1.728, 1.728, 1.728])

conaction.estimators.pearson_correlation(X: ndarray) → float64

This function calculates the n-ary Pearson’s r correlation coefficient. When given an m x 2 data matrix, it is equivalent to the Pearson’s r correlation coefficient.

This function estimates

\[R_p \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |X_j - \mathbb{E}[X_j]|^n \right]}}\]

Parameters:: X (array-like) – An m x n data matrix.
Returns:: r – The calculated Pearson r correlation coefficient.
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> pearson_correlation(data)
0.9999999999999978
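
To see why the m x 2 case reduces to the ordinary Pearson coefficient, the formula can be sketched with sample means and compared against `numpy.corrcoef`. The function name `nary_pearson_sketch` is hypothetical and this is not the library's code.

```python
import numpy as np

def nary_pearson_sketch(X):
    """Sample version of the n-ary Pearson formula for an m x n data matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Center each column on its mean
    D = X - X.mean(axis=0)
    numerator = np.mean(np.prod(D, axis=1))
    denominator = np.prod(np.mean(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return float(numerator / denominator)

rng = np.random.default_rng(0)
X2 = rng.normal(size=(50, 2))
print(nary_pearson_sketch(X2), np.corrcoef(X2, rowvar=False)[0, 1])
```

With n = 2 the denominator is the product of the two standard deviations, so the two printed values agree up to round-off.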

conaction.estimators.pnorm(x: ndarray, p=2) → float64

Computes the p-norm of a given vector.

Parameters:

x (1D array-like) – An m-dimensional vector.
p (float) – Order of the norm.

Returns:

np.float64
P-norm of input vector.
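
For reference, the usual p-norm \(\left( \sum_i |x_i|^p \right)^{1/p}\) can be sketched as follows. The name `pnorm_sketch` is hypothetical; the comparison uses `numpy.linalg.norm`, which computes the same quantity for vectors.

```python
import numpy as np

def pnorm_sketch(x, p=2):
    """(sum_i |x_i|^p)^(1/p) for a 1-D vector."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

v = np.array([3.0, -4.0])
print(pnorm_sketch(v, p=2))   # Euclidean norm
print(pnorm_sketch(v, p=1))   # sum of absolute values
print(pnorm_sketch(v, p=3), np.linalg.norm(v, ord=3))
```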

conaction.estimators.product_percentiles(X: array) → array

Compute joint percentiles under a product order.

Parameters:

X (np.array[float]) – Data matrix.

Returns:

float
Joint percentiles.

conaction.estimators.product_rank(X, monotone=False)

Assign product order rank to each point.

Parameters:

X (np.array) – Data matrix.
monotone (bool) – Whether to rank monotonically or antimonotonically.

conaction.estimators.pseudograde_entropy(X: ndarray, n_jobs=1) → float64

Computes a pseudograde entropy for a strict product order on the row space points.

This function computes

\[H_g = \frac{-\sum_{i=1}^{k} p (g_i) \ln (p (g_i))}{\ln{m}}\]

where \(p\) is a probability distribution over the pseudogrades \(g\) of the point \(x_i\) among the indexed set of points \(i \in \{1, \cdots, m\}\) according to a strict product order relation.

Parameters:: X (array-like) – An m x n data matrix.
Returns:: entropy – Pseudograde entropy of product order.
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> pseudograde_entropy(data)
1.0

conaction.estimators.reach_percentiles(g: DiGraph) → Dict[int, float]

Compute reach percentiles from a digraph.

Parameters:: g (nx.DiGraph) – NetworkX DiGraph.
Return type:: dict[int,float]

conaction.estimators.reach_rank(g: DiGraph) → Dict[int, int]

Reachable rank of nodes in a digraph.

Parameters:

g (nx.DiGraph) – NetworkX DiGraph.

Returns:

dict[int,int]
Reach rank for each node.

conaction.estimators.reflective_correlation(X: ndarray) → float64

Calculates the multilinear reflective correlation coefficient. When given an m x 2 data matrix, it is equivalent to the reflective correlation coefficient.

This function estimates

\[R_r \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} X_j \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |X_j|^n \right]}}\]

Parameters:: X (array-like) – The m x n data matrix.
Returns:: r – Reflective correlation coefficient score.
Return type:: np.float64

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> reflective_correlation(data)
0.9995245464170066

conaction.estimators.signum_correlation(X: ndarray) → float64

Signum correlation coefficient.

This function estimates

\[R_{\text{sign}} \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \text{sign} \left( X_j - \mathbb{E}[X_j] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\text{sign} \left( X_j - \mathbb{E}[X_j] \right)|^n \right]}}\]

Parameters:: X (array-like) – m x n data matrix
Returns:: Signum correlation score.
Return type:: float

See also

scipy.stats.kendalltau: Kendall’s \(\tau\)

Notes

At first glance this coefficient resembles Kendall’s \(\tau\) because both take products of signs, but they are distinct. The signum correlation takes the signs of deviations from each variable’s mean, whereas Kendall’s \(\tau\) averages the number of concordant pairs of points minus the number of discordant pairs.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> signum_correlation(data)
0.9999999999999998
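
The distinction drawn in the Notes can be checked on a small bivariate example. Both functions below are illustrative sketches with hypothetical names (`signum_correlation_sketch`, `kendall_tau_a`), written from the formula above and from the classic pairwise definition of Kendall’s \(\tau\) without tie correction. They are not conaction's implementations.

```python
import numpy as np

def signum_correlation_sketch(X):
    """Sample version of the signum correlation formula."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Signs of deviations from each column mean
    S = np.sign(X - X.mean(axis=0))
    numerator = np.mean(np.prod(S, axis=1))
    denominator = np.prod(np.mean(np.abs(S) ** n, axis=0) ** (1.0 / n))
    return float(numerator / denominator)

def kendall_tau_a(x, y):
    """(concordant - discordant) / number of pairs, over all pairs of points."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = x.size
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            total += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(total / (m * (m - 1) / 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.0, 2.0, 3.0, 4.0, -10.0])
print(signum_correlation_sketch(np.column_stack([x, y])))  # negative
print(kendall_tau_a(x, y))                                  # positive
```

On this data the single outlying point drives the mean-centered signs, so the signum correlation is negative, while the pairwise comparison still finds more concordant than discordant pairs.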

conaction.estimators.spearman_correlation(X: ndarray, method='average') → float64

This function calculates the n-ary Spearman correlation coefficient. When given an m x 2 data matrix, it is equivalent to the Spearman’s Rho correlation coefficient.

This function estimates

\[R_s \left[ X_1, \cdots, X_n \right] = \frac{\mathbb{E} \left[ \prod_{j=1}^{n} \left( \text{rank} \left( X_j \right) - \mathbb{E}[\text{rank} \left( X_j \right)] \right) \right]}{\prod_{j=1}^{n} \sqrt[n]{\mathbb{E}\left[ |\text{rank} \left( X_j \right) - \mathbb{E}[\text{rank} \left( X_j \right)]|^n \right]}}\]

Parameters:

X (array-like) – m x n data matrix.
method ({'average', 'min', 'max', 'dense', 'ordinal'}, optional) –

The method used to assign ranks to tied elements.
The following methods are available (default is ‘average’):
- ’average’: The average of the ranks that would have been assigned to all the tied values is assigned to each value.
- ’min’: The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as “competition” ranking.)
- ’max’: The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.
- ’dense’: Like ‘min’, but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.
- ’ordinal’: All values are given a distinct rank, corresponding to the order in which the values occur in X.

Returns:

Spearman’s correlation coefficient.

Return type:

np.float64

See also

scipy.stats.rankdata

Notes

The available data ranking options are directly from scipy.stats.rankdata.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> spearman_correlation(data)
0.9999999999999991
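
A sketch of the rank-based formula, using `scipy.stats.rankdata` column by column and comparing the m x 2 case against `scipy.stats.spearmanr`. The function name `nary_spearman_sketch` is hypothetical and the code is an illustration rather than the library's implementation.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def nary_spearman_sketch(X, method="average"):
    """Apply the n-ary Pearson-style formula to the column-wise ranks of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Rank each column independently; ties are handled by `method`
    R = np.column_stack([rankdata(col, method=method) for col in X.T])
    D = R - R.mean(axis=0)
    numerator = np.mean(np.prod(D, axis=1))
    denominator = np.prod(np.mean(np.abs(D) ** n, axis=0) ** (1.0 / n))
    return float(numerator / denominator)

rng = np.random.default_rng(1)
X2 = rng.normal(size=(40, 2))
rho, _ = spearmanr(X2[:, 0], X2[:, 1])
print(nary_spearman_sketch(X2), rho)
```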

conaction.estimators.taylor_correlation(X: ndarray) → float64

Taylor’s multi-way correlation coefficient.

Taylor 2020 defines this function to be

\[\frac{1}{\sqrt{d}} \sqrt{\frac{1}{d-1} \sum_{i}^{d} ( \lambda_i - \bar{\lambda})^2 }\]

where \(d\) is the number of variables, \(\lambda_1, \cdots, \lambda_d\) are the eigenvalues of the correlation matrix for a given set of variables, and \(\bar{\lambda}\) is the mean of those eigenvalues.

Parameters:: X (array-like) – m x n data matrix
Returns:: Taylor correlation score
Return type:: np.float64

Notes

Taylor’s multi-way correlation coefficient is a rescaling of the Bessel-corrected standard deviation of the eigenvalues of the correlation matrix of the set of variables.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> taylor_correlation(data)
0.9486832980505138
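
The eigenvalue formula above can be sketched with `numpy.corrcoef` and `numpy.linalg.eigvalsh`. The name `taylor_correlation_sketch` is hypothetical. The sketch follows the formula as written, including the 1/(d-1) correction; an implementation that uses the uncorrected standard deviation would return a slightly smaller value.

```python
import numpy as np

def taylor_correlation_sketch(X):
    """Bessel-corrected std of the correlation-matrix eigenvalues, scaled by 1/sqrt(d)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    R = np.corrcoef(X, rowvar=False)          # d x d correlation matrix
    eigenvalues = np.linalg.eigvalsh(R)       # R is symmetric
    return float(np.std(eigenvalues, ddof=1) / np.sqrt(d))

t = np.arange(10.0)
collinear = np.column_stack([t, 2 * t + 1, -3 * t])   # pairwise correlations are +/-1
uncorrelated = np.column_stack([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]])
print(taylor_correlation_sketch(collinear))
print(taylor_correlation_sketch(uncorrelated))
```

Perfectly collinear columns concentrate all variance in one eigenvalue and score close to 1, while uncorrelated columns have equal eigenvalues and score close to 0.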

conaction.estimators.trencevski_malceski_correlation(X: ndarray, Y: ndarray) → float64

Generalized n-inner product correlation coefficient.

Computes a correlation coefficient based on Trencevski and Malceski 2006.

Parameters:

X (array-like) – m x n data matrix
Y (array-like) – m x n data matrix

Returns:

Correlation score.

Return type:

np.float64

Examples

>>> import numpy as np
>>> np.random.seed(0)
>>> Y = np.random.normal(size=1000).reshape(100,10)
>>> X = np.random.normal(size=1000).reshape(100,10)
>>> trencevski_malceski_correlation(X,Y)
3.1886981411745035e-08

conaction.estimators.wang_zheng_correlation(X: ndarray) → float64

Correlation coefficient due to Wang & Zheng 2014.

This correlation coefficient is equivalent to

\[R_{wz} \triangleq 1 - \det (R_{n \times n})\]

where \(R_{n \times n}\) is the correlation matrix computed on a collection of n variables. In other words, this correlation coefficient is the complement of the determinant of the correlation matrix.

Parameters:: X (array-like) – m x n data matrix
Returns:: result – Unsigned correlation coefficient.
Return type:: np.float64

Notes

The complement of this statistic is the unsigned incorrelation coefficient.

Examples

>>> import numpy as np
>>> data = np.arange(100).reshape(10,10)
>>> wang_zheng_correlation(data)
1.0
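
The definition \(R_{wz} = 1 - \det(R_{n \times n})\) is short enough to sketch directly. The name `wang_zheng_sketch` is hypothetical. For two variables the determinant is \(1 - r^2\), so the coefficient reduces to the squared Pearson correlation, which the example checks.

```python
import numpy as np

def wang_zheng_sketch(X):
    """One minus the determinant of the correlation matrix of the columns of X."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return float(1.0 - np.linalg.det(R))

rng = np.random.default_rng(2)
X2 = rng.normal(size=(30, 2))
r = np.corrcoef(X2, rowvar=False)[0, 1]
print(wang_zheng_sketch(X2), r ** 2)   # equal for two variables
```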

conaction.estimators.weak_inner_correlation()

Raises:: NotImplementedError –