tdads.inference

Classes

perm_test

diagram_bootstrap

universal_null

Module Contents

class tdads.inference.perm_test(iterations: int = 20, dims: list = [0], p: float = 2.0, q: float = 2.0, paired: bool = False, n_cores: int = cpu_count() - 1)[source]
__str__()[source]

Describe a permutation test procedure based on the number of permutation iterations and whether the groups are paired or unpaired.

compute_loss(diagram_groups)[source]

Internal method to compute the loss function from Robinson and Turner 2017. This function should not be called directly.
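As a rough illustration of the kind of loss described in Robinson and Turner (2017), the sketch below sums the within-group pairwise distances raised to the power q, with each group scaled by 1/(2n(n-1)). It assumes the pairwise distances between diagrams are already available as a symmetric matrix. The name within_group_loss and its arguments are hypothetical and not part of tdads, and the internals of compute_loss may differ.

```python
import numpy as np

def within_group_loss(dist, groups, q=2.0):
    """Sum of within-group pairwise distances**q, scaled per group.

    dist   : symmetric (n, n) array of distances between diagrams (assumed given)
    groups : list of index lists, one list per group
    """
    total = 0.0
    for idx in groups:
        n = len(idx)
        if n < 2:
            continue  # a single diagram has no within-group pairs
        sub = dist[np.ix_(idx, idx)] ** q
        # sub is symmetric with a zero diagonal, so its sum counts each pair twice
        total += sub.sum() / (2 * n * (n - 1))
    return total

# made-up distances: diagrams 0 and 1 are close, diagrams 2 and 3 are close
dist = np.array([[0, 1, 4, 5],
                 [1, 0, 4, 5],
                 [4, 4, 0, 2],
                 [5, 5, 2, 0]], dtype=float)
loss_true = within_group_loss(dist, [[0, 1], [2, 3]])
loss_mixed = within_group_loss(dist, [[0, 2], [1, 3]])
```

A grouping that matches the real structure gives a smaller loss than a mixed one, which is what the permutation test exploits.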

test(diagram_groups)[source]

Run the permutation test.

Parameters:

diagram_groups (list of lists) – The groups of persistence diagrams to be analyzed.

Returns:

A dictionary with keys ‘test_statistics’ (the test statistic in each dimension), ‘permvals’ (the null distribution in each dimension) and ‘p_values’ (the p-value in each dimension). For example, output['p_values']['1'] would give the p-value for the second homological dimension in self.dims.

Return type:

Dict
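To make the relationship between the test statistic, the null distribution and the p-value concrete, here is a minimal sketch of one common convention: the p-value is the share of permuted loss values at or below the observed loss, with 1 added to numerator and denominator. This convention is an assumption, not confirmed from the tdads source, and permutation_p_value is a hypothetical helper.

```python
import numpy as np

def permutation_p_value(observed, permvals):
    """Fraction of null values at or below the observed loss, with +1 smoothing."""
    permvals = np.asarray(permvals, dtype=float)
    # a small loss means well-separated groups, so small null values count as "as extreme"
    return (np.sum(permvals <= observed) + 1) / (len(permvals) + 1)

# made-up values: observed loss and four permuted losses
p = permutation_p_value(2.5, [20.5, 18.0, 2.5, 19.0])
```

Under this convention the smallest attainable p-value is 1/(iterations + 1), so the default of 20 iterations cannot produce a p-value below roughly 0.048.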

Examples

>>> # create two groups of persistence diagrams
>>> from ripser import ripser
>>> import numpy as np
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> D1 = ripser(data1)
>>> D2 = ripser(data2)
>>> group1 = [D1, D2]
>>> group2 = [D1, D2]
>>> # create perm test object in dimensions 0 and 1
>>> from tdads.inference import perm_test
>>> pt = perm_test(dims = [0, 1], n_cores = 2)
>>> # run test
>>> res = pt.test([group1, group2])
>>> # get p-values
>>> res['p_values']

Citations

Robinson T, Turner K (2017). “Hypothesis testing for topological data analysis.” https://link.springer.com/article/10.1007/s41468-017-0008-7.

Abdallah H et al. (2021). “Statistical Inference for Persistent Homology applied to fMRI.” https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.

class tdads.inference.diagram_bootstrap(diag_fun, dims: list = [0], num_samples: int = 20, distance_mat: bool = False, alpha: float = 0.05)[source]
__str__()[source]

Describe a bootstrap procedure based on the number of bootstrap samples, whether or not the input will be a distance matrix and the Type 1 error rate (alpha).

compute(X: numpy.ndarray, thresh: float)[source]

Carry out the bootstrap procedure.

Parameters:
  • X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.

  • thresh (float) – The maximum filtration radius for Vietoris-Rips persistent homology.
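The two accepted input forms call for slightly different resampling. The sketch below shows one way a bootstrap sample could be drawn in each case: rows are sampled with replacement for tabular data, while a distance matrix is indexed on both rows and columns with the same sampled indices. bootstrap_resample is a hypothetical helper, not the tdads implementation.

```python
import numpy as np

def bootstrap_resample(X, distance_mat=False, rng=None):
    """Draw one bootstrap sample of the same size as X."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    idx = rng.integers(0, n, size=n)  # sample point indices with replacement
    if distance_mat:
        # keep the matrix square and symmetric by indexing rows and columns together
        return X[np.ix_(idx, idx)]
    return X[idx]

points = np.random.default_rng(0).random((5, 2))
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# the same seed gives the same indices, so the two samples describe the same points
sample_points = bootstrap_resample(points, rng=np.random.default_rng(1))
sample_dists = bootstrap_resample(dists, distance_mat=True, rng=np.random.default_rng(1))
```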

Returns:

Entries are ‘diagram’ (the computed persistence diagram), ‘thresholds’ (a Dict of the computed persistence thresholds for each desired dimension) and ‘subsetted_diagram’ (the persistence diagram thresholded by the threshold values in each dimension).

Return type:

Dict
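The sketch below illustrates how a persistence threshold could be derived and applied to produce a subsetted diagram. It assumes the threshold in a dimension is twice the (1 - alpha) quantile of the bottleneck distances between the bootstrapped diagrams and the original diagram, a construction often used for bootstrap confidence sets; whether tdads uses exactly this rule is not confirmed here. The function names and all numbers are made up for illustration.

```python
import numpy as np

def persistence_threshold(bottleneck_dists, alpha=0.05):
    """Assumed rule: twice the (1 - alpha) quantile of the bootstrap distances."""
    return 2 * np.quantile(bottleneck_dists, 1 - alpha)

def subset_diagram(diagram, threshold):
    """Keep the (birth, death) rows whose persistence exceeds the threshold."""
    persistence = diagram[:, 1] - diagram[:, 0]
    return diagram[persistence > threshold]

# made-up bottleneck distances from five bootstrap samples
boot_dists = np.array([0.05, 0.08, 0.10, 0.12, 0.07])
# made-up diagram in one dimension: columns are birth and death
diagram = np.array([[0.1, 0.15],
                    [0.2, 1.40],
                    [0.3, 0.45]])

thr = persistence_threshold(boot_dists, alpha=0.05)
kept = subset_diagram(diagram, thr)
```

Only the long-lived feature survives the threshold in this toy diagram; the two short-lived features are treated as noise.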

Examples

>>> from tdads.inference import diagram_bootstrap
>>> from ripser import ripser
>>> from numpy import array, cos, pi, sin
>>> from numpy.random import uniform
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # define persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create bootstrap object and compute significant features
>>> boot = diagram_bootstrap(diag_fun = diag_fun)
>>> res = boot.compute(data, 2)
>>> # print subsetted diagram
>>> res['subsetted_diagram']

Citations

Chazal F et al. (2017). “Robust Topological Inference: Distance to a Measure and Kernel Distance.” https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.

class tdads.inference.universal_null(diag_fun, dims: list = [1], distance_mat: bool = False, alpha: float = 0.05, infinite_cycle_inference: bool = False)[source]
__str__()[source]

Describe a universal null procedure based on the dimensions being analyzed, whether or not the input will be a distance matrix, the Type 1 error rate (alpha) and whether or not infinite cycle inference will be carried out.

compute(X: numpy.ndarray, thresh)[source]

Carry out the universal null inference procedure.

Parameters:
  • X (numpy.ndarray) – The input dataset - either raw tabular data or a distance matrix of samples.

  • thresh (float or 'enclosing') – The maximum filtration radius for persistent homology. If ‘enclosing’ then the enclosing radius of X will be computed and used, otherwise thresh must be a set number.
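For reference, the enclosing radius of a point cloud is the smallest radius at which some point lies within that distance of every other point; past it, the Vietoris-Rips complex has no further nontrivial homology. A minimal sketch of the computation from a distance matrix follows. enclosing_radius is a hypothetical helper and not necessarily how tdads computes it.

```python
import numpy as np

def enclosing_radius(dist):
    """Minimum over points of the maximum distance to any other point."""
    return dist.max(axis=1).min()

# three collinear points: the middle one is within distance 1 of both others
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
r = enclosing_radius(dist)
```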

Returns:

The entries are ‘subsetted_diagram’ - the list of subsetted persistence diagrams in each dimension (numpy ndarrays), and ‘p_values’ - a list of lists for the p-values of each remaining topological feature.

Return type:

Dict
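The sketch below shows how p-values of this kind can be obtained following the construction in Bobrowski and Skraba (2023): the log-log of each feature's death/birth ratio is scaled and centered, compared against a left-skewed Gumbel distribution, and then Bonferroni-corrected. The constant A = 1 (for Vietoris-Rips filtrations) and the centering are taken from that paper as understood here; the function name is hypothetical and the tdads implementation may differ in detail.

```python
import numpy as np

def universal_null_p_values(diagram, A=1.0):
    """Bonferroni-corrected p-values for finite features with birth > 0."""
    births, deaths = diagram[:, 0], diagram[:, 1]
    # log-log of the multiplicative persistence death / birth
    loglog = np.log(np.log(deaths / births))
    # center so the transformed values have mean -gamma (Euler-Mascheroni constant),
    # which is the mean of the left-skewed Gumbel distribution
    B = -np.euler_gamma - A * loglog.mean()
    ell = A * loglog + B
    # survival function of the left-skewed Gumbel: P(L >= x) = exp(-exp(x))
    p = np.exp(-np.exp(ell))
    # Bonferroni correction over the number of features, capped at 1
    return np.minimum(p * len(p), 1.0)

# made-up dimension-1 diagram: four short-lived loops and one prominent loop
diagram = np.array([[0.5, 0.55],
                    [0.4, 0.46],
                    [0.6, 0.63],
                    [0.3, 0.33],
                    [0.2, 1.60]])
p_values = universal_null_p_values(diagram)
significant = diagram[p_values < 0.05]
```

The ratio death/birth is undefined when birth is 0, as for dimension 0 features of a Vietoris-Rips filtration, which is consistent with dims defaulting to [1] for this class.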

Examples

>>> from tdads.inference import universal_null
>>> from ripser import ripser
>>> from numpy import array, cos, pi, sin
>>> from numpy.random import uniform, normal
>>> # build circle dataset and add noise
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> data = data + normal(scale = 0.2, size = (100, 2))
>>> # define the persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create universal null object
>>> univ_null = universal_null(diag_fun = diag_fun)
>>> # carry out the inference procedure
>>> res = univ_null.compute(data, thresh = 'enclosing')

Citations

Bobrowski O, Skraba P (2023). “A universal null-distribution for topological data analysis.” https://www.nature.com/articles/s41598-023-37842-2.