tdads.inference
Classes
Module Contents
- class tdads.inference.perm_test(iterations: int = 20, dims: list = [0], p: float = 2.0, q: float = 2.0, paired: bool = False, n_cores: int = cpu_count() - 1)
- __str__()
Describe a permutation test procedure based on the number of permutation iterations and whether the groups are paired or unpaired.
- compute_loss(diagram_groups)
Internal method to compute the loss function from Robinson and Turner 2017. This function should not be called directly.
- test(diagram_groups)
Run the permutation test.
- Parameters:
diagram_groups (list of lists) – The groups of persistence diagrams to be analyzed.
- Returns:
A dictionary with keys ‘test_statistics’ (the test statistic in each dimension), ‘permvals’ (the null distribution in each dimension) and ‘p_values’ (the p-value in each dimension). For example, output[‘p_values’][‘1’] would give the p-value for the second homological dimension in self.dims.
- Return type:
Dict
Examples
>>> # create two groups of persistence diagrams
>>> from ripser import ripser
>>> import numpy as np
>>> data1 = np.random.random((100, 2))
>>> data2 = np.random.random((100, 2))
>>> D1 = ripser(data1)
>>> D2 = ripser(data2)
>>> group1 = [D1, D2]
>>> group2 = [D1, D2]
>>> # create perm test object in dimensions 0 and 1
>>> from tdads.inference import perm_test
>>> pt = perm_test(dims = [0, 1], n_cores = 2)
>>> # run test
>>> res = pt.test([group1, group2])
>>> # get p-values
>>> res['p_values']
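The permutation idea behind test() can be sketched without any persistence diagrams. The block below is a minimal illustration, not the tdads implementation: it assumes a loss of the form used by Robinson and Turner (a scaled sum of within-group pairwise distances raised to the power q), uses plain numbers with absolute difference as a stand-in for a diagram distance, handles only the unpaired case, and assumes the common convention p = (number of permuted losses <= observed loss + 1) / (iterations + 1). The names group_loss and permutation_p_value are hypothetical.

```python
import random
from itertools import combinations


def group_loss(groups, dist, q=2.0):
    """Sum, over groups, of the scaled within-group pairwise distances.

    Each unordered pair is visited once and the sum is divided by n(n-1),
    which equals 1 / (2n(n-1)) times the sum over ordered pairs.
    """
    total = 0.0
    for g in groups:
        n = len(g)
        if n < 2:
            continue  # a single element has no within-group pairs
        pair_sum = sum(dist(a, b) ** q for a, b in combinations(g, 2))
        total += pair_sum / (n * (n - 1))
    return total


def permutation_p_value(groups, dist, iterations=20, q=2.0, seed=0):
    """Unpaired permutation test: shuffle group labels and recompute the loss."""
    rng = random.Random(seed)
    observed = group_loss(groups, dist, q)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    permvals = []
    for _ in range(iterations):
        rng.shuffle(pooled)
        # cut the shuffled pool back into groups of the original sizes
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        permvals.append(group_loss(shuffled, dist, q))
    # assumed convention: small losses are evidence against the null
    p_value = (sum(v <= observed for v in permvals) + 1) / (iterations + 1)
    return {"test_statistic": observed, "permvals": permvals, "p_value": p_value}


# Stand-in data: numbers instead of diagrams, |a - b| instead of a diagram distance.
group_a = [0.0, 0.1, 0.2, 0.3]
group_b = [5.0, 5.1, 5.2, 5.3]
result = permutation_p_value([group_a, group_b], dist=lambda a, b: abs(a - b))
print(result["test_statistic"], result["p_value"])
```

With 20 iterations the smallest attainable p-value under this convention is 1/21, which is one reason to raise iterations above the default when a finer resolution is needed.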
Citations
Robinson T, Turner K (2017). “Hypothesis testing for topological data analysis.” https://link.springer.com/article/10.1007/s41468-017-0008-7.
Abdallah H et al. (2021). “Statistical Inference for Persistent Homology applied to fMRI.” https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.
- class tdads.inference.diagram_bootstrap(diag_fun, dims: list = [0], num_samples: int = 20, distance_mat: bool = False, alpha: float = 0.05)
- __str__()
Describe a bootstrap procedure based on the number of bootstrap samples, whether or not the input will be a distance matrix and the Type 1 error rate (alpha).
- compute(X: numpy.ndarray, thresh: float)
Carry out the bootstrap procedure.
- Parameters:
X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.
thresh (float) – The maximum filtration radius for Vietoris-Rips persistent homology.
- Returns:
Entries are ‘diagram’ (the computed persistence diagram), ‘thresholds’ (a Dict of the computed persistence thresholds for each desired dimension) and ‘subsetted_diagram’ (the persistence diagram thresholded by the threshold values in each dimension).
- Return type:
Dict
Examples
>>> from tdads.inference import diagram_bootstrap
>>> from ripser import ripser
>>> from numpy import array, cos, sin, pi
>>> from numpy.random import uniform
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # define persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create bootstrap object and compute significant features
>>> boot = diagram_bootstrap(diag_fun = diag_fun)
>>> res = boot.compute(data, 2)
>>> # print subsetted diagram
>>> res['subsetted_diagram']
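To make the relationship between ‘thresholds’ and ‘subsetted_diagram’ concrete, here is a small sketch of thresholding a diagram by persistence. It is an illustration under stated assumptions, not the library's code: the diagram is represented as a plain dict mapping each dimension to a list of (birth, death) pairs, persistence is taken to be death minus birth, and a feature is kept only when its persistence strictly exceeds the threshold for its dimension. The function name subset_diagram and the numeric values are hypothetical.

```python
def subset_diagram(diagram, thresholds):
    """Keep only features whose persistence exceeds the per-dimension threshold.

    diagram    -- dict: dimension -> list of (birth, death) pairs (assumed layout)
    thresholds -- dict: dimension -> persistence threshold
    """
    return {
        dim: [(b, d) for (b, d) in points if d - b > thresholds[dim]]
        for dim, points in diagram.items()
        if dim in thresholds  # dimensions without a threshold are dropped
    }


# Made-up values for illustration only.
diagram = {
    0: [(0.0, 0.05), (0.0, 0.8)],
    1: [(0.3, 0.35), (0.2, 1.4)],
}
thresholds = {0: 0.5, 1: 0.5}

significant = subset_diagram(diagram, thresholds)
print(significant)
```

In the bootstrap procedure the thresholds themselves come from the resampled diagrams; this sketch covers only the final filtering step.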
Citations
Chazal F et al. (2017). “Robust Topological Inference: Distance to a Measure and Kernel Distance.” https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.
- class tdads.inference.universal_null(diag_fun, dims: list = [1], distance_mat: bool = False, alpha: float = 0.05, infinite_cycle_inference: bool = False)
- __str__()
Describe a universal null procedure based on the dimensions being analyzed, whether or not the input will be a distance matrix, the Type 1 error rate (alpha) and whether or not infinite cycle inference will be carried out.
- compute(X: numpy.ndarray, thresh)
Carry out the universal null inference procedure.
- Parameters:
X (numpy.ndarray) – The input dataset - either raw tabular data or a distance matrix of samples.
thresh (float or 'enclosing') – The maximum filtration radius for persistent homology. If ‘enclosing’ then the enclosing radius of X will be computed and used, otherwise thresh must be a set number.
- Returns:
The entries are ‘subsetted_diagram’ - the list of subsetted persistence diagrams in each dimension (numpy ndarrays), and ‘p_values’ - a list of lists for the p-values of each remaining topological feature.
- Return type:
Dict
Examples
>>> from tdads.inference import universal_null
>>> from ripser import ripser
>>> from numpy import array, cos, sin, pi
>>> from numpy.random import uniform, normal
>>> # build circle dataset and add noise
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> data = data + normal(scale = 0.2, size = (100, 2))
>>> # define the persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create universal null object
>>> univ_null = universal_null(diag_fun = diag_fun)
>>> # carry out the inference procedure, using the enclosing radius as the threshold
>>> res = univ_null.compute(data, thresh = 'enclosing')
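For readers unfamiliar with the ‘enclosing’ option of thresh: the enclosing radius of a point cloud is usually defined as the smallest value, over all points, of the largest distance from that point to any other point. Beyond this radius the Vietoris-Rips complex is a cone, so no new homology appears. The sketch below computes it with the standard library only, assuming Euclidean distance on raw points; how tdads computes it internally is not stated in this page, and the function name enclosing_radius is hypothetical.

```python
import math


def enclosing_radius(points):
    """Smallest, over all points, of the largest distance to any other point."""
    return min(
        max(math.dist(p, q) for q in points)  # farthest point from p
        for p in points
    )


# Three collinear points: the middle one is within distance 1 of the others.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
radius = enclosing_radius(points)
print(radius)
```

For a precomputed distance matrix the same quantity is the minimum of the row-wise maxima.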
Citations
Bobrowski O, Skraba P (2023). “A universal null-distribution for topological data analysis.” https://www.nature.com/articles/s41598-023-37842-2.