tdads

Submodules

Classes

distance

kernel

diagram_mds

Multidimensional scaling with persistence diagrams.

diagram_kpca

Kernel PCA with persistence diagrams.

perm_test

diagram_bootstrap

Functions

check_diagram(D)

Checks for persistence diagrams.

preprocess_diagram(D[, inf_replace_val, ret])

Verify the format of a persistence diagram and convert to a standard format.

enclosing_radius(X[, distance_mat])

Compute the enclosing radius of a dataset. Beyond this filtration radius no topological changes can occur.

Package Contents

tdads.check_diagram(D)[source]

Checks for persistence diagrams.

Internal method to verify that birth values are non-negative and less than death values.

Parameters:

D (numpy.ndarray) – The input diagram to be checked.

Return type:

None
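
The two conditions named above (non-negative births, and births less than deaths) can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `validate_birth_death`, the two-column (birth, death) layout and the use of `ValueError` are all assumptions.

```python
import numpy as np


def validate_birth_death(D):
    """Hypothetical stand-in for the checks described above (not tdads code)."""
    D = np.asarray(D, dtype=float)
    # Assumed layout: one row per feature, columns are (birth, death).
    if D.ndim != 2 or D.shape[1] != 2:
        raise ValueError("expected an array of shape (n_features, 2)")
    births, deaths = D[:, 0], D[:, 1]
    if np.any(births < 0):
        raise ValueError("birth values must be non-negative")
    # Deaths equal to np.inf pass this comparison, since any finite birth < inf.
    if np.any(births >= deaths):
        raise ValueError("each birth value must be less than its death value")


validate_birth_death(np.array([[0.0, 1.2], [0.3, np.inf]]))  # passes silently
```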

tdads.preprocess_diagram(D, inf_replace_val=None, ret=False)[source]

Verify the format of a persistence diagram and convert to a standard format.

This function can verify a persistence diagram from the ripser, gph, flagser, gudhi or cechmate packages and convert any such diagram into a list of numpy arrays if desired (largely an internal functionality but can be used in a standalone fashion).

Parameters:
  • D (any) – The persistence diagram to be verified. An exception will be raised if D is not a persistence diagram computed from one of the aforementioned packages.

  • inf_replace_val (float or int, default None) – The value with which inf values should be replaced, if desired.

  • ret (bool, default False) – Whether or not to return a processed diagram.

Returns:

If ret is True and the diagram is verified then a list is returned. The i-th element of the returned list is the array of i-dimensional topological features in the diagram.

Return type:

None or list of numpy.ndarray
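
As a rough picture of the "standard format" described above, the sketch below converts a ripser-style result (a dict whose `'dgms'` entry holds one array per dimension) into a list of arrays and optionally replaces inf values. The helper name `to_list_of_arrays` is hypothetical, and only the ripser-style layout is handled here; the other supported packages are not covered.

```python
import numpy as np


def to_list_of_arrays(D, inf_replace_val=None):
    """Hypothetical sketch: handles only a ripser-style dict, not every format."""
    # ripser returns a dict whose 'dgms' entry holds one (n_i, 2) array per dimension.
    if not isinstance(D, dict) or "dgms" not in D:
        raise ValueError("expected a ripser-style dict with a 'dgms' entry")
    out = []
    for dgm in D["dgms"]:
        # Copy so the input is left untouched; empty dimensions stay shape (0, 2).
        arr = np.array(dgm, dtype=float).reshape(-1, 2)
        if inf_replace_val is not None:
            arr[np.isinf(arr)] = inf_replace_val
        out.append(arr)
    return out


fake = {"dgms": [np.array([[0.0, 0.4], [0.0, np.inf]]), np.empty((0, 2))]}
std = to_list_of_arrays(fake, inf_replace_val=2.0)
# std[i] now holds the i-dimensional features, with inf deaths replaced by 2.0
```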

tdads.enclosing_radius(X: numpy.ndarray, distance_mat: bool = False)[source]

Compute the enclosing radius of a dataset. Beyond this filtration radius no topological changes can occur.

Parameters:
  • X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.

  • distance_mat (bool, default False) – Whether or not X is a distance matrix. If False then a Euclidean distance matrix will be computed.

Returns:

The enclosing radius value of X.

Return type:

numpy.float64

Examples

>>> from tdads.PH_utils import enclosing_radius
>>> from ripser import ripser
>>> from numpy.random import uniform
>>> from numpy import array, cos, sin
>>> from math import pi
>>> from scipy.spatial.distance import cdist
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # compute the enclosing radius
>>> enc_rad = enclosing_radius(data)
>>> # compute persistence diagram
>>> diagram = ripser(data, thresh = enc_rad)
>>> # now for a distance matrix
>>> dist_data = cdist(data, data, 'euclidean')
>>> enc_rad = enclosing_radius(dist_data, True)
>>> diagram = ripser(dist_data, thresh = enc_rad, distance_matrix = True)
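
For intuition, the enclosing radius is commonly defined as the smallest, over all points, of the distance to that point's farthest neighbour. The sketch below computes that quantity with NumPy only; the helper name `enclosing_radius_sketch` is hypothetical, and it is an assumption that this definition matches what `tdads.enclosing_radius` computes internally.

```python
import numpy as np


def enclosing_radius_sketch(X, distance_mat=False):
    """Illustrative min-of-row-maxima computation (hypothetical helper, not tdads code)."""
    X = np.asarray(X, dtype=float)
    if distance_mat:
        dists = X
    else:
        # Pairwise Euclidean distances via broadcasting.
        diffs = X[:, None, :] - X[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # For each point take its farthest neighbour, then keep the smallest such value.
    return dists.max(axis=1).min()


points = np.array([[0.0], [1.0], [2.0]])
print(enclosing_radius_sketch(points))  # 1.0: the middle point is within 1 of every other point
```
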
class tdads.distance(dim: int = 0, metric='W', p: float = 2.0, sigma: float = None, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)[source]
__str__()[source]

Describe a distance metric by type (Wasserstein, bottleneck or Fisher information metric) and major parameter (p for Wasserstein and sigma for Fisher information metric).

compute(D1, D2) → float[source]

Compute the distance between two persistence diagrams.

Parameters:
  • D1 (any) – The first persistence diagram (computed from either the ripser, gph, flagser, gudhi or cechmate packages).

  • D2 (any) – The second persistence diagram (same accepted formats as D1).

Returns:

The computed distance. Dimension-0 persistence diagrams may contain a point whose death is inf; such points are ignored. To include them, replace inf with the maximum filtration value used to compute the diagram.

Return type:

float

Examples

>>> from tdads import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create distance object
>>> d_wass = distance() # 2-wasserstein distance
>>> # compute distance
>>> d_wass.compute(diagram1, diagram2)

Citations

Kerber M, Morozov D and Nigmetov A (2017). “Geometry Helps to Compare Persistence Diagrams.” https://dl.acm.org/doi/10.1145/3064175.

Le T, Yamada M (2018). “Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams.” https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.

Morariu VI, Srinivasan BV, Raykar VC, Duraiswami R, Davis LS (2008). “Automatic online tuning for fast Gaussian summation.” Advances in Neural Information Processing Systems (NIPS).
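
To make the p-Wasserstein distance between two diagrams concrete, here is a brute-force sketch for very small diagrams with finite deaths. It is not how tdads computes the distance: the function name `wasserstein_bruteforce` is hypothetical, the L-infinity ground metric is an assumption, and trying every matching has factorial cost. Each diagram is padded with diagonal slots so that any point may be matched to the diagonal instead of to a point in the other diagram.

```python
from itertools import permutations


def wasserstein_bruteforce(D1, D2, p=2.0):
    """Brute-force p-Wasserstein distance between two tiny diagrams.

    Illustration only: cost grows factorially, and the L-infinity ground
    metric is an assumption rather than a documented tdads choice.
    """
    # Tag real points with False and diagonal slots with True.
    A = [(b, d, False) for b, d in D1] + [(0.0, 0.0, True)] * len(D2)
    B = [(b, d, False) for b, d in D2] + [(0.0, 0.0, True)] * len(D1)

    def cost(u, v):
        if u[2] and v[2]:
            return 0.0  # diagonal matched to diagonal is free
        if u[2]:
            return ((v[1] - v[0]) / 2.0) ** p  # v is sent to the diagonal
        if v[2]:
            return ((u[1] - u[0]) / 2.0) ** p  # u is sent to the diagonal
        return max(abs(u[0] - v[0]), abs(u[1] - v[1])) ** p

    # Minimise the total cost over every perfect matching of A with B.
    best = min(
        sum(cost(A[i], B[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(B)))
    )
    return best ** (1.0 / p)


print(wasserstein_bruteforce([(0.0, 2.0)], [(0.0, 2.5)]))  # 0.5
```

In the example, matching the two points directly costs 0.5² = 0.25, while sending both to the diagonal costs 1² + 1.25², so the direct matching wins.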

compute_matrix(diagrams: list, other_diagrams: list = None)[source]

Compute a distance matrix between one or two lists of persistence diagrams.

Parameters:
  • diagrams (list) – The first list of persistence diagrams (computed from either the ripser, gph, flagser, gudhi or cechmate packages).

  • other_diagrams (list, default None) – The optional second list of persistence diagrams for computing a cross-distance matrix.

Returns:

The (cross) distance matrix.

Return type:

numpy.ndarray

Examples

>>> from tdads import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create distance object
>>> d_wass = distance() # 2-wasserstein distance
>>> # compute distance matrix
>>> d_wass.compute_matrix([diagram1, diagram2])
>>> # this is the same as:
>>> d_wass.compute_matrix([diagram1, diagram2], [diagram1, diagram2])
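
The relationship between the one-list and two-list calls can be sketched with a generic helper that fills a matrix from any pairwise function. The name `pairwise_matrix` is hypothetical, plain numbers stand in for persistence diagrams, and the row/column orientation (rows follow the first list) is an assumption rather than documented tdads behaviour.

```python
import numpy as np


def pairwise_matrix(fn, items, other_items=None):
    """Hypothetical helper: fill a (cross) matrix from any pairwise function fn."""
    # With no second list, compare the first list against itself.
    cols = items if other_items is None else other_items
    M = np.zeros((len(items), len(cols)))
    for i, a in enumerate(items):
        for j, b in enumerate(cols):
            M[i, j] = fn(a, b)
    return M


def toy_distance(a, b):
    # Stand-in for a diagram distance such as distance().compute.
    return abs(a - b)


xs = [0.0, 1.0, 3.0]
square = pairwise_matrix(toy_distance, xs)          # shape (3, 3)
cross = pairwise_matrix(toy_distance, xs, [10.0])   # shape (3, 1)
```
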
class tdads.kernel(dim: int = 0, sigma: float = 1, t: float = 1, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)[source]
__str__()[source]

Describe a persistence Fisher kernel by its sigma and t parameters.

compute(D1, D2)[source]

Compute the kernel value between two persistence diagrams.

Parameters:
  • D1 (any) – The first persistence diagram (computed from either the ripser, gph, flagser, gudhi or cechmate packages).

  • D2 (any) – The second persistence diagram.

Returns:

The computed kernel value.

Return type:

float

Examples

>>> from tdads import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create kernel object
>>> k = kernel()
>>> # compute kernel value
>>> k.compute(diagram1, diagram2)

Citations

Le T, Yamada M (2018). “Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams.” https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.
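
The persistence Fisher kernel of Le and Yamada (2018) has the form exp(-t · d_FIM), where d_FIM is the Fisher information metric between Gaussian-smoothed, normalised versions of the two diagrams. The sketch below follows that construction for small diagrams with finite deaths and at least one point in total. The function name is hypothetical, treating sigma as the Gaussian standard deviation is an assumption, and no fast approximation is attempted, so it should not be read as the tdads implementation.

```python
import numpy as np


def persistence_fisher_kernel(D1, D2, sigma=1.0, t=1.0):
    """Sketch of exp(-t * d_FIM) following Le and Yamada (2018); not tdads code."""
    D1 = np.asarray(D1, dtype=float).reshape(-1, 2)
    D2 = np.asarray(D2, dtype=float).reshape(-1, 2)

    def project(D):
        # Orthogonal projection of each (birth, death) point onto the diagonal.
        mid = D.mean(axis=1, keepdims=True)
        return np.hstack([mid, mid])

    # Each diagram is padded with the diagonal projections of the other one.
    A = np.vstack([D1, project(D2)])
    B = np.vstack([D2, project(D1)])
    theta = np.vstack([A, B])  # evaluation points shared by both densities

    def density(centers):
        # Sum of isotropic Gaussians at the centers, evaluated on theta and
        # normalised to sum to one (constant factors cancel when normalising).
        # Assumption: sigma is used as the standard deviation.
        sq = ((theta[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        rho = np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1)
        return rho / rho.sum()

    inner = np.sqrt(density(A) * density(B)).sum()
    d_fim = np.arccos(np.clip(inner, -1.0, 1.0))  # Fisher information metric
    return float(np.exp(-t * d_fim))


k_same = persistence_fisher_kernel([[0.0, 1.0]], [[0.0, 1.0]])  # close to 1
k_diff = persistence_fisher_kernel([[0.0, 1.0]], [[0.0, 2.0]])  # strictly below 1
```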

compute_matrix(diagrams, other_diagrams=None)[source]

Compute a Gram (kernel) matrix between one or two lists of persistence diagrams.

Parameters:
  • diagrams (list) – The first list of persistence diagrams (computed from either the ripser, gph, flagser, gudhi or cechmate packages).

  • other_diagrams (list, default None) – The optional second list of persistence diagrams for computing a cross-Gram matrix.

Returns:

The (cross) Gram matrix.

Return type:

numpy.ndarray

Examples

>>> from tdads import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create kernel object
>>> k = kernel()
>>> # compute Gram matrix
>>> k.compute_matrix([diagram1, diagram2])
>>> # this is the same as:
>>> k.compute_matrix([diagram1, diagram2], [diagram1, diagram2])
class tdads.diagram_mds(n_components: int = 2, random_state: int = None, precomputed: bool = False, dim: int = 0, metric: str = 'W', p: float = 2, sigma: float = None, n_cores: int = cpu_count() - 1)[source]

Multidimensional scaling with persistence diagrams.

__str__()[source]

Describe a persistence diagram multidimensional scaling object via its distance metric.

fit_transform(X, y: any = None)[source]

Fit the data in X and compute the position of the persistence diagrams in the embedding space.

Parameters:
  • X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed distance matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

`X_new` – X transformed in the new space.

Return type:

ndarray of shape (n_diagrams, n_components)

Examples

>>> from tdads.machine_learning import diagram_mds
>>> from tdads.distance import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # project into 2D with the 2-wasserstein distance
>>> mds = diagram_mds()
>>> mds.fit_transform([diagram1, diagram2])
>>> # can also fit with a precomputed distance matrix
>>> d_wass = distance()
>>> dist_mat = d_wass.compute_matrix([diagram1, diagram2])
>>> mds_precomp = diagram_mds(precomputed = True)
>>> mds_precomp.fit_transform(dist_mat)
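
To show what embedding from a precomputed distance matrix involves, the sketch below implements classical (Torgerson) MDS with NumPy. This is one standard MDS variant chosen for illustration; whether `diagram_mds` uses this or another solver internally is not stated here, and the function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(dist_mat, n_components=2):
    """Classical (Torgerson) MDS on a precomputed distance matrix (illustrative)."""
    D = np.asarray(dist_mat, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale         # shape (n, n_components)


# Distances between three collinear points at positions 0, 3 and 7.
toy_dists = np.array([[0.0, 3.0, 7.0], [3.0, 0.0, 4.0], [7.0, 4.0, 0.0]])
embedding = classical_mds(toy_dists)
```
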
class tdads.diagram_kpca(n_components: int = 2, random_state: int = None, precomputed: bool = False, diagrams: list = None, dim: int = 0, sigma: float = 1.0, t: float = 1.0, n_cores: int = cpu_count() - 1)[source]

Kernel PCA with persistence diagrams.

__str__()[source]

Describe a persistence diagram kernel principal component analysis object via its kernel function.

fit(X, y: any = None)[source]

Fit the model from data in X.

Parameters:
  • X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed Gram matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

`self` – Returns the instance itself.

Return type:

object

Examples

>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit model with the persistence Fisher kernel (sigma = t = 1)
>>> kpca = diagram_kpca()
>>> kpca_fitted = kpca.fit([diagram1, diagram2])
>>> # can also fit with a precomputed Gram matrix
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)
transform(X)[source]

Project new persistence diagrams into the embedding space.

Parameters:

X ({array-like of shape (n_new_diagrams, n_diagrams)} or {list of length n_new_diagrams}) – Either a precomputed (cross) Gram matrix of shape (n_new_diagrams, n_diagrams) (between the new persistence diagrams and the training set diagrams, if precomputed was set to True) or a list of n_new_diagrams many persistence diagrams (otherwise).

Returns:

`X_new` – The embedding of the new persistence diagrams.

Return type:

ndarray

Examples

>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit models (regular and precomputed) with the
>>> # persistence Fisher kernel (sigma = t = 1)
>>> kpca = diagram_kpca()
>>> kpca_fitted = kpca.fit([diagram1, diagram2]) # or
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)
>>> # create 2 new datasets and their persistence diagrams
>>> data3 = np.random.random((100,2))
>>> data4 = np.random.random((100,2))
>>> diagram3 = ripser(data3)
>>> diagram4 = ripser(data4)
>>> # project new data into 2D space
>>> kpca_fitted.transform([diagram3, diagram4]) # or
>>> cross_gram = pfk.compute_matrix([diagram1, diagram2], [diagram3, diagram4])
>>> kpca_precomp_fitted.transform(cross_gram)
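
The fit and transform steps on precomputed Gram matrices can be sketched with NumPy alone. The class `GramKPCA` below is hypothetical and is not the tdads implementation; it uses the textbook kernel PCA recipe (centre the Gram matrix, eigendecompose, project), and it assumes the cross Gram matrix has one row per new item and one column per training item.

```python
import numpy as np


class GramKPCA:
    """Minimal kernel PCA on precomputed Gram matrices (hypothetical class)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, K):
        K = np.asarray(K, dtype=float)
        n = K.shape[0]
        one = np.ones((n, n)) / n
        self.K_fit_ = K
        # Centre the Gram matrix in feature space.
        Kc = K - one @ K - K @ one + one @ K @ one
        eigvals, eigvecs = np.linalg.eigh(Kc)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.eigvals_ = eigvals[order]
        self.eigvecs_ = eigvecs[:, order]
        return self

    def transform(self, K_cross):
        # K_cross: shape (n_new, n_train), kernel values between new and training items.
        K_cross = np.asarray(K_cross, dtype=float)
        n = self.K_fit_.shape[0]
        one_new = np.ones((K_cross.shape[0], n)) / n
        one = np.ones((n, n)) / n
        # Centre the cross Gram matrix with the training statistics.
        Kc = (K_cross - one_new @ self.K_fit_ - K_cross @ one
              + one_new @ self.K_fit_ @ one)
        # Assumes the retained eigenvalues are strictly positive.
        return Kc @ self.eigvecs_ / np.sqrt(self.eigvals_)


# Toy check with a linear kernel on four planar points.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
gram = pts @ pts.T
model = GramKPCA(n_components=2).fit(gram)
train_embedding = model.transform(gram)  # embedding of the training items
```

With a linear kernel and two components, the embedding is ordinary PCA of the points, so pairwise distances between the rows of `train_embedding` match those between the rows of `pts`.
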
fit_transform(X, y: any = None)[source]

Fit the data in X and compute the position of the persistence diagrams in the embedding space.

Parameters:
  • X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed Gram matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

`X_new` – X transformed in the new space.

Return type:

ndarray

Examples

>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit models (regular and precomputed) with the
>>> # persistence Fisher kernel (sigma = t = 1) and
>>> # project into 2D space
>>> kpca = diagram_kpca()
>>> kpca.fit_transform([diagram1, diagram2]) # or
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp.fit_transform(gram_mat)
class tdads.perm_test(iterations: int = 20, dims: list = [0], p: float = 2.0, q: float = 2.0, paired: bool = False, n_cores: int = cpu_count() - 1)[source]
__str__()[source]

Describe a permutation test procedure based on the number of permutation iterations and whether the groups are paired or unpaired.

compute_loss(diagram_groups)[source]

Internal method to compute the loss function from Robinson and Turner 2017. This function should not be called directly.

test(diagram_groups)[source]

Run the permutation test.

Parameters:

diagram_groups (list of lists) – The groups of persistence diagrams to be analyzed.

Returns:

Keys are ‘test_statistics’ for the test statistic in each dimension, ‘permvals’ for the null distribution in each dimension and ‘p_values’ for the p-values in each dimension. For example, output[‘p_values’][‘1’] would give the p-value for the second homological dimension in self.dims.

Return type:

Dict

Examples

>>> # create two groups of persistence diagrams
>>> from ripser import ripser
>>> import numpy as np
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> D1 = ripser(data1)
>>> D2 = ripser(data2)
>>> group1 = [D1, D2]
>>> group2 = [D1, D2]
>>> # create perm test object in dimensions 0 and 1
>>> from tdads.inference import perm_test
>>> pt = perm_test(dims = [0, 1], n_cores = 2)
>>> # run test
>>> res = pt.test([group1, group2])
>>> # get p-values
>>> res['p_values']

Citations

Robinson T, Turner K (2017). “Hypothesis testing for topological data analysis.” https://link.springer.com/article/10.1007/s41468-017-0008-7.

Abdallah H et al. (2021). “Statistical Inference for Persistent Homology applied to fMRI.” https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.
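
The logic of the unpaired permutation test can be sketched in plain Python. The loss below is written in the style of Robinson and Turner (2017): within-group sums of pairwise distances raised to the power q, scaled by group size. Everything here is illustrative and hypothetical: the function names, the add-one p-value, the use of plain numbers in place of persistence diagrams, and the requirement of at least two items per group are assumptions, not tdads behaviour.

```python
import random


def joint_loss(groups, dist, q=2.0):
    """Within-group loss in the style of Robinson and Turner (2017)."""
    total = 0.0
    for g in groups:
        n = len(g)  # assumed to be at least 2
        pair_sum = sum(dist(a, b) ** q for a in g for b in g)  # ordered pairs
        total += pair_sum / (2.0 * n * (n - 1))
    return total


def permutation_p_value(groups, dist, iterations=20, q=2.0, seed=0):
    """Hypothetical unpaired permutation test; returns (statistic, p_value)."""
    rng = random.Random(seed)
    observed = joint_loss(groups, dist, q)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(iterations):
        # Shuffle group labels by shuffling the pooled items and re-splitting.
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if joint_loss(shuffled, dist, q) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (iterations + 1)


def toy_distance(a, b):
    # Stand-in for a distance between persistence diagrams.
    return abs(a - b)


toy_groups = [[0.0, 0.1, 0.2], [5.0, 5.1, 5.2]]
stat, p_val = permutation_p_value(toy_groups, toy_distance, iterations=200)
```

A small loss relative to the permuted losses indicates that items are closer within groups than across them, which yields a small p-value.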

class tdads.diagram_bootstrap(diag_fun, dims: list = [0], num_samples: int = 20, distance_mat: bool = False, alpha: float = 0.05)[source]
__str__()[source]

Describe a bootstrap procedure based on the number of bootstrap samples, whether or not the input will be a distance matrix and the Type 1 error rate (alpha).

compute(X: numpy.ndarray, thresh: float)[source]

Carry out the bootstrap procedure.

Parameters:
  • X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.

  • thresh (float) – The maximum filtration radius for Vietoris-Rips persistent homology.

Returns:

Entries are ‘diagram’ (the computed persistence diagram), ‘thresholds’ (a Dict of the computed persistence thresholds for each desired dimension) and ‘subsetted_diagram’ (the persistence diagram thresholded by the threshold values in each dimension).

Return type:

Dict

Examples

>>> from tdads.inference import diagram_bootstrap
>>> from ripser import ripser
>>> from numpy.random import uniform
>>> from numpy import array, cos, sin
>>> from math import pi
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # define persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create bootstrap object and compute significant features
>>> boot = diagram_bootstrap(diag_fun = diag_fun)
>>> res = boot.compute(data, 2)
>>> # print subsetted diagram
>>> res['subsetted_diagram']

Citations

Chazal F et al (2017). “Robust Topological Inference: Distance to a Measure and Kernel Distance.” https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.
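
The thresholding step can be illustrated in isolation. In a typical bootstrap procedure, each resampled dataset yields a diagram, a distance between that diagram and the original one is recorded, and a persistence threshold is derived from a quantile of those distances. The sketch below takes the distances as given and applies one commonly used rule, twice the (1 - alpha) quantile. The function names and this exact rule are assumptions and may differ from what `diagram_bootstrap` does.

```python
import numpy as np


def persistence_threshold(bootstrap_distances, alpha=0.05):
    """Hypothetical rule: twice the (1 - alpha) quantile of bootstrap distances."""
    return 2.0 * float(np.quantile(bootstrap_distances, 1.0 - alpha))


def subset_diagram(diagram, threshold):
    """Keep features whose persistence (death - birth) exceeds the threshold."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    persistence = diagram[:, 1] - diagram[:, 0]
    return diagram[persistence > threshold]


# Made-up distances standing in for the output of five bootstrap iterations.
toy_distances = [0.1, 0.2, 0.3, 0.4, 0.5]
thr = persistence_threshold(toy_distances, alpha=0.05)
significant = subset_diagram([[0.0, 0.5], [0.0, 1.5]], thr)  # keeps only the second feature
```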
