tdads
Classes
diagram_mds | Multidimensional scaling with persistence diagrams.
diagram_kpca | Kernel PCA with persistence diagrams.
Functions
check_diagram(D) | Checks for persistence diagrams.
preprocess_diagram(D, inf_replace_val=None, ret=False) | Verify the format of a persistence diagram and convert to a standard format.
enclosing_radius(X, distance_mat=False) | Compute the enclosing radius of a dataset. Beyond this filtration radius no topological changes can occur.
Package Contents
- tdads.check_diagram(D)[source]
Checks for persistence diagrams.
Internal method to verify that birth values are non-negative and less than death values.
- Parameters:
D (numpy.ndarray) – The input diagram to be checked.
- Return type:
None
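The two conditions named above can be illustrated with a minimal sketch. The function name `check_birth_death` and the use of `ValueError` are assumptions made for this example; this is not the tdads implementation.

```python
import numpy as np

def check_birth_death(D):
    """Hypothetical stand-in for the checks described above (not the tdads code)."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[1] != 2:
        raise ValueError("expected an array with columns (birth, death)")
    if (D[:, 0] < 0).any():
        raise ValueError("birth values must be non-negative")
    if (D[:, 0] >= D[:, 1]).any():
        raise ValueError("each birth value must be less than its death value")

# a valid diagram: an infinite death value still satisfies birth < death
good = np.array([[0.0, 0.5], [0.1, np.inf]])
check_birth_death(good)  # passes silently and returns None
```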
- tdads.preprocess_diagram(D, inf_replace_val=None, ret=False)[source]
Verify the format of a persistence diagram and convert to a standard format.
This function verifies a persistence diagram computed with the ripser, gph, flagser, gudhi or cechmate packages and, if desired, converts it into a list of numpy arrays. It is mainly used internally but can also be called on its own.
- Parameters:
D (any) – The persistence diagram to be verified. An exception will be raised if D is not a persistence diagram computed from one of the aforementioned packages.
inf_replace_val (float or int, default None) – The value with which inf values should be replaced, if desired.
ret (bool, default False) – Whether or not to return a processed diagram.
- Returns:
If ret is True and the diagram is verified then a list is returned. The i-th element of the returned list is the array of i-dimensional topological features in the diagram.
- Return type:
None or list of numpy.ndarray
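To make the standard format and the role of inf_replace_val concrete, here is a small sketch that works on a hand-written list of arrays shaped like the `'dgms'` entry of a ripser result. The helper `replace_inf` is hypothetical and only imitates the replacement step; it does no format verification.

```python
import numpy as np

def replace_inf(dgms, inf_replace_val):
    """Hypothetical helper: copy each (birth, death) array and swap inf for a finite value."""
    out = []
    for dgm in dgms:
        dgm = np.array(dgm, dtype=float)  # copy so the input is left untouched
        dgm[np.isinf(dgm)] = inf_replace_val
        out.append(dgm)
    return out

# hand-written stand-in for ripser(data)['dgms']: one array per homological dimension
dgms = [np.array([[0.0, 0.3], [0.0, np.inf]]), np.array([[0.4, 0.9]])]
processed = replace_inf(dgms, inf_replace_val=2.0)
# processed[i] holds the i-dimensional features, now with finite death values
```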
- tdads.enclosing_radius(X: numpy.ndarray, distance_mat: bool = False)[source]
Compute the enclosing radius of a dataset. Beyond this filtration radius no topological changes can occur.
- Parameters:
X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.
distance_mat (bool, default False) – Whether or not X is a distance matrix. If False then a Euclidean distance matrix will be computed.
- Returns:
The enclosing radius value of X.
- Return type:
numpy.float64
Examples
>>> from tdads.PH_utils import enclosing_radius
>>> from ripser import ripser
>>> from numpy.random import uniform
>>> from numpy import array, cos, sin
>>> from math import pi
>>> from scipy.spatial.distance import cdist
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # compute the enclosing radius
>>> enc_rad = enclosing_radius(data)
>>> # compute persistence diagram up to the enclosing radius
>>> diagram = ripser(data, thresh = enc_rad)
>>> # now for a distance matrix
>>> dist_data = cdist(data, data, 'euclidean')
>>> enc_rad = enclosing_radius(dist_data, True)
>>> diagram = ripser(dist_data, thresh = enc_rad, distance_matrix = True)
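The enclosing radius is commonly defined as the smallest, over all points, of the distance to that point's farthest neighbour. At that scale one point is within reach of every other point, so the Vietoris-Rips complex is a cone and nothing further can change. The sketch below implements that definition with numpy, assuming tdads follows it; `enclosing_radius_sketch` is a hypothetical name.

```python
import numpy as np

def enclosing_radius_sketch(X, distance_mat=False):
    """Minimum over points of the maximum distance to any other point."""
    X = np.asarray(X, dtype=float)
    if not distance_mat:
        # build the Euclidean distance matrix from raw coordinates
        diff = X[:, None, :] - X[None, :, :]
        X = np.sqrt((diff ** 2).sum(axis=-1))
    return X.max(axis=1).min()

# corners of the unit square plus its centre: the centre is the point
# whose farthest neighbour is closest
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
r = enclosing_radius_sketch(pts)
```

The same function accepts a precomputed distance matrix when `distance_mat=True`, mirroring the two call patterns in the example above.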
- class tdads.distance(dim: int = 0, metric='W', p: float = 2.0, sigma: float = None, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)[source]
- __str__()[source]
Describe a distance metric by type (Wasserstein, bottleneck or Fisher information metric) and major parameter (p for Wasserstein and sigma for Fisher information metric).
- compute(D1, D2) → float[source]
Compute the distance between two persistence diagrams.
- Parameters:
D1 (any) – The first persistence diagram (computed from either the ripser, gph, flagser, gudhi or cechmate packages).
D2 (any) – The second persistence diagram (computed from one of the same packages).
- Returns:
The distance between D1 and D2. In dimension 0, persistence diagrams may contain a point whose death value is inf; such points are ignored. To include them, replace inf with the maximum filtration value used to compute the diagram.
- Return type:
float
Examples
>>> from tdads import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create distance object
>>> d_wass = distance() # 2-wasserstein distance
>>> # compute distance
>>> d_wass.compute(diagram1, diagram2)
Citations
Kerber M, Morozov D and Nigmetov A (2017). “Geometry Helps to Compare Persistence Diagrams.” https://dl.acm.org/doi/10.1145/3064175.
Le T, Yamada M (2018). “Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams.” https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.
Morariu VI, Srinivasan BV, Raykar VC, Duraiswami R, Davis LS (2008). “Automatic online tuning for fast Gaussian summation.” Advances in Neural Information Processing Systems (NIPS).
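For intuition about what a Wasserstein distance between two diagrams measures, the following brute-force sketch tries every matching between two tiny diagrams. It rests on stated assumptions: all points are finite, the ground metric is the infinity norm, and any point may be matched to the diagonal at a cost of half its persistence. It is not the efficient algorithm of Kerber et al. and may not match the conventions used inside tdads; `wasserstein_sketch` is a hypothetical name.

```python
from itertools import permutations

def wasserstein_sketch(D1, D2, p=2.0):
    """Brute-force p-Wasserstein distance for small lists of (birth, death) pairs."""
    # None marks a slot on the diagonal; each side gets enough slots
    # for every point of the other diagram to be left unmatched
    left = list(D1) + [None] * len(D2)
    right = list(D2) + [None] * len(D1)

    def cost(a, b):
        if a is None and b is None:
            return 0.0  # diagonal-to-diagonal matches are free
        if a is None:
            return ((b[1] - b[0]) / 2.0) ** p  # b goes to the diagonal
        if b is None:
            return ((a[1] - a[0]) / 2.0) ** p  # a goes to the diagonal
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p

    # only feasible for a handful of points: the loop is factorial in size
    best = min(
        sum(cost(a, right[j]) for a, j in zip(left, perm))
        for perm in permutations(range(len(right)))
    )
    return best ** (1.0 / p)

d_same = wasserstein_sketch([(0.0, 1.0)], [(0.0, 1.0)])
d_empty = wasserstein_sketch([(0.0, 1.0)], [])
d_shift = wasserstein_sketch([(0.0, 1.0)], [(0.0, 1.2)])
```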
- compute_matrix(diagrams: list, other_diagrams: list = None)[source]
Compute a distance matrix between one or two lists of persistence diagrams.
- Parameters:
diagrams (list) – The first list of persistence diagrams (computed with the ripser, gph, flagser, gudhi or cechmate packages).
other_diagrams (list, default None) – The optional second list of persistence diagrams, used to compute a cross-distance matrix.
- Returns:
The (cross) distance matrix.
- Return type:
numpy.ndarray
Examples
>>> from tdads import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create distance object
>>> d_wass = distance() # 2-wasserstein distance
>>> # compute distance matrix
>>> d_wass.compute_matrix([diagram1, diagram2])
>>> # this is the same as:
>>> d_wass.compute_matrix([diagram1, diagram2], [diagram1, diagram2])
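The relationship between the one-list and two-list calls can be sketched without tdads. In the example below, `pairwise_matrix` is a hypothetical helper and `toy_dist` is a stand-in metric chosen only for illustration; the assumption that rows follow the first list and columns follow the second is ours, not something the documentation above states.

```python
import numpy as np

def total_persistence(diagram):
    return sum(death - birth for birth, death in diagram)

def toy_dist(D1, D2):
    # stand-in for a real diagram distance: gap in total persistence
    return abs(total_persistence(D1) - total_persistence(D2))

def pairwise_matrix(dist, diagrams, other_diagrams=None):
    """Hypothetical helper: rows follow `diagrams`, columns follow the second list."""
    others = diagrams if other_diagrams is None else other_diagrams
    return np.array([[dist(a, b) for b in others] for a in diagrams])

diagrams = [[(0.0, 1.0)], [(0.0, 0.5), (0.2, 0.4)], [(0.1, 0.3)]]
square = pairwise_matrix(toy_dist, diagrams)                   # shape (3, 3)
same = pairwise_matrix(toy_dist, diagrams, diagrams)           # identical to `square`
cross = pairwise_matrix(toy_dist, diagrams[:2], diagrams[2:])  # shape (2, 1)
```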
- class tdads.kernel(dim: int = 0, sigma: float = 1, t: float = 1, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)[source]
- compute(D1, D2)[source]
Compute the kernel value between two persistence diagrams.
- Parameters:
D1 (any) – The first persistence diagram (computed from either the ripser, gph, flagser, gudhi or cechmate packages).
D2 (any) – The second persistence diagram.
- Returns:
The kernel value for D1 and D2.
- Return type:
float
Examples
>>> from tdads import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create kernel object
>>> k = kernel()
>>> # compute kernel value
>>> k.compute(diagram1, diagram2)
Citations
Le T, Yamada M (2018). “Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams.” https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.
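The roles of sigma and t can be seen in a direct numpy sketch of the persistence Fisher kernel from Le and Yamada: each diagram is augmented with the diagonal projections of the other, smoothed with Gaussians of bandwidth sigma, normalised, and compared through the Fisher information metric, after which the kernel is exp(-t * distance). This is an exact, quadratic-time illustration under the assumption of finite, non-empty diagrams; `fisher_kernel_sketch` is a hypothetical name and tdads may compute the same quantity differently (for example with a fast Gaussian summation).

```python
import numpy as np

def fisher_kernel_sketch(D1, D2, sigma=1.0, t=1.0):
    """Exact persistence Fisher kernel for small arrays of (birth, death) rows."""
    D1, D2 = np.asarray(D1, dtype=float), np.asarray(D2, dtype=float)

    def proj(D):
        # orthogonal projection of each point onto the diagonal
        mid = D.mean(axis=1, keepdims=True)
        return np.hstack([mid, mid])

    A = np.vstack([D1, proj(D2)])
    B = np.vstack([D2, proj(D1)])
    theta = np.vstack([A, B])  # evaluation points for both densities

    def density(points):
        sq = ((theta[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        rho = np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1)
        return rho / rho.sum()

    rho_a, rho_b = density(A), density(B)
    inner = np.clip(np.sqrt(rho_a * rho_b).sum(), -1.0, 1.0)
    d_fim = np.arccos(inner)  # Fisher information metric
    return np.exp(-t * d_fim)

D1 = [[0.0, 1.0], [0.2, 0.5]]
D2 = [[0.1, 0.9]]
k11 = fisher_kernel_sketch(D1, D1)  # a diagram compared with itself
k12 = fisher_kernel_sketch(D1, D2)
k21 = fisher_kernel_sketch(D2, D1)
```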
- compute_matrix(diagrams, other_diagrams=None)[source]
Compute a Gram (kernel) matrix between one or two lists of persistence diagrams.
- Parameters:
diagrams (list) – The first list of persistence diagrams (computed with the ripser, gph, flagser, gudhi or cechmate packages).
other_diagrams (any, default None) – The optional second list of persistence diagrams, used to compute a cross-Gram matrix.
- Returns:
The (cross) Gram matrix.
- Return type:
numpy.ndarray
Examples
>>> from tdads import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # create kernel object
>>> k = kernel()
>>> # compute Gram matrix
>>> k.compute_matrix([diagram1, diagram2])
>>> # this is the same as:
>>> k.compute_matrix([diagram1, diagram2], [diagram1, diagram2])
- class tdads.diagram_mds(n_components: int = 2, random_state: int = None, precomputed: bool = False, dim: int = 0, metric: str = 'W', p: float = 2, sigma: float = None, n_cores: int = cpu_count() - 1)[source]
Multidimensional scaling with persistence diagrams.
- __str__()[source]
Describe a persistence diagram multidimensional scaling object via its distance metric.
- fit_transform(X, y: any = None)[source]
Fit the data in X and compute the position of the persistence diagrams in the embedding space.
- Parameters:
X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed distance matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
`X_new` – X transformed in the new space.
- Return type:
ndarray of shape (n_diagrams, n_components)
Examples
>>> from tdads.machine_learning import diagram_mds
>>> from tdads.distance import distance
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # project into 2D with the 2-wasserstein distance
>>> mds = diagram_mds()
>>> mds.fit_transform([diagram1, diagram2])
>>> # can also fit with a precomputed distance matrix
>>> d_wass = distance()
>>> dist_mat = d_wass.compute_matrix([diagram1, diagram2])
>>> mds_precomp = diagram_mds(precomputed = True)
>>> mds_precomp.fit_transform(dist_mat)
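To show what embedding a precomputed distance matrix involves, here is a sketch of classical (Torgerson) MDS in numpy. This is a deliberately swapped-in, deterministic variant: the random_state parameter above suggests that diagram_mds uses an iterative solver, so the function `classical_mds` below is an illustration of the idea, not the method tdads runs.

```python
import numpy as np

def classical_mds(dist_mat, n_components=2):
    """Classical MDS: double-centre squared distances, then eigendecompose."""
    sq = np.asarray(dist_mat, dtype=float) ** 2
    n = sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ sq @ J                    # inner-product (Gram) matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)   # guard against tiny negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)

# three objects whose distances are those of the points 0, 1 and 3 on a line
pos = np.array([0.0, 1.0, 3.0])
dist_mat = np.abs(pos[:, None] - pos[None, :])
embedding = classical_mds(dist_mat)          # shape (3, 2)
```

With a distance matrix between persistence diagrams in place of `dist_mat`, each row of the result would be the 2D position of one diagram.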
- class tdads.diagram_kpca(n_components: int = 2, random_state: int = None, precomputed: bool = False, diagrams: list = None, dim: int = 0, sigma: float = 1.0, t: float = 1.0, n_cores: int = cpu_count() - 1)[source]
Kernel PCA with persistence diagrams.
- __str__()[source]
Describe a persistence diagram kernel principal component analysis object via its kernel function.
- fit(X, y: any = None)[source]
Fit the model from data in X.
- Parameters:
X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed Gram matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
`self` – Returns the instance itself.
- Return type:
object
Examples
>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit model with the persistence Fisher kernel (sigma = t = 1)
>>> kpca = diagram_kpca()
>>> kpca_fitted = kpca.fit([diagram1, diagram2])
>>> # can also fit with a precomputed Gram matrix
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)
- transform(X)[source]
Project new persistence diagrams into the embedding space.
- Parameters:
X ({array-like of shape (n_new_diagrams, n_diagrams)} or {list of length n_new_diagrams}) – Either a precomputed cross-Gram matrix between the new persistence diagrams and the training set diagrams (if precomputed was set to True) or a list of n_new_diagrams many persistence diagrams (otherwise).
- Returns:
`X_new` – The embedding of the new persistence diagrams.
- Return type:
ndarray
Examples
>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit models (regular and precomputed) with the
>>> # persistence Fisher kernel (sigma = t = 1)
>>> kpca = diagram_kpca()
>>> kpca_fitted = kpca.fit([diagram1, diagram2]) # or
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)
>>> # create 2 new datasets and their persistence diagrams
>>> data3 = np.random.random((100,2))
>>> data4 = np.random.random((100,2))
>>> diagram3 = ripser(data3)
>>> diagram4 = ripser(data4)
>>> # project new data into 2D space
>>> kpca_fitted.transform([diagram3, diagram4]) # or
>>> cross_gram = pfk.compute_matrix([diagram1, diagram2], [diagram3, diagram4])
>>> kpca_precomp_fitted.transform(cross_gram)
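The reason transform needs a cross-Gram matrix with one row per new diagram and one column per training diagram becomes clearer in a bare numpy sketch of kernel PCA on precomputed Gram matrices. The helpers `kpca_fit` and `kpca_transform` are hypothetical, and a linear kernel on plain vectors stands in for a kernel between persistence diagrams; this is not the tdads implementation.

```python
import numpy as np

def kpca_fit(K, n_components=2):
    """Centre the training Gram matrix and keep its leading eigenvectors."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    # scale so that each principal axis has unit norm in feature space;
    # assumes the selected eigenvalues are strictly positive
    alphas = vecs[:, order] / np.sqrt(vals[order])
    return {"K": K, "alphas": alphas}

def kpca_transform(model, K_new):
    """Project rows of a cross-Gram matrix of shape (n_new, n_train)."""
    K, alphas = model["K"], model["alphas"]
    K_new = np.asarray(K_new, dtype=float)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    one_new = np.ones((K_new.shape[0], n)) / n
    # centre the new rows with the training means
    Kc_new = K_new - one_new @ K - K_new @ one + one_new @ K @ one
    return Kc_new @ alphas

# stand-in data: a linear kernel on 2D vectors instead of diagrams
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
X_new = np.array([[1.0, 1.0]])
model = kpca_fit(X_train @ X_train.T)
emb_train = kpca_transform(model, X_train @ X_train.T)  # shape (4, 2)
emb_new = kpca_transform(model, X_new @ X_train.T)      # shape (1, 2)
```

With a linear kernel and two components the embedding is a rigid motion of the centred data, which makes the result easy to sanity-check by comparing pairwise distances.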
- fit_transform(X, y: any = None)[source]
Fit the data in X and compute the position of the persistence diagrams in the embedding space.
- Parameters:
X ({array-like of shape (n_diagrams, n_diagrams)} or {list of length n_diagrams}) – Either a precomputed Gram matrix of n_diagrams many persistence diagrams (if precomputed was set to True) or a list of n_diagrams many persistence diagrams (otherwise).
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
`X_new` – X transformed in the new space.
- Return type:
ndarray
Examples
>>> from tdads.machine_learning import diagram_kpca
>>> from tdads.kernel import kernel
>>> from ripser import ripser
>>> import numpy as np
>>> # create 2 datasets
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> # compute persistence diagrams with ripser
>>> diagram1 = ripser(data1)
>>> diagram2 = ripser(data2)
>>> # fit models (regular and precomputed) with the
>>> # persistence Fisher kernel (sigma = t = 1) and
>>> # project into 2D space
>>> kpca = diagram_kpca()
>>> kpca.fit_transform([diagram1, diagram2]) # or
>>> pfk = kernel()
>>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
>>> kpca_precomp = diagram_kpca(precomputed = True)
>>> kpca_precomp.fit_transform(gram_mat)
- class tdads.perm_test(iterations: int = 20, dims: list = [0], p: float = 2.0, q: float = 2.0, paired: bool = False, n_cores: int = cpu_count() - 1)[source]
- __str__()[source]
Describe a permutation test procedure based on the number of permutation iterations and whether the groups are paired or unpaired.
- compute_loss(diagram_groups)[source]
Internal method to compute the loss function from Robinson and Turner 2017. This function should not be called directly.
- test(diagram_groups)[source]
Run the permutation test.
- Parameters:
diagram_groups (list of lists) – The groups of persistence diagrams to be analyzed.
- Returns:
A dictionary with keys ‘test_statistics’ (the test statistic in each dimension), ‘permvals’ (the null distribution in each dimension) and ‘p_values’ (the p-value in each dimension). For example, output[‘p_values’][‘1’] would give the p-value for the second homological dimension in self.dims.
- Return type:
Dict
Examples
>>> # create two groups of persistence diagrams
>>> from ripser import ripser
>>> import numpy as np
>>> data1 = np.random.random((100,2))
>>> data2 = np.random.random((100,2))
>>> D1 = ripser(data1)
>>> D2 = ripser(data2)
>>> group1 = [D1, D2]
>>> group2 = [D1, D2]
>>> # create perm test object in dimensions 0 and 1
>>> from tdads.inference import perm_test
>>> pt = perm_test(dims = [0, 1], n_cores = 2)
>>> # run test
>>> res = pt.test([group1, group2])
>>> # get p-values
>>> res['p_values']
Citations
Robinson T, Turner K (2017). “Hypothesis testing for topological data analysis.” https://link.springer.com/article/10.1007/s41468-017-0008-7.
Abdallah H et al. (2021). “Statistical Inference for Persistent Homology applied to fMRI.” https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.
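The logic of the test can be sketched on a precomputed distance matrix. The loss below follows the within-group form in Robinson and Turner (pairwise distances raised to the power q, averaged within each group), and the p-value uses the add-one convention; both are written from the paper as we understand it and are assumptions about what tdads does. The sketch ignores the p and paired options, and `group_loss` and `permutation_p_value` are hypothetical names.

```python
import random

def group_loss(dist, groups, q=2.0):
    """Sum over groups of the averaged within-group distances to the power q."""
    total = 0.0
    for g in groups:
        n = len(g)
        pair_sum = sum(dist[i][j] ** q for i in g for j in g if i != j)
        total += pair_sum / (2.0 * n * (n - 1))
    return total

def permutation_p_value(dist, groups, iterations=200, q=2.0, seed=0):
    rng = random.Random(seed)
    observed = group_loss(dist, groups, q)
    labels = [i for g in groups for i in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(iterations):
        rng.shuffle(labels)  # reassign items to groups at random
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(labels[start:start + s])
            start += s
        if group_loss(dist, shuffled, q) <= observed:
            count += 1
    return (count + 1) / (iterations + 1)

# toy stand-in for diagram distances: six items in two tight clusters
pos = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in pos] for a in pos]
groups = [[0, 1, 2], [3, 4, 5]]
observed_loss = group_loss(dist, groups)
p_val = permutation_p_value(dist, groups)
```

A small p-value here reflects that random regroupings almost always mix the two clusters and so produce a larger within-group loss than the observed grouping.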
- class tdads.diagram_bootstrap(diag_fun, dims: list = [0], num_samples: int = 20, distance_mat: bool = False, alpha: float = 0.05)[source]
- __str__()[source]
Describe a bootstrap procedure based on the number of bootstrap samples, whether or not the input will be a distance matrix and the Type 1 error rate (alpha).
- compute(X: numpy.ndarray, thresh: float)[source]
Carry out the bootstrap procedure.
- Parameters:
X (numpy.ndarray (2D)) – The input dataset - either raw tabular data or a distance matrix of samples.
thresh (float) – The maximum filtration radius for Vietoris-Rips persistent homology.
- Returns:
A dictionary with entries ‘diagram’ (the computed persistence diagram), ‘thresholds’ (a Dict of the computed persistence thresholds for each desired dimension) and ‘subsetted_diagram’ (the persistence diagram restricted to features above the threshold in each dimension).
- Return type:
Dict
Examples
>>> from tdads.inference import diagram_bootstrap
>>> from ripser import ripser
>>> from numpy.random import uniform
>>> from numpy import array, cos, sin
>>> from math import pi
>>> # build circle dataset
>>> theta = uniform(low = 0, high = 2*pi, size = 100)
>>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
>>> # define persistent homology function
>>> def diag_fun(X, thresh):
...     return ripser(X = X, thresh = thresh)
>>> # create bootstrap object and compute significant features
>>> boot = diagram_bootstrap(diag_fun = diag_fun)
>>> res = boot.compute(data, 2)
>>> # print subsetted diagram
>>> res['subsetted_diagram']
Citations
Chazal F et al. (2017). “Robust Topological Inference: Distance to a Measure and Kernel Distance.” https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.
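The final thresholding step can be sketched as follows. The sketch assumes that, for one homological dimension, the threshold is twice the (1 - alpha) quantile of the distances between the original diagram and each bootstrap diagram, and that a feature is kept when its persistence exceeds that threshold. This is a common convention in bootstrap inference for persistence diagrams, but whether tdads uses exactly this rule is an assumption. The function name and the distance values are made up for illustration.

```python
import numpy as np

def subset_diagram(dgm, boot_distances, alpha=0.05):
    """Hypothetical helper: threshold one dimension of a diagram by persistence."""
    threshold = 2.0 * np.quantile(boot_distances, 1.0 - alpha)
    dgm = np.asarray(dgm, dtype=float)
    persistence = dgm[:, 1] - dgm[:, 0]
    return threshold, dgm[persistence > threshold]

# made-up distances between the original diagram and five bootstrap diagrams
boot_distances = [0.02, 0.03, 0.05, 0.04, 0.03]
# one short-lived feature and one long-lived feature
dgm = [[0.0, 0.05], [0.2, 1.0]]
threshold, kept = subset_diagram(dgm, boot_distances)
```

Only the long-lived feature survives, which is the behaviour the ‘subsetted_diagram’ entry describes.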