tdads
=====

.. py:module:: tdads


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/tdads/PH_utils/index
   /autoapi/tdads/diagram_utils/index
   /autoapi/tdads/distance/index
   /autoapi/tdads/inference/index
   /autoapi/tdads/kernel/index
   /autoapi/tdads/machine_learning/index
   /autoapi/tdads/tdads/index


Classes
-------

.. autoapisummary::

   tdads.distance
   tdads.kernel
   tdads.diagram_mds
   tdads.diagram_kpca
   tdads.perm_test
   tdads.diagram_bootstrap


Functions
---------

.. autoapisummary::

   tdads.check_diagram
   tdads.preprocess_diagram
   tdads.enclosing_radius


Package Contents
----------------

.. py:function:: check_diagram(D)

   Check a persistence diagram.

   Internal method to verify that birth values are non-negative and less than death values.

   :param `D`: The input diagram to be checked.
   :type `D`: numpy.ndarray

   :rtype: None


.. py:function:: preprocess_diagram(D, inf_replace_val=None, ret=False)

   Verify the format of a persistence diagram and convert it to a standard format.

   This function can verify a persistence diagram from the ripser, gph, flagser, gudhi or cechmate
   packages and, if desired, convert any such diagram into a list of numpy arrays. It is largely
   internal functionality but can be used on its own.

   :param `D`: The persistence diagram to be verified. An exception will be raised if `D` is not
               a persistence diagram computed from one of the aforementioned packages.
   :type `D`: any
   :param `inf_replace_val`: The value with which `inf` values should be replaced, if desired.
   :type `inf_replace_val`: float or int, default `None`
   :param `ret`: Whether or not to return a processed diagram.
   :type `ret`: bool, default `False`

   :returns: If `ret` is `True` and the diagram is verified then a list is returned. The i-th element
             of the returned list is the array of i-dimensional topological features in the diagram.
   :rtype: None or list of numpy.ndarray


.. py:function:: enclosing_radius(X: numpy.ndarray, distance_mat: bool = False)

   Compute the enclosing radius of a dataset.

   Beyond this filtration radius no topological changes can occur.

   :param `X`: The input dataset, either raw tabular data or a distance matrix of samples.
   :type `X`: numpy.ndarray (2D)
   :param `distance_mat`: Whether or not `X` is a distance matrix. If `False` then a Euclidean
                          distance matrix will be computed.
   :type `distance_mat`: bool, default `False`

   :returns: The enclosing radius value of `X`.
   :rtype: numpy.float64

   .. rubric:: Examples

   >>> from tdads.PH_utils import enclosing_radius
   >>> from ripser import ripser
   >>> from numpy.random import uniform
   >>> from numpy import array, cos, sin
   >>> from math import pi
   >>> from scipy.spatial.distance import cdist
   >>> # build circle dataset
   >>> theta = uniform(low = 0, high = 2*pi, size = 100)
   >>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
   >>> # compute the enclosing radius
   >>> enc_rad = enclosing_radius(data)
   >>> # compute persistence diagram (the radius is ripser's `thresh` argument)
   >>> diagram = ripser(data, thresh = enc_rad)
   >>> # now for a distance matrix
   >>> dist_data = cdist(data, data, 'euclidean')
   >>> enc_rad = enclosing_radius(dist_data, True)
   >>> diagram = ripser(dist_data, thresh = enc_rad, distance_matrix = True)


.. py:class:: distance(dim: int = 0, metric='W', p: float = 2.0, sigma: float = None, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)

   .. py:method:: __str__()

      Describe a distance metric by type (Wasserstein, bottleneck or Fisher information metric)
      and major parameter (`p` for Wasserstein and `sigma` for Fisher information metric).


   .. py:method:: compute(D1, D2) -> float

      Compute the distance between two persistence diagrams.

      :param `D1`: The first persistence diagram (computed from either the ripser, gph, flagser,
                   gudhi or cechmate packages).
      :type `D1`: any
      :param `D2`: The second persistence diagram (computed from the same packages as `D1`).
      :type `D2`: any

      :returns: The numeric distance calculation value. In dimension 0, persistence diagrams may
                contain a point whose death is inf, and these points are ignored. To use these
                points, replace inf with the maximum filtration value used to compute the diagram.
      :rtype: float

      .. rubric:: Examples

      >>> from tdads import distance
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # create distance object
      >>> d_wass = distance() # 2-wasserstein distance
      >>> # compute distance
      >>> d_wass.compute(diagram1, diagram2)

      .. rubric:: Citations

      Kerber M, Morozov D and Nigmetov A (2017). "Geometry Helps to Compare Persistence Diagrams." https://dl.acm.org/doi/10.1145/3064175.

      Le T, Yamada M (2018). "Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams." https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.

      Vlad I. Morariu, Balaji Vasan Srinivasan, Vikas C. Raykar, Ramani Duraiswami, and Larry S. Davis. Automatic online tuning for fast Gaussian summation. Advances in Neural Information Processing Systems (NIPS), 2008.


   .. py:method:: compute_matrix(diagrams: list, other_diagrams: list = None)

      Compute a distance matrix between one or two lists of persistence diagrams.

      :param `diagrams`: The first list of persistence diagrams (computed from either the ripser,
                         gph, flagser, gudhi or cechmate packages).
      :type `diagrams`: list
      :param `other_diagrams`: The optional second list of persistence diagrams for computing a
                               cross-distance matrix.
      :type `other_diagrams`: list, default `None`

      :returns: The (cross) distance matrix.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> from tdads import distance
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # create distance object
      >>> d_wass = distance() # 2-wasserstein distance
      >>> # compute distance matrix
      >>> d_wass.compute_matrix([diagram1, diagram2])
      >>> # this is the same as:
      >>> d_wass.compute_matrix([diagram1, diagram2], [diagram1, diagram2])


.. py:class:: kernel(dim: int = 0, sigma: float = 1, t: float = 1, inf_replace_val: float = None, n_cores: int = cpu_count() - 1)

   .. py:method:: __str__()

      Describe a persistence Fisher kernel by its `sigma` and `t` parameters.


   .. py:method:: compute(D1, D2)

      Compute the kernel value between two persistence diagrams.

      :param `D1`: The first persistence diagram (computed from either the ripser, gph, flagser,
                   gudhi or cechmate packages).
      :type `D1`: any
      :param `D2`: The second persistence diagram.
      :type `D2`: any

      :returns: The numeric kernel calculation value.
      :rtype: float

      .. rubric:: Examples

      >>> from tdads import kernel
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # create kernel object
      >>> k = kernel()
      >>> # compute kernel value
      >>> k.compute(diagram1, diagram2)

      .. rubric:: Citations

      Le T, Yamada M (2018). "Persistence fisher kernel: a riemannian manifold kernel for persistence diagrams." https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.


   .. py:method:: compute_matrix(diagrams, other_diagrams=None)

      Compute a Gram (kernel) matrix between one or two lists of persistence diagrams.
      :param `diagrams`: The first list of persistence diagrams (computed from either the ripser,
                         gph, flagser, gudhi or cechmate packages).
      :type `diagrams`: list
      :param `other_diagrams`: The optional second list of persistence diagrams for computing a
                               cross-Gram matrix.
      :type `other_diagrams`: list, default `None`

      :returns: The (cross) Gram matrix.
      :rtype: numpy.ndarray

      .. rubric:: Examples

      >>> from tdads import kernel
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # create kernel object
      >>> k = kernel()
      >>> # compute Gram matrix
      >>> k.compute_matrix([diagram1, diagram2])
      >>> # this is the same as:
      >>> k.compute_matrix([diagram1, diagram2], [diagram1, diagram2])


.. py:class:: diagram_mds(n_components: int = 2, random_state: int = None, precomputed: bool = False, dim: int = 0, metric: str = 'W', p: float = 2, sigma: float = None, n_cores: int = cpu_count() - 1)

   Multidimensional scaling with persistence diagrams.

   .. py:method:: __str__()

      Describe a persistence diagram multidimensional scaling object via its distance metric.


   .. py:method:: fit_transform(X, y: any = None)

      Fit the data in X and compute the position of the persistence diagrams in the embedding space.

      :param `X`: Either a precomputed distance matrix of `n_diagrams` many persistence diagrams
                  (if `precomputed` was set to `True`) or a list of `n_diagrams` many persistence
                  diagrams (otherwise).
      :type `X`: {array-like of shape `(n_diagrams, n_diagrams)`} or {list of length `n_diagrams`}
      :param `y`: Not used, present for API consistency by convention.
      :type `y`: Ignored

      :returns: **`X_new`** -- `X` transformed in the new space.
      :rtype: ndarray of shape `(n_diagrams, n_components)`

      .. rubric:: Examples

      >>> from tdads.machine_learning import diagram_mds
      >>> from tdads.distance import distance
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # project into 2D with the 2-wasserstein distance
      >>> mds = diagram_mds()
      >>> mds.fit_transform([diagram1, diagram2])
      >>> # can also fit with a precomputed distance matrix
      >>> d_wass = distance()
      >>> dist_mat = d_wass.compute_matrix([diagram1, diagram2])
      >>> mds_precomp = diagram_mds(precomputed = True)
      >>> mds_precomp.fit_transform(dist_mat)


.. py:class:: diagram_kpca(n_components: int = 2, random_state: int = None, precomputed: bool = False, diagrams: list = None, dim: int = 0, sigma: float = 1.0, t: float = 1.0, n_cores: int = cpu_count() - 1)

   Kernel PCA with persistence diagrams.

   .. py:method:: __str__()

      Describe a persistence diagram kernel principal component analysis object via its kernel function.


   .. py:method:: fit(X, y: any = None)

      Fit the model from data in X.

      :param `X`: Either a precomputed Gram matrix of `n_diagrams` many persistence diagrams
                  (if `precomputed` was set to `True`) or a list of `n_diagrams` many persistence
                  diagrams (otherwise).
      :type `X`: {array-like of shape `(n_diagrams, n_diagrams)`} or {list of length `n_diagrams`}
      :param `y`: Not used, present for API consistency by convention.
      :type `y`: Ignored

      :returns: **`self`** -- Returns the instance itself.
      :rtype: object

      .. rubric:: Examples

      >>> from tdads.machine_learning import diagram_kpca
      >>> from tdads.kernel import kernel
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # fit model with the persistence Fisher kernel (sigma = t = 1)
      >>> kpca = diagram_kpca()
      >>> kpca_fitted = kpca.fit([diagram1, diagram2])
      >>> # can also fit with a precomputed Gram matrix
      >>> pfk = kernel()
      >>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
      >>> kpca_precomp = diagram_kpca(precomputed = True)
      >>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)


   .. py:method:: transform(X)

      Project new persistence diagrams into the embedding space.

      :param `X`: Either a precomputed (cross) Gram matrix of shape `(n_new_diagrams, n_diagrams)`
                  (between the new persistence diagrams and the training set diagrams, if
                  `precomputed` was set to `True`) or a list of `n_new_diagrams` many persistence
                  diagrams (otherwise).
      :type `X`: {array-like of shape `(n_new_diagrams, n_diagrams)`} or {list of length `n_new_diagrams`}

      :returns: **`X_new`** -- The embedding of the new persistence diagrams.
      :rtype: ndarray

      .. rubric:: Examples

      >>> from tdads.machine_learning import diagram_kpca
      >>> from tdads.kernel import kernel
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # fit models (regular and precomputed) with the
      >>> # persistence Fisher kernel (sigma = t = 1)
      >>> kpca = diagram_kpca()
      >>> kpca_fitted = kpca.fit([diagram1, diagram2]) # or
      >>> pfk = kernel()
      >>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
      >>> kpca_precomp = diagram_kpca(precomputed = True)
      >>> kpca_precomp_fitted = kpca_precomp.fit(gram_mat)
      >>> # create 2 new datasets and their persistence diagrams
      >>> data3 = np.random.random((100,2))
      >>> data4 = np.random.random((100,2))
      >>> diagram3 = ripser(data3)
      >>> diagram4 = ripser(data4)
      >>> # project new data into 2D space
      >>> kpca_fitted.transform([diagram3, diagram4]) # or
      >>> # cross-Gram matrix of shape (n_new_diagrams, n_diagrams)
      >>> cross_gram = pfk.compute_matrix([diagram3, diagram4], [diagram1, diagram2])
      >>> kpca_precomp_fitted.transform(cross_gram)


   .. py:method:: fit_transform(X, y: any = None)

      Fit the data in X and compute the position of the persistence diagrams in the embedding space.

      :param `X`: Either a precomputed Gram matrix of `n_diagrams` many persistence diagrams
                  (if `precomputed` was set to `True`) or a list of `n_diagrams` many persistence
                  diagrams (otherwise).
      :type `X`: {array-like of shape `(n_diagrams, n_diagrams)`} or {list of length `n_diagrams`}
      :param `y`: Not used, present for API consistency by convention.
      :type `y`: Ignored

      :returns: **`X_new`** -- `X` transformed in the new space.
      :rtype: ndarray

      .. rubric:: Examples

      >>> from tdads.machine_learning import diagram_kpca
      >>> from tdads.kernel import kernel
      >>> from ripser import ripser
      >>> import numpy as np
      >>> # create 2 datasets
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> # compute persistence diagrams with ripser
      >>> diagram1 = ripser(data1)
      >>> diagram2 = ripser(data2)
      >>> # fit models (regular and precomputed) with the
      >>> # persistence Fisher kernel (sigma = t = 1) and
      >>> # project into 2D space
      >>> kpca = diagram_kpca()
      >>> kpca.fit_transform([diagram1, diagram2]) # or
      >>> pfk = kernel()
      >>> gram_mat = pfk.compute_matrix([diagram1, diagram2])
      >>> kpca_precomp = diagram_kpca(precomputed = True)
      >>> kpca_precomp.fit_transform(gram_mat)


.. py:class:: perm_test(iterations: int = 20, dims: list = [0], p: float = 2.0, q: float = 2.0, paired: bool = False, n_cores: int = cpu_count() - 1)

   .. py:method:: __str__()

      Describe a permutation test procedure based on the number of permutation iterations and
      whether the groups are paired or unpaired.


   .. py:method:: compute_loss(diagram_groups)

      Internal method to compute the loss function from Robinson and Turner 2017.
      This function should not be called directly.


   .. py:method:: test(diagram_groups)

      Run the permutation test.

      :param `diagram_groups`: The groups of persistence diagrams to be analyzed.
      :type `diagram_groups`: list of lists

      :returns: Keys are 'test_statistics' for the test statistic in each dimension, 'permvals' for
                the null distribution in each dimension and 'p_values' for the p-values in each
                dimension. For example, `output['p_values']['1']` would give the p-value for the
                second homological dimension in `self.dims`.
      :rtype: dict

      .. rubric:: Examples

      >>> # create two groups of persistence diagrams
      >>> from ripser import ripser
      >>> import numpy as np
      >>> data1 = np.random.random((100,2))
      >>> data2 = np.random.random((100,2))
      >>> D1 = ripser(data1)
      >>> D2 = ripser(data2)
      >>> group1 = [D1, D2]
      >>> group2 = [D1, D2]
      >>> # create perm test object in dimensions 0 and 1
      >>> from tdads.inference import perm_test
      >>> pt = perm_test(dims = [0, 1], n_cores = 2)
      >>> # run test
      >>> res = pt.test([group1, group2])
      >>> # get p-values
      >>> res['p_values']

      .. rubric:: Citations

      Robinson T, Turner K (2017). "Hypothesis testing for topological data analysis." https://link.springer.com/article/10.1007/s41468-017-0008-7.

      Abdallah H et al. (2021). "Statistical Inference for Persistent Homology applied to fMRI." https://github.com/hassan-abdallah/Statistical_Inference_PH_fMRI/blob/main/Abdallah_et_al_Statistical_Inference_PH_fMRI.pdf.


.. py:class:: diagram_bootstrap(diag_fun, dims: list = [0], num_samples: int = 20, distance_mat: bool = False, alpha: float = 0.05)

   .. py:method:: __str__()

      Describe a bootstrap procedure based on the number of bootstrap samples, whether or not the
      input will be a distance matrix and the Type 1 error rate (alpha).


   .. py:method:: compute(X: numpy.ndarray, thresh: float)

      Carry out the bootstrap procedure.

      :param `X`: The input dataset, either raw tabular data or a distance matrix of samples.
      :type `X`: numpy.ndarray (2D)
      :param `thresh`: The maximum filtration radius for Vietoris-Rips persistent homology.
      :type `thresh`: float

      :returns: Entries are 'diagram' (the computed persistence diagram), 'thresholds' (a dict of
                the computed persistence thresholds for each desired dimension) and
                'subsetted_diagram' (the persistence diagram thresholded by the threshold values
                in each dimension).
      :rtype: dict

      .. rubric:: Examples

      >>> from tdads.inference import diagram_bootstrap
      >>> from ripser import ripser
      >>> from numpy.random import uniform
      >>> from numpy import array, cos, sin
      >>> from math import pi
      >>> # build circle dataset
      >>> theta = uniform(low = 0, high = 2*pi, size = 100)
      >>> data = array([[cos(theta[i]), sin(theta[i])] for i in range(100)])
      >>> # define persistent homology function
      >>> def diag_fun(X, thresh):
      ...     return ripser(X = X, thresh = thresh)
      >>> # create bootstrap object and compute significant features
      >>> boot = diagram_bootstrap(diag_fun = diag_fun)
      >>> res = boot.compute(data, 2)
      >>> # print subsetted diagram
      >>> res['subsetted_diagram']

      .. rubric:: Citations

      Chazal F et al (2017). "Robust Topological Inference: Distance to a Measure and Kernel Distance." https://www.jmlr.org/papers/volume18/15-484/15-484.pdf.
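
As a supplementary note on `enclosing_radius`: the enclosing radius of a point cloud is usually defined as the smallest value, over all points, of the distance from that point to its farthest neighbour. The sketch below illustrates that definition in pure Python. The name `enclosing_radius_sketch` is hypothetical and is not part of tdads, and the package's own implementation may differ (for example, it works on numpy arrays and accepts a distance matrix).

```python
from math import dist


def enclosing_radius_sketch(points):
    """Illustrative only: min over points of the max distance to any other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # For each point find its farthest neighbour, then keep the smallest such value.
    return min(max(dist(p, q) for q in points) for p in points)


# Three collinear points: the middle point is within distance 1 of both others.
radius = enclosing_radius_sketch([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

A value computed this way plays the same role as the `thresh` argument in the examples above: it caps the filtration at the radius beyond which the Vietoris-Rips complex no longer changes topologically.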