tdads
Data science (ds) for topological data analysis (tda) (i.e. tdads = tda+ds).
Installation
$ pip install tdads
API
tdads has two major modules:
Machine learning. The classes diagram_mds and diagram_kpca can be used to project a group of diagrams into a low-dimensional space (i.e. dimension reduction).
Statistics. The permutation_test class can carry out ANOVA-like tests for identifying group differences of persistence diagrams. The universal_null and diagram_bootstrap classes can be used to identify statistically significant topological features in a dataset.
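To give a feel for what "statistically significant topological features" can mean, here is a minimal pure-Python sketch of a bootstrap-style thresholding rule: features are kept only if their persistence (death minus birth) exceeds twice an upper quantile of distances between the original diagram and diagrams of resampled datasets. This is an illustration under stated assumptions, not the tdads implementation; the helper names quantile and significant_features, the factor of 2, and all input values are hypothetical.

```python
import math

def quantile(values, q):
    """Empirical quantile: smallest value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    k = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[k]

def significant_features(diagram, bootstrap_distances, alpha=0.05):
    """Keep (birth, death) pairs whose persistence exceeds twice the (1 - alpha) quantile.

    Assumption: bootstrap_distances holds distances between the original
    diagram and the diagrams of bootstrap-resampled datasets.
    """
    threshold = 2 * quantile(bootstrap_distances, 1 - alpha)
    return [(b, d) for (b, d) in diagram if d - b > threshold]

# hypothetical inputs: one long-lived loop and two short-lived ones
diagram = [(0.2, 1.7), (0.3, 0.35), (0.5, 0.58)]
bootstrap_distances = [0.03, 0.05, 0.04, 0.06, 0.05, 0.04, 0.07, 0.05, 0.06, 0.04]
kept = significant_features(diagram, bootstrap_distances, alpha=0.1)
print(kept)  # only the long-lived feature survives the threshold
```

A smaller alpha raises the threshold, so fewer features are reported as significant.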
Usage
As an example, we will
create 10 persistence diagrams from two distinct groups,
describe the significant topological features in each diagram,
resolve the two groups with MDS, and
capture the group difference using a permutation test.
from tdads.machine_learning import *
from tdads.inference import *
from numpy.random import uniform
from numpy import array
from math import cos, sin, pi
from ripser import ripser
import matplotlib.pyplot as plt
# function to create a circle dataset and
# compute its diagram
def circle_diagram():
    # sample 100 points from the unit circle
    theta = uniform(low = 0, high = 2*pi, size = 100)
    data = array([[cos(t), sin(t)] for t in theta])
    # compute persistence diagram
    diag = ripser(data, maxdim = 2)
    return [data, diag]
# function to create a sphere dataset and
# compute its diagram
def sphere_diagram():
    # sample 100 points from the unit sphere using spherical coordinates
    # (note: uniform angles are not area-uniform; points cluster near the poles)
    phi = uniform(low = 0, high = 2*pi, size = 100)
    theta = uniform(low = 0, high = pi, size = 100)
    data = array([[sin(theta[i])*cos(phi[i]), sin(theta[i])*sin(phi[i]), cos(theta[i])] for i in range(100)])
    # compute persistence diagram
    diag = ripser(data, maxdim = 2)
    return [data, diag]
# create 10 diagrams, five from circle datasets and
# five from sphere datasets
result = [circle_diagram() for _ in range(5)] + [sphere_diagram() for _ in range(5)]
data = [r[0] for r in result]
diagrams = [r[1] for r in result]
# use the bootstrap procedure to determine the significant
# topological features in each diagram
def diag_fun(X, thresh):
    return ripser(X = X, thresh = thresh, maxdim = 2)
boot = diagram_bootstrap(diag_fun = diag_fun, dims = [0,1,2], alpha = 0.01)
boot_diagrams = [boot.compute(X = d, thresh = 2) for d in data]
# the subsetted diagrams show that only the first five diagrams have
# one loop and only the last five diagrams have one void:
for boot_diag in boot_diagrams:
    subset = boot_diag['subsetted_diagram']
    print(f'Num clusters: {len(subset[0])}, num loops: {len(subset[1])}, num voids: {len(subset[2])}')
# a 2D MDS projection of the 10 diagrams resolves the two groups:
mds = diagram_mds(dim = 1) # for 1-dimensional homology
emb = mds.fit_transform(diagrams)
plt.scatter(emb[:,0], emb[:,1], color = ['red','red','red','red','red','blue','blue','blue','blue','blue'])
plt.xlabel('Embedding dim 1')
plt.ylabel('Embedding dim 2')
plt.show()
# a permutation test captures the group differences in all dimensions
pt = perm_test(p = float('inf'), iterations = 50, dims = [0,1,2])
res = pt.test([[d for d in diagrams[0:5]], [d for d in diagrams[5:10]]])
# print the p-values (a bare expression only displays in an interactive session)
print(res['p_values'])
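The idea behind a permutation test for group differences can be illustrated without any TDA machinery. The sketch below assumes a precomputed matrix of pairwise distances between ten items (standing in for distances between diagrams), uses the sum of within-group distances as the test statistic, and estimates a p-value by reshuffling the group labels. It is a generic illustration, not the tdads implementation; within_group_loss and permutation_p_value are hypothetical names, and the toy distances are made up.

```python
import random
from itertools import combinations

def within_group_loss(dist, groups):
    """Sum of pairwise distances between members of the same group."""
    return sum(dist[i][j] for g in groups for i, j in combinations(g, 2))

def permutation_p_value(dist, group_sizes, iterations=200, seed=0):
    """Estimate a p-value for 'the groups differ' from a distance matrix."""
    rng = random.Random(seed)
    indices = list(range(sum(group_sizes)))

    def split(idx):
        # cut a flat index list into consecutive groups of the given sizes
        groups, start = [], 0
        for size in group_sizes:
            groups.append(idx[start:start + size])
            start += size
        return groups

    observed = within_group_loss(dist, split(indices))
    as_extreme = 0
    for _ in range(iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # a permuted labelling is "as extreme" if its groups are at least as tight
        if within_group_loss(dist, split(shuffled)) <= observed:
            as_extreme += 1
    # add-one correction keeps the estimate strictly positive
    return (as_extreme + 1) / (iterations + 1)

# toy data: items 0-4 lie close together, as do items 5-9
points = [0, 1, 2, 3, 4, 50, 51, 52, 53, 54]
dist = [[abs(a - b) for b in points] for a in points]
p = permutation_p_value(dist, group_sizes=[5, 5])
print(p)
```

With few iterations the smallest attainable p-value is 1 / (iterations + 1), so a run with iterations = 50, as in the example above, cannot report a value below roughly 0.02 under this scheme.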
Citation
If you use tdads, please consider citing as:
Brown et al., (2024). TDApplied: An R package for machine learning and inference with persistence diagrams. Journal of Open Source Software, 9(95), 6321, https://doi.org/10.21105/joss.06321
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
tdads was created by Shael Brown. It is licensed under the terms of the GNU General Public License v3.0.
Credits
tdads was created with cookiecutter and the py-pkgs-cookiecutter template.