update
27rabbitlt committed Sep 14, 2024
1 parent c059776 commit f9ba3f6
Showing 1 changed file with 99 additions and 0 deletions.
99 changes: 99 additions & 0 deletions docs/posts/MATH/TDA.md
@@ -79,4 +79,103 @@ Now we introduce a very famous theorem which has lots of interesting application

To prove this, we need some lemmas. The first one is:

## 3 Persistence

For a simplicial filtration, each newly added $p$-dimensional simplex has to be either a creator or a destructor. A creator creates a new $p$-dimensional cycle (hole); a destructor destroys a $(p-1)$-dimensional cycle (hole), because that cycle becomes the boundary of the added simplex.

Note, however, that a creator or destructor does not create or destroy just a single cycle, but a basis cycle. The rank of the homology group therefore increases or decreases by 1, which means that the number of distinct cycle classes doubles or halves (with $\mathbb{Z}_2$ coefficients).

This definition agrees with the definitions of the Betti number and the persistence diagram.

It's easy to prove that a new simplex is either a creator or a destructor, so we can pair them as follows: each destructor is paired with the youngest still-unpaired creator within the cycle it destroys. This rule removes the ambiguity about exactly which cycle is destroyed when adding a destructor merges two cycles into one.

What if all the creators within the destroyed cycle have already been paired? Then for each such creator $\rho$, we take its paired destructor $\tau$ and replace $\rho$ by $\rho + \partial \tau$, which gives a new set of candidates. We repeat the previous procedure until we find an unpaired creator; if no creator is left, the added simplex could not have been a destructor (WHY?).

Then for each pair of paired simplices, we draw a point on the persistence diagram with coordinates $(f(\rho), f(\tau))$, where $\rho$ is the creator, $\tau$ is the destructor, and $f(\cdot)$ is the timestamp function. For technical reasons, we also add infinitely many points on the diagonal.
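The pairing rule can be sketched with the usual column reduction of the boundary matrix over $\mathbb{Z}_2$. This is only a minimal sketch under assumptions that are not in the text above: the function name `persistence_pairs` is hypothetical, simplices are given as vertex tuples in filtration order (every face appears before its cofaces), and the timestamp of a simplex is simply its position in the list. Adding an earlier reduced column plays the role of replacing a candidate by $\rho + \partial \tau$.

```python
def persistence_pairs(simplices):
    """Pair creators with destructors by reducing boundary columns over Z_2.

    simplices: list of vertex tuples in filtration order (faces first).
    Returns (pairs, unpaired): pairs is a list of (creator, destructor)
    indices, unpaired lists the creators that are never destroyed.
    """
    index = {frozenset(s): i for i, s in enumerate(simplices)}
    reduced = []        # reduced boundary column of each simplex (set of indices)
    owner_of_low = {}   # youngest index of a reduced column -> its column
    pairs = []
    for j, s in enumerate(simplices):
        verts = frozenset(s)
        # Boundary over Z_2: all codimension-1 faces (empty for a vertex).
        col = {index[verts - {v}] for v in verts} if len(verts) > 1 else set()
        # While the youngest creator in the column is already paired,
        # add (symmetric difference) the column of its destructor.
        while col and max(col) in owner_of_low:
            col ^= reduced[owner_of_low[max(col)]]
        reduced.append(col)
        if col:  # non-empty column: simplex j is a destructor
            owner_of_low[max(col)] = j
            pairs.append((max(col), j))
    paired = {i for pair in pairs for i in pair}
    unpaired = [i for i in range(len(simplices)) if i not in paired]
    return pairs, unpaired


# A filled triangle: three vertices, three edges, then the 2-simplex.
triangle = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
pairs, unpaired = persistence_pairs(triangle)
```

With the list index used as timestamp, each returned pair is directly a point $(f(\rho), f(\tau))$ of the diagram; the unpaired creators correspond to classes that never die.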

Given persistence diagrams, it's natural to ask how similar two diagrams are, that is, whether we can define a distance metric on diagrams.

Here we use the bottleneck distance. Writing $\Pi$ for the set of all bijections $\pi: Dgm_p(F) \rightarrow Dgm_p(G)$, it is defined as:

$$
d_b(Dgm_p(F), Dgm_p(G)) = \inf_{\pi \in \Pi} \sup_{x \in Dgm_p(F)} || x - \pi(x) ||_\infty
$$

We can prove that if the diagrams have finitely many off-diagonal points, then the bottleneck distance is a metric.
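For very small diagrams the definition can be evaluated directly by trying every matching. The following is a brute-force sketch, not an efficient algorithm; the names `bottleneck_distance` and `_cost` are hypothetical. It assumes finitely many off-diagonal points and models the diagonal by `None` placeholders, using the fact that the $L_\infty$ distance from $(b, d)$ to the diagonal is $(d - b)/2$.

```python
from itertools import permutations


def _cost(p, q):
    """L-infinity cost of matching p with q; None stands for a diagonal point."""
    if p is None and q is None:
        return 0.0
    if p is None:
        p, q = q, p
    if q is None:
        # Match an off-diagonal point with its nearest diagonal point.
        return abs(p[1] - p[0]) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def bottleneck_distance(dgm_f, dgm_g):
    """Brute-force bottleneck distance between two small diagrams.

    Each diagram is a list of (birth, death) pairs. Every point may also be
    matched with the diagonal, so both sides are padded with placeholders.
    """
    left = list(dgm_f) + [None] * len(dgm_g)
    right = list(dgm_g) + [None] * len(dgm_f)
    if not left:
        return 0.0
    return min(
        max(_cost(p, q) for p, q in zip(left, perm))
        for perm in permutations(right)
    )
```

For instance, `bottleneck_distance([(0, 4)], [(0, 5)])` matches the two points with each other, while `bottleneck_distance([(0, 4)], [])` has to send the single point to the diagonal. The running time is factorial in the number of points, so this is only meant for checking small examples by hand.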

We want to prove the stability of simplicial filtrations.

!!! note "Stability for Simplicial Filtration Theorem"
    Let $f, g: K \rightarrow \mathbb{R}$ be simplex-wise monotone functions. Then $d_b(Dgm_p(F_f), Dgm_p(F_g)) \le ||f - g||_\infty$.

This theorem states that if we change the filtration function $f$ only a little, the diagram also changes only a little, which is what stability means.

The idea behind the proof is to construct a new function $v(x, t) = t f(x) + (1 - t) g(x)$ and draw the diagram for each parameter $t \in [0, 1]$; together these diagrams form a vineyard. The slope of each vine is at most $|| f - g ||_\infty$.
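The slope bound rests on a one-line estimate, written out here as a sketch. For any simplex $x$ and any $s, t \in [0, 1]$:

$$
|v(x, t) - v(x, s)| = |(t - s) f(x) - (t - s) g(x)| = |t - s| \cdot |f(x) - g(x)| \le |t - s| \cdot || f - g ||_\infty
$$

Since $v(\cdot, 0) = g$ and $v(\cdot, 1) = f$, a point that is followed along its vine from $t = 0$ to $t = 1$ moves by at most $|| f - g ||_\infty$ in each coordinate, assuming its birth and death values are always the $v$-values of some pair of simplices.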

We can also generalize this to any triangulable topological space; this will be proved later using stability with respect to the interleaving distance.

### Wasserstein Distance

The bottleneck distance cannot capture the number of mismatches. For example, if one point deviates relatively far, then no matter how many other points closer to the diagonal are added, the bottleneck distance won't change: we only ever look at the longest edge in the matching.

To avoid this problem we can use Wasserstein distance.

$$
d_{W, q}(Dgm_p(F), Dgm_p(G)) = \left[ \inf_{\pi \in \Pi} \sum_{x \in Dgm_p(F)} ||x - \pi(x)||_\infty^q \right]^{1/q}
$$

Note that in the limit $q \rightarrow \infty$ we recover the bottleneck distance: $d_{W, \infty} = d_b$.
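The difference between the two distances can be seen on a toy example. The sketch below is a brute-force evaluation for tiny diagrams only; the names `wasserstein_distance` and `_linf` are hypothetical, and the diagonal is again modelled by `None` placeholders with cost $(d - b)/2$. Passing `q=math.inf` gives the bottleneck distance.

```python
import math
from itertools import permutations


def _linf(p, q):
    """L-infinity cost of matching p with q; None stands for a diagonal point."""
    if p is None and q is None:
        return 0.0
    if p is None or q is None:
        b, d = q if p is None else p
        return abs(d - b) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def wasserstein_distance(dgm_f, dgm_g, q=1.0):
    """Brute-force q-Wasserstein distance between two small diagrams."""
    left = list(dgm_f) + [None] * len(dgm_g)
    right = list(dgm_g) + [None] * len(dgm_f)
    best = math.inf
    for perm in permutations(right):
        costs = [_linf(a, b) for a, b in zip(left, perm)]
        if math.isinf(q):
            total = max(costs, default=0.0)          # bottleneck case
        else:
            total = sum(c ** q for c in costs) ** (1.0 / q)
        best = min(best, total)
    return best


# One prominent point, then the same diagram with two near-diagonal points added.
dgm_f = [(0, 10)]
dgm_g = [(0, 8)]
dgm_g_noisy = [(0, 8), (1, 1.5), (2, 2.5)]

bottleneck_clean = wasserstein_distance(dgm_f, dgm_g, q=math.inf)
bottleneck_noisy = wasserstein_distance(dgm_f, dgm_g_noisy, q=math.inf)
w1_clean = wasserstein_distance(dgm_f, dgm_g, q=1.0)
w1_noisy = wasserstein_distance(dgm_f, dgm_g_noisy, q=1.0)
```

In this example the two bottleneck values coincide, because the added points are cheaper to match than the prominent one, whereas the 1-Wasserstein value grows with every added point.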

We cannot guarantee stability for the Wasserstein distance in general: counterexamples can be found for both simplicial complexes and topological spaces. But if we restrict the functions to be Lipschitz, we do have stability for the Wasserstein distance.

### Interleaving Distance

The interleaving distance between the filtrations induced by the levelsets of two functions $f$ and $g$ is at most $||f - g||_\infty$.

Using the theorem of stability with respect to the interleaving distance, we have $d_b(Dgm(U), Dgm(V)) = d_I (U, V)$. Note that here $U, V$ are persistence modules built from homology groups, not filtrations.

It's also clear that the interleaving distance of the homology groups is no greater than the interleaving distance of the corresponding filtrations, because we can always construct a map between homology groups from the map between the corresponding simplices.

So:

$d_b(Dgm(H_p U), Dgm(H_p V)) = d_I (H_p U, H_p V) \le d_I (U, V) \le || f - g ||_\infty$.

One reason to consider the interleaving distance is that so far we have only paid attention to the stability of a single space or simplicial complex under different filtrations induced by the levelsets of some function. In real applications, however, we only have point clouds, which may vary in size. The interleaving distance still gives some guarantee of stability even when the sizes differ.

Based on this intuition, we consider the Hausdorff distance.

## 5 Reeb Graph and Mapper

### Reeb Graph

The idea of the Reeb graph is to extract the 1-dimensional information of the space $X$ using a function $f: X \rightarrow \mathbb{R}$.


!!! note "Reeb Graph"
    $X$ is a topological space, and $f$ is a function from $X$ to $\mathbb{R}$. Two points $x, y \in X$ are called equivalent ($x \sim y$) iff $f(x) = f(y) = \alpha$ and $x, y$ are in the same path-connected component of $f^{-1}(\alpha)$. The Reeb graph $R_f$ is the quotient space $X / \sim$.

To exclude the weird and meaningless cases, we only consider the situations where there are always connected components and homology groups of levelsets only change at finitely many critical values.

By definition the Reeb graph is a topological space (it's a quotient space); we call it a graph because it's 1-dimensional. To get an actual graph we need a discretization.

It's very natural to consider the number of "neighbours" that a point has. Let $u$ be the number of neighbours in the direction of increasing $f$-value and $l$ the number of neighbours in the direction of decreasing $f$-value. If $u = l = 1$, the point is regular; in all other cases it is a critical point, which should be displayed as a distinct vertex in the graph.
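On a discretized Reeb graph this classification is a simple degree count. The sketch below assumes a finite graph whose points carry $f$-values and whose adjacent points always have distinct $f$-values; the function name `classify_points` and the input format are hypothetical.

```python
def classify_points(f_values, edges):
    """Label each point as 'regular' (u = l = 1) or 'critical' (anything else).

    f_values: dict mapping point -> f(point)
    edges: iterable of (a, b) pairs with f_values[a] != f_values[b]
    """
    up = {p: 0 for p in f_values}    # u: neighbours with larger f-value
    down = {p: 0 for p in f_values}  # l: neighbours with smaller f-value
    for a, b in edges:
        lo, hi = (a, b) if f_values[a] < f_values[b] else (b, a)
        up[lo] += 1
        down[hi] += 1
    return {
        p: "regular" if (up[p], down[p]) == (1, 1) else "critical"
        for p in f_values
    }


# A circle sampled by six points, with f playing the role of a height function.
heights = {0: 0, 1: 1, 2: 2, 3: 3, 4: 2, 5: 1}
circle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
labels = classify_points(heights, circle_edges)
```

Here only the lowest and the highest point are critical (a minimum with $u = 2, l = 0$ and a maximum with $u = 0, l = 2$); the four points in between are regular.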

### From Topospace to $\mathbb{R}$

Mapper is an approximation of the Reeb graph. Instead of the preimage of a single point, we now consider preimages of intervals.

Again we need a function $f: X \rightarrow \mathbb{R}$. For an open cover $U = \{U_\alpha\}$ of $\mathbb{R}$ by intervals, we consider the path-connected components of the preimage of each interval $U_\alpha$ and then compute the nerve of this family: writing $f^{-1}(U_\alpha) = \bigcup_{\beta} V_\beta$, we let $f^*(U) = \{V_\beta\}$ and finally take $N(f^*(U))$.
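The construction can be sketched on a finite sample. The code below is only an illustration under assumptions that are not part of the definition: $X$ is approximated by a graph on sample points, path-connected components are replaced by connected components of the induced subgraph, and only the vertices and edges (the 1-skeleton) of the nerve are computed. The name `mapper_graph` and the input format are hypothetical.

```python
from itertools import combinations


def mapper_graph(f_values, edges, intervals):
    """Discrete Mapper sketch: pull back a cover, split into components, take the nerve.

    f_values: dict mapping point -> f(point)
    edges: iterable of point pairs (a graph approximating X)
    intervals: list of open intervals (lo, hi) covering the range of f
    Returns (clusters, nerve_edges): clusters is a list of frozensets of points,
    nerve_edges lists index pairs of clusters that share at least one point.
    """
    neighbours = {p: set() for p in f_values}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    clusters = []
    for lo, hi in intervals:
        # Preimage of the open interval (lo, hi) under f.
        preimage = {p for p, v in f_values.items() if lo < v < hi}
        seen = set()
        for start in sorted(preimage):
            if start in seen:
                continue
            # Depth-first search restricted to the preimage.
            component, stack = set(), [start]
            while stack:
                p = stack.pop()
                if p in component:
                    continue
                component.add(p)
                stack.extend((neighbours[p] & preimage) - component)
            seen |= component
            clusters.append(frozenset(component))

    # 1-skeleton of the nerve: connect clusters with non-empty intersection.
    nerve_edges = [
        (i, j)
        for i, j in combinations(range(len(clusters)), 2)
        if clusters[i] & clusters[j]
    ]
    return clusters, nerve_edges


# The same sampled circle with a height function and three overlapping intervals.
heights = {0: 0, 1: 1, 2: 2, 3: 3, 4: 2, 5: 1}
circle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
cover = [(-0.5, 1.5), (0.5, 2.5), (1.5, 3.5)]
clusters, nerve_edges = mapper_graph(heights, circle_edges, cover)
```

For this example the middle interval splits into two components, one on each side of the circle, so the resulting graph is a cycle on four vertices, which agrees with the Reeb graph of a circle under a height function.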

If we choose a suitable function $f$ and a suitable cover $U$, then $N(f^*(U))$ is isomorphic to $R_f$.

### Topological Mapper

The previous definition only considers maps to $\mathbb{R}$; we now generalize to an arbitrary space.

!!! note "Def: well-behaved"
    TBD

!!! note "Def: Mapper"
    Let $f: X \rightarrow Z$ be well-behaved, and $U$ be a finite open cover of $Z$. Then the Mapper is defined as $M(U,f) = N(f^*(U))$.

