---
typora-copy-images-to: ./img
---

GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.

# Spring 2018: Advanced Algorithms
[TOC]
- theory of computation
  - computable/decidable : a TM halts with the answer on every input
  - NL : nondeterministic logarithmic-space
  - P : deterministic TM in polytime
  - NP : solution verified in polytime by a deterministic TM, or solved by a nondeterministic TM in polytime
  - NP-complete : hardest problems in NP (in NP and NP-hard)
  - NP-hard : at least as hard as the hardest problems of NP (everything in NP reduces to it in polytime), not required to be in NP, maybe not even decidable
  - PSPACE : TM in polyspace
  - EXPTIME : deterministic TM in exptime
  - EXPSPACE : deterministic TM in expspace
- minimum spanning tree
  - input : connected undirected graph $G=(V, E)$ with edge-weights $w:E\to\R$
  - output : spanning tree $T$ of maximum weight $\sum_{e\in T}w(e)$, i.e. $n-1$ edges
  - greedy algorithm : sort edges by decreasing weight and add an edge whenever it keeps the set acyclic, $\Theta(\abs{E}\log\abs{E})$ dominated by sorting, the acyclicity tests run in almost linear time using UNION-FIND (see the sketch below)
  - correctness
    - downward closed : $(V, Y)$ acyclic implies $(V, X)$ acyclic $\forall X\subseteq Y$
    - exchange : $X, Y$ acyclic edge sets with $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X+e$ acyclic
    - proof : $S=\{s_1,\ldots, s_{n-1}\}$ spanning tree returned, listed in decreasing weight order, suppose for contradiction a higher-weight $T=\{t_1,\ldots,t_{n-1}\}$ exists, also in decreasing order, let $p$ be the smallest index s.t. $w(t_p)>w(s_p)$, then there must exist $e\in\{t_1,\ldots, t_p\}\setminus\{s_1,\ldots, s_{p-1}\}$ s.t. $\{s_1,\ldots, s_{p-1},e\}$ is acyclic, so greedy should have added $e$ instead
    - acyclic graph with $k$ edges has $n-k$ components
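A minimal sketch of this greedy (Kruskal-style) maximum-weight spanning tree with union-find; the edge encoding and function names are illustrative, not from the notes.

```python
def max_weight_spanning_tree(n, edges):
    """Greedy/Kruskal: edges = [(w, u, v)] on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) keeps the edge set acyclic
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# example: triangle plus a pendant vertex
print(max_weight_spanning_tree(4, [(3, 0, 1), (2, 1, 2), (1, 0, 2), (5, 2, 3)]))
```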
- matroid : $M=(E,J)$ on finite ground set $E$ with family $J$ of subsets called independent sets
  - downward closed : $X\subseteq Y$ and $Y\in J$ implies $X\in J$
  - exchange : $X, Y \in J$ and $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X\cup \{e\} \in J$
  - the independent sets are often exponentially many, assume access to an independence oracle (answers whether $I\in J$ for $I\subseteq E$, a sketch follows the examples below)
  - base : all maximal independent sets have the same cardinality
  - Rado/Gale/Edmonds theorem : greedy finds a maximum weight base iff matroid
    - $\Leftarrow$ : greedy correctness
    - $\Rightarrow$ : suppose not a matroid, then at least one of the two axioms is violated
      - downward-closed violated : $S\subset T$ s.t. $S\not\in J$ and $T\in J$, consider $w_i=\begin{cases}2 & i \in S\\ 1 & i\in T\setminus S\\ 0 & o/w\end{cases}$, greedy first picks some $S_1\subset S$, then at most $T\setminus S$, and returns $2\abs{S_1}+\abs{T\setminus S}$ which is less than $w(T)$
      - exchange violated : $S, T\in J$ s.t. $\abs{S}<\abs{T}$ and $\forall i\in T\setminus S$, $S+i\not\in J$, consider $w_i=\begin{cases}1+\frac{1}{2\abs{S}} & i\in S\\ 1 & i\in T\setminus S\\ 0 & o/w\end{cases}$, greedy picks all of $S$ and returns $\abs{S}+1/2$ which is less than $\abs{T}$
  - graphic matroid : $E$ edges of a graph, $J=\{F\subseteq E : F \text{ acyclic}\}$
  - $k$-uniform matroid : $J=\{X\subseteq E : \abs{X}\le k\}$, rank $k$
  - partition matroid : $J=\{X\subseteq E : \abs{E_i\cap X}\le k_i \text{ for } i=1,\ldots, l\}$ with $E_1,\ldots,E_l$ disjoint sets
  - linear matroid : defined from a matrix $A$, $E$ index set of columns, for $X\subseteq E$ let $A_X$ be the matrix of columns indexed by $X$, $J=\{X\subseteq E : \operatorname{rank}(A_X)=\abs{X}\}$
  - truncated matroid : $M_k=(E,J_k)$ s.t. $J_k=\{X\in J:\abs{X}\le k\}$
  - matroid intersection : $M_1=(E,J_1)$ and $M_2=(E,J_2)$ give $M_1\cap M_2=(E,J_1\cap J_2)$, in general not a matroid (verify via cardinality of maximal solutions)
    - Edmonds/Lawler : efficient algorithm for finding a max-weight independent set in the intersection of two matroids
  - laminar matroid : $J=\{S\subseteq E : \abs{S\cap X}\le k_X~~\forall X\in F\}$ with $F$ a family of subsets of the ground set $E$ that are pairwise disjoint or nested (subset/superset of each other) and $\{k_X\}_{X\in F}$ positive integers
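A sketch of the Rado/Gale/Edmonds greedy over an independence oracle; here `is_independent` is an acyclicity check, making this the graphic-matroid instance (names are illustrative).

```python
def greedy_max_weight_base(ground, weight, is_independent):
    """Matroid greedy: scan in decreasing weight, keep if still independent."""
    base = []
    for e in sorted(ground, key=weight, reverse=True):
        if is_independent(base + [e]):
            base.append(e)
    return base

# graphic matroid oracle: an edge set is independent iff it is acyclic
def acyclic(edges):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

w = {(0, 1): 3, (1, 2): 2, (0, 2): 1}
print(greedy_max_weight_base(list(w), w.get, acyclic))  # [(0, 1), (1, 2)]
```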
- graph orientation : replace each undirected edge by an arc in either direction
- bipartite matching
  - matching : $M\subseteq E$ s.t. every vertex is incident to at most one edge of $M$, $\abs{\{e\in M : v\in e\}}\le 1~~\forall v\in V$
  - input : bipartite graph $G=(V, E)$
  - output : matching of maximum size
  - matroid intersection view : vertex bipartition $A$ and $B$, $E=\cup\{\delta(v):v\in A\}$ with $\delta(v)$ the edges incident to $v$, $M_A=(E, J_A)$ with $J_A=\{F\subseteq E : \abs{F\cap\delta(v)}\le 1~~\forall v\in A\}$ and similarly $M_B$
  - alternating path : alternates between edges in $M$ and $E\setminus M$
  - augmenting path : alternating path whose first and last vertices are unmatched (see the sketch below)
  - maximum matching : iff no augmenting path
    - $\Rightarrow$ : contradiction, $P$ augmenting path, $M'=M\Delta P$ has greater cardinality
    - $\Leftarrow$ : contradiction, $M^*$ maximum matching with $\abs{M^*}>\abs{M}$, $Q=M\Delta M^*$ contains more edges from $M^*$ than from $M$, each vertex is incident to at most one edge in $M\cap Q$ and at most one in $M^*\cap Q$, so an augmenting path must exist
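A compact augmenting-path maximum matching (DFS-based), assuming adjacency lists from $A$-vertices to $B$-vertices; a sketch, not the notes' pseudocode.

```python
def max_bipartite_matching(adj):
    """adj: dict a -> list of b. Repeatedly search for an augmenting path."""
    match_b = {}  # b -> a currently matched

    def augment(a, seen):
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            # b is free, or its current partner can be re-matched elsewhere
            if b not in match_b or augment(match_b[b], seen):
                match_b[b] = a
                return True
        return False

    return sum(augment(a, set()) for a in adj)

print(max_bipartite_matching({0: [10], 1: [10, 11], 2: [11, 12]}))  # 3
```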
- colorful spanning trees : intersection of graphic and partition matroids
  - input : undirected graph with colored edges
  - output : spanning tree with all edges of different colors
- arborescences : intersection of graphic and partition matroids
  - input : directed graph $D=(V, A)$, special root vertex $r\in V$
  - output : spanning tree rooted at $r$ with all arcs pointing away from it (each $v\neq r$ has in-degree one)
- linear programming : optimize a linear objective function over $n$ variables $x_1,\ldots,x_n\in\R$ subject to $m$ linear constraints
  - minimize $\sum_{i=1}^n c_i x_i$ s.t. $\sum_i e_{i,j}x_i=b_j$, $\sum_i d_{i,k} x_i\ge g_k$, $\sum_i f_{i,p}x_i\le l_p$
  - relaxation : approximation of the optimal result for integer programs
  - simplex method : start from an extreme point, move to better neighbors of the convex polytope, exponential time on bad examples
  - ellipsoid method : binary search for the optimal value, start with a surrounding ellipsoid and test its center, if infeasible cut the ellipsoid in two parts using a violated constraint, polytime but slow in practice
  - convex combination of points : $\sum_{i=1}^n\lambda_i x_i$ with $\sum_{i=1}^n \lambda_i = 1$ and $\lambda_i\in [0,1]$
  - extreme point : cannot be written as a convex combination of other feasible solutions
  - polyhedron : $P=\{x\in\R^n : Ax=b, x\ge 0\}$, called a polytope if bounded
  - optimum : if the feasible region is bounded, there exists an extreme point that is optimal
    - proof : $x^*$ optimal, any feasible point can be written as a convex combination of extreme points $x^{(i)}$, so $x^*=\sum_i\lambda_i x^{(i)}$, with $c$ the objective vector $c^\top x^*=c^\top(\sum_i\lambda_i x^{(i)})=\sum_i\lambda_i c^\top x^{(i)}$ implies the existence of an $x^{(i)}$ s.t. $c^\top x^{(i)}\ge c^\top x^*$
- maximum weight bipartite perfect matching
  - input : undirected graph $G=(V, E)$ partitioned into $A$ and $B$ with edge-weights $w:E\to\R$
  - output : perfect matching $M$ that maximizes $w(M)=\sum_{e\in M} w(e)$
  - LP : maximize $\sum_{e\in E} x_ew_e$ s.t. $\sum_{e=(a,b)\in E}x_e=1~~\forall a\in A$ and $\sum_{e=(a,b)\in E} x_e=1~~\forall b\in B$ with $x_e\ge 0~~\forall e\in E$
  - any extreme point is integral
    - $G=(V_1,V_2, E)$, suppose for contradiction $E_f=\{e\in E: 0< x_e^* < 1\}\not=\emptyset$, then $E_f$ must contain a cycle $e_1,\ldots,e_{2k}$ (of even length, by bipartiteness), let $y_e=\begin{cases}x_e^*+\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*-\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & o/w\end{cases}$ and $z_e=\begin{cases}x_e^*-\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*+\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & o/w\end{cases}$ with $\epsilon =\min\{x_e^*,(1-x_e^*):e\in E_f\}$, then $x^*=\frac{1}{2}(y+z)$ contradicts extremality once we check that $y,z$ are feasible (the alternating $\pm\epsilon$ keeps every vertex constraint satisfied)
  - $k$-regular graph : degree of each vertex equals $k$ (a $k$-regular bipartite graph always has a perfect matching)
- bipartite vertex cover
  - input : undirected graph $G=(V, E)$ with node-weights $w:V\to\R$
  - output : cover $C$ s.t. $e\cap C\not=\emptyset~~\forall e\in E$ that minimizes $w(C)=\sum_{v\in C}w(v)$
  - LP : minimize $\sum_{v\in V}x_v w(v)$ s.t. $x_u+x_v\ge 1~~\forall\{u, v\}\in E$ and $0\le x_v\le 1~~\forall v\in V$
  - any extreme point is integral : take $A_f=V_f\cap A$ and $B_f=V_f\cap B$ with $V_f$ the fractionally-valued vertices, same cycle argument as above on those sets
- duality
  - primal : minimize $\sum_{i=1}^{n} c_i x_i$ s.t. $\sum_{i=1}^{n} A_{ji} x_i \geq b_j ~~\forall j = 1,\dots,m$, $x\ge 0$
  - dual : maximize $\sum_{j=1}^{m} b_j y_j$ s.t. $\sum_{j=1}^{m} A_{ji} y_j \leq c_i ~~\forall i = 1,\dots,n$, $y\ge 0$
  - maximization problem : replace $x_i$ with $-x_i$ in constraints and objective function
  - primal-feasible : $x$ feasible solution to the primal
  - weak duality : $x$ primal-feasible and $y$ dual-feasible implies $\sum_{i=1}^{n} c_i x_i \geq \sum_{j=1}^{m} b_j y_j$
  - strong duality : $x$ primal optimal and $y$ dual optimal implies $\sum_{i=1}^{n} c_i x_i = \sum_{j=1}^{m} b_j y_j$ (numerical check below)
  - unbounded primal/dual : dual/primal infeasible
  - example : max-flow = min-cut
  - complementary slackness : feasible $x$ and $y$ are both optimal iff $x_i > 0\Rightarrow c_i = \sum_{j=1}^{m} A_{ji} y_j \quad \forall i = 1,\dots,n$ and $y_j > 0\Rightarrow b_j = \sum_{i=1}^{n} A_{ji} x_i \quad \forall j = 1,\ldots,m$
  - short notation
    - primal : $\min \left\{ c^T x : Ax \geq b, x \geq 0\right\}$
    - dual : $\max \left\{ b^T y : A^T y \leq c, y \geq 0\right\}$
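A quick numerical illustration of strong duality with SciPy (not part of the notes; the instance is made up); `linprog` minimizes, so the dual max problem is negated.

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c^T x  s.t.  A x >= b, x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, 5.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# dual: max b^T y  s.t.  A^T y <= c, y >= 0  (negate for a min solver)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)  # both 9.0: equal at optimum (strong duality)
```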
- bipartite maximum cardinality matching
  - input : $G=(A\cup B, E)$
  - output : maximum matching $M$
  - primal : maximize $\sum_{e \in E} x_e$ s.t. $\sum_{e=(a,b) \in E} x_e \leq 1\quad \forall a \in A$, $\sum_{e=(a,b) \in E} x_e \leq 1\quad \forall b \in B$ and $x_e\ge 0$
  - dual : vertex cover relaxation $C$, minimize $\sum_{v \in A \cup B} y_v$ s.t. $y_a + y_b \geq 1\quad \text{for } (a,b) \in E$ and $y_v\ge 0$
  - weak duality : $|M| \leq |C|$
  - König theorem : both problems are integral, $|M^\star| = |C^\star|$
- minimum cost bipartite perfect matching
  - LP : minimize $\sum_{e \in E} c(e) x_e$ s.t. $\sum_{b \in B : (a,b) \in E} x_{ab} = 1\quad \forall a \in A$ and $\sum_{a \in A : (a,b) \in E} x_{ab} = 1\quad \forall b \in B$ with $x_e \geq 0\quad \forall e \in E$
  - solution : solving the LP directly is inefficient, take the dual to reduce to an unweighted problem solvable with augmenting paths
  - rewrite : minimize $\sum_{e \in E} c(e) x_e$ s.t. $\sum_{b \in B : (a,b) \in E} x_{ab} \geq 1\quad \forall a \in A$, $-\sum_{b \in B : (a,b) \in E} x_{ab} \geq -1\quad \forall a \in A$, $\sum_{a \in A : (a,b) \in E} x_{ab} \geq 1\quad \forall b \in B$ and $-\sum_{a \in A : (a,b) \in E} x_{ab} \geq -1\quad \forall b \in B$ with $x_e \geq 0\quad \forall e \in E$
  - dual : maximize $\sum_{a \in A}(u_a^+ - u_a^-) + \sum_{b \in B}(v_b^+ - v_b^-)$ s.t. $(u_a^+ - u_a^-) + (v_b^+ - v_b^-) \leq c(e) \quad \forall e = (a,b) \in E$ with $u_a^+, u_a^-, v_b^+, v_b^- \geq 0\quad \forall a \in A, b \in B$
  - simplified dual : replace $u_a^+ - u_a^-$ by $u_a$ and same for $v$ (now unconstrained in sign)
- Hall theorem : $n$-by-$n$ bipartite graph $G=(A\cup B, E')$ has a perfect matching iff $|S| \leq |N(S)|~~\forall S\subseteq A$, where $N(S)=\{b\in B : \exists a\in S, (a,b)\in E'\}$ denotes the neighborhood of $S$
- Hungarian algorithm
  - intuition : reduce the weighted problem to an unweighted one by maintaining a feasible dual solution at all times
  - minimum cost perfect matching : $M$ is optimal iff there is a feasible dual solution s.t. $u_a + v_b = c(e)~~\forall e=(a,b)\in M$
  - algorithm : $O(n^3)$ achievable but not trivial
    - initialization : $v_b = 0$, $u_a = \min_{b \in B} c_{ab}$
    - consider $G' = (A \cup B, E')$ with the set of tight edges $E' = \{e = (a,b) \in E : u_a + v_b = c(e)\}$
    - find a maximum cardinality matching in $G'$ : if it is perfect, stop, else find $S\subseteq A$ s.t. $\abs{S}>\abs{N(S)}$
    - choose $\epsilon > 0$ and improve $u_a' = \begin{cases} u_a + \epsilon & \text{if } a \in S \\ u_a & \text{if } a \notin S \end{cases}$ and $v_b' = \begin{cases} v_b - \epsilon & \text{if } b \in N(S) \\ v_b & \text{if } b \notin N(S) \end{cases}$ which remains dual feasible
      - edges in $S\times N(S)$ : unchanged, $+\epsilon-\epsilon$
      - edges in $(A\setminus S)\times(B\setminus N(S))$ : unchanged
      - edges in $(A\setminus S)\times N(S)$ : decreased by $\epsilon$, stop being tight
      - edges in $S\times (B\setminus N(S))$ : increased by $\epsilon$ but were not tight (not in $E'$)
    - dual value increase : by $(\abs{S} - \abs{N(S)})\epsilon > 0$
    - to create a new tight edge, choose $\epsilon$ as large as possible while the dual stays feasible : $\epsilon =\min_{e = (a,b) \in S \times (B\setminus N(S))} c(e) - u_a - v_b > 0$
- NP-hard : finding a solution is likely intractable, no algorithm satisfies all three of
  - efficient : polynomial time
  - reliable : works for any input instance
  - optimal : finds an optimal solution
- relax reliability : heuristics designed to work well in practice
- relax optimality : approximation algorithms
  - $\alpha$-approximation : runs in polytime and outputs a solution $S$ s.t.
    - minimization : $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \alpha$
    - maximization : $\frac{\operatorname{profit}(S)}{\operatorname{profit}(\text{optimal solution})} \geq \alpha$
  - lower bound : because the optimum is hard to compute, bound $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \frac{\operatorname{cost}(S)}{\text{lower bound on OPT}} \leq \alpha$
- LP as a popular framework : given a formulation as an integer LP (binary $x_i\in\{0,1\}$), relax to an LP ($x_i\in[0,1]$), solve the LP to get $x^*$ and round the solution without losing too much
- vertex cover
  - input : $G=(V,E)$, vertex weights $w:V\to\R_+$
  - output : set $C\subseteq V$ of minimum weight s.t. $u\in C$ or $v\in C$ $\forall \{u, v\}\in E$
  - ILP : minimize $\sum_{v \in V} w(v) x_v$ s.t. $x_u + x_v \geq 1\quad\forall\{u, v\}\in E$ with $x_v\in\{0,1\}$
  - relaxation : $x_v\in[0,1]$
  - $2$-approximation : solve the LP and round $C = \left\{ v \in V : x_v^* \geq \frac{1}{2} \right\}$, feasible since every edge has an endpoint with $x^*\ge\frac{1}{2}$ (sketch below)
    - weight of $C$ : at most twice the optimum, $\sum_{v \in C} w(v) = \sum_{v \in V : x_v^* \geq \frac{1}{2}} w(v) \leq \sum_{v \in V : x_v^* \geq \frac{1}{2}} 2x_v^* w(v) \leq 2\sum_{v \in V} x_v^* w(v) = 2LP_{OPT} \leq 2VC_{OPT}$
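A sketch of the LP-relax-and-round $2$-approximation using SciPy's `linprog`; the graph encoding is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_2approx(n, edges, w):
    """LP relaxation of vertex cover, then threshold rounding at 1/2."""
    A = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        A[row, u] = A[row, v] = 1.0
    # min w^T x  s.t.  x_u + x_v >= 1 per edge, 0 <= x <= 1
    res = linprog(w, A_ub=-A, b_ub=-np.ones(len(edges)), bounds=[(0, 1)] * n)
    return {v for v in range(n) if res.x[v] >= 0.5}

print(vertex_cover_2approx(3, [(0, 1), (1, 2)], [1.0, 1.0, 1.0]))  # {1}
```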
- integrality gap : $g = \max_{I \in \mathcal{I}} \frac{OPT(I)}{OPT_{LP}(I)}$ with $\mathcal{I}$ the set of all instances of a given problem, bounds the power of the linear relaxation
  - vertex cover : complete graph on $n$ vertices, $g \geq \frac{n-1}{n/2} = 2 - \frac{2}{n}$
- set cover
  - input : universe $\mathcal{U} =\{e_1,\ldots, e_n\}$, family of subsets $\mathcal T=\{S_1,\ldots, S_m\}$ and cost function $c:\mathcal T\to\R_+$
  - output : collection $C$ of subsets of minimum cost that covers all elements
  - ILP : minimize $\sum_{i=1}^{m} x_i \cdot c(S_i)$ s.t. $\sum_{S_i : e \in S_i} x_i \geq 1\quad\forall e\in\mathcal U$ with $x_i\in\{0,1\}$
  - LP rounding : $x_i\in[0,1]$ with $C=\{S_i:x_i^*\ge \frac{1}{f}\}$ where each element belongs to at most $f$ sets, gives an $f$-approximation
  - randomized rounding : better
    - algorithm (sketch below)
      - solve LP : get optimal solution $x^*$
      - choose a positive integer constant $d$ : start with an empty result set $C$, repeat $d\ln(n)$ times
      - for $i=1,\ldots, m$ : add set $S_i$ to the solution $C$ with probability $x_i^*$, choosing independently for each set
    - expected cost of all sets added in one execution : $E[\text{rounded cost}]= \sum_{i=1}^{m} c(S_i) Pr[S_i \text{ is added}] = \sum_{i=1}^{m} c(S_i) x_i^* = LP_{OPT}$
    - expected cost after $d\ln(n)$ executions : at most $d \cdot \ln(n) \cdot \sum_{i=1}^{m} c(S_i) x_i^* \leq d \cdot \ln(n) \cdot LP_{OPT} \leq d \cdot \ln(n) \cdot OPT$
    - probability that a given constraint is unsatisfied after a single execution : at most $\frac{1}{e}$ as $x_1+\cdots+x_k\ge 1$ and $1-x\le e^{-x}$
    - probability of a feasible solution : at least $1-\frac{1}{n^{d-1}}$, by a union bound over the $n$ elements
    - output a feasible solution of cost at most $4d\ln(n)OPT$ : with probability greater than $\frac{1}{2}$, use the expected cost and the Markov inequality to bound the cost $Pr[\text{cost} > 4\mu]\le\frac{1}{4}$, bound infeasibility by $\frac{1}{n^{d-1}}\le \frac{1}{n}$, giving success probability at least $1-\frac{1}{4}-\frac{1}{n}$ by a union bound
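A minimal sketch of the randomized rounding step, assuming the LP optimum $x^*$ is already computed; set encoding and seed are illustrative.

```python
import numpy as np

def round_set_cover(x_star, sets, n_elements, d=2, rng=np.random.default_rng(0)):
    """Randomized rounding: d*ln(n) independent passes, keep S_i w.p. x_i*."""
    chosen = set()
    for _ in range(int(np.ceil(d * np.log(n_elements)))):
        keep = rng.random(len(sets)) < x_star  # one independent coin per set
        chosen |= {i for i in range(len(sets)) if keep[i]}
    covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
    return chosen, covered == set(range(n_elements))

# toy instance: fractional solution covering 3 elements with 3 sets
sets = [{0, 1}, {1, 2}, {0, 2}]
print(round_set_cover(np.array([0.5, 0.5, 0.5]), sets, 3))
```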
- weighted majority algorithm : sequential game between an omniscient adversary and an aggregator advised by $N$ experts
  - algorithm with halving
    - assign each expert $i$ weight $w_i^{(1)}=1$
    - for every $t$ : predict yes/no based on the weighted majority
    - for every mistaken expert : $w_i^{(t+1)}=w_i^{(t)}/2$
    - any sequence of duration $T$ and expert $i$ : $\#\text{majority mistakes}\le 2.41 (\#\text{of } i\text{'s mistakes} + \log_2 N)$
  - potential function : $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
  - $\epsilon$ update rule : $w_i^{(t+1)}=w_i^{(t)}\cdot(1-\epsilon)$ achieves $\#\text{majority mistakes}\leq 2(1+\epsilon)(\#\text{of } i\text{'s mistakes})+O(\log N/\epsilon)$
    - lower bound : $\Phi^{(T+1)} = \sum_{j \in [N]} w_j^{(T+1)} \geq w_i^{(T+1)} = (1-\epsilon)^{\#\text{ of } i\text{'s mistakes}}$
    - upper bound : $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot (1-\epsilon/2)^{\#\text{ of WM mistakes}}$, since on each majority mistake at least half of the total weight gets scaled down
  - at worst : roughly twice as many mistakes as the best expert
- hedge
  - setup
    - each expert $i$ gives some advice
    - allocator picks a distribution $\vec{p}^{(t)} = (p_1^{(t)},\dots,p_N^{(t)})$ over the experts
    - adversary determines a cost vector $\vec{m}^{(t)} = (m_1^{(t)},\dots,m_N^{(t)}) \in [-1,1]^N$, a negative number means it is profitable to follow the specific expert
    - allocator suffers $\vec{p}^{(t)} \cdot \vec{m}^{(t)} = \sum_{i \in [N]} p_i^{(t)} m_i^{(t)}$
  - algorithm (sketch below)
    - assign each expert $i$ : $w_i^{(1)}=1$
    - pick distribution : $p_i^{(t)} = w_i^{(t)}/\Phi^{(t)}$ with $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
    - observe costs and update : $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\epsilon \cdot m_i^{(t)}}$
  - bound : for any expert $i$, $\sum_{t=1}^{T} \vec{p}^{(t)} \cdot \vec{m}^{(t)} \leq \sum_{t=1}^{T} m_i^{(t)} + \frac{\ln N}{\epsilon} + \epsilon T$
    - lower bound : $\Phi^{(T+1)} = \sum_{j \in [N]} w_j^{(T+1)} \geq w_i^{(T+1)} = \exp\left(-\epsilon \sum_{t=1}^{T} m_i^{(t)}\right)$
    - upper bound : $\Phi^{(t+1)} = \sum_{j \in [N]} w_j^{(t)} \cdot \exp\left(-\epsilon m_j^{(t)}\right) \leq \sum_{j \in [N]} w_j^{(t)} \left(1 - \epsilon m_j^{(t)} + \epsilon^2\right) \leq \Phi^{(t)} \cdot \exp\left(\epsilon^2 - \epsilon \vec{p}^{(t)} \vec{m}^{(t)}\right)$, so $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot \exp\left(\epsilon^2 T - \epsilon \sum_{t=1}^{T} \vec{p}^{(t)} \vec{m}^{(t)}\right)$, using the Taylor bound $e^x \leq 1 + x + x^2$ for $x\in[-1,1]$ and $(m_j^{(t)})^2 \leq 1$
  - generalisation : costs in $[-\rho,\rho]^N$ with width $\rho$
    - if $T \geq (4\rho^2 \ln N)/\epsilon^2$ : for any expert $i$, $\frac{1}{T} \sum_{t=1}^{T} \vec{p}^{(t)} \cdot \vec{m}^{(t)} \leq \frac{1}{T} \sum_{t=1}^{T} m_i^{(t)} + 2\epsilon$
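A direct sketch of the hedge update over a precomputed cost matrix (the random instance below is only for a regret sanity check).

```python
import numpy as np

def hedge(costs, eps):
    """Hedge over a (T, N) cost matrix in [-1, 1]; returns total allocator cost."""
    T, N = costs.shape
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                # distribution over experts
        total += p @ costs[t]          # suffered cost
        w *= np.exp(-eps * costs[t])   # exponential update
    return total

# regret check on random costs: allocator vs best single expert
rng = np.random.default_rng(1)
m = rng.uniform(-1, 1, size=(1000, 5))
print(hedge(m, eps=0.05), m.sum(axis=0).min())
```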
- covering LP via hedge : associate an expert with each constraint of the LP and rewrite $\sum_{i=1}^{n} d_i x_i \geq b$ as the weighted sum of all constraints, assign maximum value to the variables with the highest ratio constraint coefficient/objective coefficient
  - reduced LP (oracle) : minimize $\sum_{j=1}^{n} c_j x_j$ s.t. $\left(\sum_{i=1}^{m} p_i A_i\right) \cdot x \geq \sum_{i=1}^{m} p_i b_i$ for $0 \leq x \leq 1$, a fractional knapsack
    - fractional knapsack
      - sort and relabel $x$ s.t. $d_1/c_1\ge\cdots\ge d_n/c_n$
      - let : $k = \min\left\{j : \sum_{i=1}^{j} d_i \geq b\right\}$ and $\ell = b - \sum_{i=1}^{k-1} d_i$
      - set : $x_i=1$ for $i=1,\ldots,k-1$, $x_k=\ell/d_k$, $x_i=0$ for $i=k+1,\ldots,n$
  - algorithm
    - assign each constraint $i$ : $w_i^{(1)}=1$
    - pick distribution : $p_i^{(t)} = w_i^{(t)}/\Phi^{(t)}$ with $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
    - cost vector
      - solution to the reduced LP : $x^{(t)}$, noting $c^T x^{(t)}$ is at most the cost of the optimal solution to the full LP
      - cost of constraint $i$ : $m_i^{(t)} = \sum_{j=1}^{n} A_{ij} x_j - b_i = A_i x - b_i$ (positive when the constraint is satisfied, so its weight decreases)
    - update : $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\epsilon \cdot m_i^{(t)}}$
    - output : average of the constructed solutions $\overline{x} = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$, almost feasible
  - cost : $c^T \overline{x} = c^T \left(\frac{1}{T} \sum_{t \in [T]} x^{(t)}\right)$ at most the cost of the optimal solution, since each $x^{(t)}$ has no higher cost than the optimal solution (property of the reduced LP)
  - parameters : $T = (4n^2 \ln m)/\epsilon^2$ leads $\overline x$ to satisfy $\sum_{e \in S} x_e \geq 1-2\epsilon$ for every covering constraint, and scaling $x^* = \frac{1}{1-2\epsilon}\overline{x}$ gives a feasible approximately optimal solution
- simplex method : fast in practice, at worst exponential
  - slack variables : $s_1,\ldots,s_m$, non-negative, turn the inequalities into equalities
  - current objective function value : $z$
  - simplex tableau : all constraints plus the objective function
  - pivoting : increase one variable with a positive coefficient in the objective while respecting the constraints, e.g. pivot $s_1\leftrightarrow x_2$
  - each constraint can enforce an upper bound (without making a variable negative), give no upper bound, or say nothing (current variable missing)
  - repeat until no increase is possible : the final values of the slack variables are ignored
  - technicalities
    - unboundedness : feasible solutions with unbounded objective
    - degeneracy : might need to pivot two variables even though the objective does not increase, avoid cycling through such swaps by lexicographic ordering
    - initial vertex : phase I finds an initial feasible starting point, might require solving another LP
    - infeasibility : detected by phase I
- randomized algorithms : in addition to the input, take a random source of bits and produce an output
  - correctness probability : $p>0$, running $k$ times boosts success to $1-(1-p)^k$
  - approximation : for small $p$ consider $1-p \approx e^{-p}$
- Las Vegas algorithms : randomized, always correct, polytime in expectation
- minimum cut problem
  - input : $G = (V,E)$ with $n = |V|$
  - output : partition into two subsets joined by a minimum number of edges, $\min_{\emptyset \subset S \subset V} |E(S,\overline{S})|$ with $E(S,\overline{S}) := \{(u,v) : u \in S, v \notin S\}$
  - contraction procedure : pick $(u,v)$ in $G$, merge both endpoints into a new node $uv$, might create parallel edges
  - hand-shake lemma : in any graph $\sum_{v \in V} d(v) = 2|E|$ with $d(v)$ the degree of $v$
  - Karger : $O(n^4\log n)$ by boosting $O(n^2\log n)$ times the raw $O(n^2)$ contraction algorithm (sketch below) of success probability $\Omega(1/n^2)$, reaching success probability $1-\frac{1}{n}$
    - size of the minimum cut : $|E(S^*,\overline{S^*})|=k$
    - uniform random edge in the cut : $\mathbb{P}[e \in E(S^*,\overline{S^*})] = \frac{k}{|E|} \leq \frac{k}{nk/2} = \frac{2}{n}$ by hand-shake, as $d(v) = |E(\{v\},\overline{\{v\}})| \geq k$
    - idea : choose $n/4$ edges, with probability $1/2$ none of them is in the min-cut, contract them, the new graph has at most $3n/4$ vertices, recurse, after $O(\log n)$ steps two super nodes remain and with probability at least $(1/2)^{O(\log n)}$ no min-cut edge was contracted
    - for each contraction, the size of the minimum cut does not decrease : contract one edge at a time to avoid parallel contractions
    - probability of returning a min-cut : at least $\frac{2}{n(n-1)}$
      - $A_i$ : event that the edge picked in step $i$ is not in $E(S^*,\overline{S^*})$
      - by Bayes rule : $\mathbb{P}[A_1,\ldots,A_{n-2}] = \mathbb{P}[A_1]\mathbb{P}[A_2|A_1]\mathbb{P}[A_3|A_1,A_2]\cdots\mathbb{P}[A_{n-2}|A_1,\ldots,A_{n-3}]$ with $\mathbb{P}[A_i|A_1,\ldots,A_{i-1}] \geq 1 - \frac{2}{n-i+1}$
      - lower bound : $\mathbb{P}[A_1,\dots,A_{n-2}] \geq \frac{2}{n(n-1)} = 1/\binom{n}{2}$
    - corollary : any graph has at most $\binom{n}{2}$ min-cuts
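A minimal sketch of one contraction run (union-find stands in for explicit node merging); the toy instance and boosting loop are illustrative.

```python
import random

def karger_once(edges, n, rng=random.Random(0)):
    """One run of the contraction algorithm; edges is a list of pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    while components > 2:
        u, v = rng.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:            # skip self-loops of already-merged nodes
            parent[ru] = rv
            components -= 1
    # surviving edges across the two super nodes form the cut
    return sum(find(u) != find(v) for u, v in edges)

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # triangle plus pendant edge: min cut 1
print(min(karger_once(edges, 4, random.Random(s)) for s in range(20)))  # 1 whp
```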
  - Karger-Stein : $O(n^2\log^3 n)$ by boosting $\log^2 n$ times the raw $O(n^2\log n)$ algorithm of success probability $\Omega(1/\log n)$, reaching success probability $1-1/n$
    - idea : contract only down to $n/\sqrt{2}$ vertices, then recurse twice independently and keep the better cut
    - probability of not contracting any min-cut edge for $n-n/\sqrt{2}$ steps : at least $\frac{r-2}{r} \cdot \frac{r-3}{r-1} \cdots \frac{r/\sqrt{2}-2}{r/\sqrt{2}} \approx \frac{(r/\sqrt{2})^2}{r^2} = 1/2$
    - probability of returning a min-cut : at least $1/(2\log n)$, show $\frac{1}{2}(p + p - p^2) = p - p^2/2 \geq \frac{1}{2\log n}$ with $p$ the success probability of either of the two recursive copies, in the worst case $p = 1/(2\log(n/\sqrt{2}))$
- master method : $T(n)=aT(\frac{n}{b})+f(n)$
  - $f(n) = O(n^{\log_b a - \epsilon})$ : $T(n) = \Theta(n^{\log_b a})$
  - $f(n) = \Theta(n^{\log_b a})$ : $T(n) = \Theta(n^{\log_b a}\log n)$
  - $f(n) = \Omega(n^{\log_b a + \epsilon})$ and $a \cdot f(n/b) \leq c \cdot f(n)$ for some $c<1$ : $T(n) = \Theta(f(n))$
- polynomial identity testing : given polynomials $p$, $q$ defined over a common set of variables $x$, determine whether $p(x)-q(x)=0$ for all values, with oracle access (no individual term known, but can evaluate at specific inputs)
  - naive expansion : does not work for e.g. determinants, which have exponentially many terms
  - monomial : product of powers of variables with non-negative exponents
    - degree : sum of all exponents involved
  - polynomial : sum of monomials, each referred to as a term
    - degree : largest degree over all terms
  - determinant : $\operatorname{det}(A) = \sum_{\sigma : [n] \rightarrow [n]} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{i,\sigma(i)}$ over permutations $\sigma$
  - Schwartz-Zippel lemma : $p(x_1,\dots,x_n)$ nonzero polynomial of $n$ variables with degree $d$, let $S$ be a finite subset of $\mathbb R$ with at least $d$ elements, if we assign $x_1,\ldots,x_n$ values from $S$ independently and uniformly at random then $\mathbb{P}[p(x_1,\ldots,x_n) = 0] \leq \frac{d}{|S|}$
    - base case : $n=1$, a univariate polynomial of degree $d$ has at most $d$ roots
    - strong induction : assume the lemma holds for fewer than $n$ variables
      - fix $x_1,\dots,x_{n-1}$ arbitrarily : $p(x_n) = a_k x_n^k + a_{k-1} x_n^{k-1} + \ldots + a_1 x_n + a_0$ becomes a univariate polynomial in $x_n$ of degree $k\leq d$
      - thus : $\mathbb{P}[p(x_n) = 0] \leq \frac{k}{|S|} \leq \frac{d}{|S|}$, but we would need to argue that the probability is unaffected by the choice of $x_1,\ldots, x_{n-1}$, which is not immediate
      - argue instead that the adversarial scenario (all coefficients vanishing) is rare, via the long-division argument below
- matrix identity testing : whether $AB=C$ without matrix multiplication (sketch below)
  - randomized : build a random vector $x\in\mathbb R^n$ by choosing each entry independently and uniformly from $S$, $x_i \sim \text{Uniform}(S)$, test $ABx=Cx$ to conclude $AB=C$
    - detection probability if $AB \neq C$ : $\mathbb{P}_{x_i \sim S}[ABx \neq Cx] \geq 1 - \frac{1}{|S|}$, as in a row $i$ where $AB$ and $C$ differ only one $x_j$ value satisfies $x_j(a_{ij} - c_{ij}) = -\sum_{k \neq j}(a_{ik} - c_{ik}) x_k$, and by independence the other entries have no effect, so $\mathbb{P}_{x \sim S}[\langle a_i, x\rangle = \langle c_i, x\rangle] \leq \frac{1}{|S|}$
    - principle of deferred decision : random choices are made only when relevant, be careful when relaxing fixed assumptions, claims must hold regardless of their values
  - long division for polynomials : $p(x) = d(x)q(x) + r(x)$ with $p(x)$ of degree $d$, divisor $d(x)$ of degree $k\le d$, quotient $q(x)$ of degree at most $d-k$ and remainder $r(x)$ of degree at most $k-1$
    - take the largest power $k$ of $x_n$ : $p(x_1,\dots,x_n) = x_n^k q(x_1,\dots,x_{n-1}) + r(x_1,\dots,x_n)$
    - reuse the principle of deferred decision : $\mathbb{P}_{x_1,\ldots,x_{n-1} \sim S}[q(x_1,\ldots,x_{n-1}) = 0] \leq \frac{d-k}{|S|}$, and if $q\not=0$ then $p(x_1,\ldots, x_n)$ seen as univariate in $x_n$ has a nonzero coefficient on $x_n^k$, so $\mathbb{P}_{x_n \sim S}[p = 0|q \neq 0] \leq \frac{k}{|S|}$
    - finish : $\mathbb{P}[p = 0] = \mathbb{P}[p = 0|q = 0] \cdot \mathbb{P}[q = 0] + \mathbb{P}[p = 0|q \neq 0]\mathbb{P}[q \neq 0] \leq \mathbb{P}[q = 0] + \mathbb{P}[p = 0|q \neq 0] \leq \frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|}$
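A quick sketch of this randomized check (Freivalds-style) with entries drawn from $S=\{0,1\}$; purely illustrative.

```python
import numpy as np

def probably_equal(A, B, C, trials=20, rng=np.random.default_rng(0)):
    """Test AB == C via random vectors: O(trials * n^2) instead of O(n^3)."""
    n = C.shape[1]
    for _ in range(trials):
        x = rng.integers(0, 2, size=n)   # S = {0, 1}: error prob <= 1/2 per trial
        if not np.array_equal(A @ (B @ x), C @ x):
            return False                 # witness found: AB != C for sure
    return True                          # equal with prob >= 1 - 2^(-trials)

A = np.array([[1, 2], [3, 4]]); B = np.array([[0, 1], [1, 0]])
print(probably_equal(A, B, A @ B), probably_equal(A, B, A @ B + 1))  # True False
```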
- perfect bipartite matching existence
  - perfect matching : involves every vertex of the graph
  - deterministic algorithm : $O(|E|\sqrt{|X|+|Y|})$
  - randomized : build the symbolic adjacency matrix $A_{ij}=x_{ij}$ if $x_i$ and $y_j$ are connected (else $0$), a perfect matching exists iff the determinant of $A$ is not identically $0$
    - $\Rightarrow$ : a perfect matching is a bijection, hence a permutation $f$ on the integer set, and $\prod_{i=1}^{n} A_{i,f(i)} = \prod_{i=1}^{n} x_{i,f(i)}$ is a monomial of the determinant ($\sigma=f$) that cannot be cancelled by other permutations
    - $\Leftarrow$ : if the determinant is not identically $0$, at least one permutation has all its terms nonzero, which is a perfect matching
  - correctness probability : $1-1/n$ for $|S| \geq n^2$ by Schwartz-Zippel (degree $n$)
  - finding the matching : choose $|S| \gg n$, set $x_{ij} = 2^{w_{ij}}$ with $w_{ij}$ independently and uniformly at random from $S$, with high probability $\operatorname{det}(A) = 2^{w(M)}(\pm 1 + \text{even number})$ where $w(M)$ is the weight of the minimum weight perfect matching, for every $(x_i,y_j)$ delete the edge and test whether the weight decreases to $w(M)-w_{ij}$ (then the edge belongs to the matching)
- general graph matching : a perfect matching exists iff the determinant of the Tutte matrix $A$ is not identically zero
  - skew-symmetric Tutte matrix : $A_{ij}=x_{ij}$ if vertices $v_i$ and $v_j$ are connected and $i<j$, $-x_{ij}$ if connected and $i>j$, else $0$
- Markov inequality : $X$ non-negative, $\mathbb{P}[X \geq k \cdot \mathbb{E}[X]] \leq \frac{1}{k}$, equivalently $\mathbb{P}[X\ge k]\le\frac{\mathbb{E}[X]}{k}$, as $\mathbb{E}[X] = \sum_i i \cdot \mathbb{P}[X = i] \geq \sum_{i \geq k} i \cdot \mathbb{P}[X = i] \geq \sum_{i \geq k} k \cdot \mathbb{P}[X = i] = k \cdot \mathbb{P}[X \geq k]$
  - central limit theorem : $\sqrt{n}\left(\frac{1}{n} \sum_{i=1}^{n} X_i - \mu\right) \rightarrow \mathcal{N}(0,\sigma^2)$ as $n\to\infty$
- Chebyshev inequality : $\mathbb{P}[|X - \mathbb{E}X| > \epsilon] < \frac{\operatorname{Var}(X)}{\epsilon^2}$ or $\mathbb{P}[|X - \mathbb{E}[X]| > k\sigma] \leq \frac{1}{k^2}$, Markov applied to $(X-\mathbb{E}X)^2$
  - variance : $\operatorname{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$
  - 2nd moment at least 1st moment squared : as the variance is always non-negative
  - standard deviation : $\sigma = \sqrt{\operatorname{Var}(X)}$
  - pairwise independence suffices : $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\mathbb{E}[X_j]$ for all $i\neq j$ gives $\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)$
- Chernoff inequality : $\mathbb{P}[X\ge a]\le e^{-ta}\,\mathbb{E}[e^{tX}]$ for any $t>0$, gives sharp bounds for averages of mutually independent random variables
  - Bernoulli sum : $X = \sum_{i=1}^{n} X_i$, $\mu = \mathbb{E}[X] = \sum_{i=1}^{n} p_i$
    - upper tail : $\mathbb{P}[X \geq (1+\delta)\mu] \leq e^{-\frac{\delta^2}{2+\delta}\mu}$ for all $\delta > 0$
    - lower tail : $\mathbb{P}[X \leq (1-\delta)\mu] \leq e^{-\frac{\delta^2}{2}\mu}$ for all $\delta > 0$
    - application : $\mathbb{P}\left[|\frac{\sum X_i}{n} - p| \geq \epsilon\right] \leq \mathbb{P}\left[\sum X_i \geq (1+\epsilon/p)pn\right] + \mathbb{P}\left[\sum X_i \leq (1-\epsilon/p)pn\right] \leq e^{-\frac{\epsilon^2 n}{3}} + e^{-\frac{\epsilon^2 n}{2}}$
  - bounded sum : $a \leq X_i \leq b$ for all $i$, $X = \sum_{i=1}^{n} X_i$, $\mu = \mathbb{E}[X]$
    - upper tail : $\mathbb{P}[X \geq (1+\delta)\mu] \leq e^{-\frac{2\delta^2\mu^2}{n(b-a)^2}}$ for all $\delta > 0$
    - lower tail : $\mathbb{P}[X \leq (1-\delta)\mu] \leq e^{-\frac{\delta^2\mu^2}{n(b-a)^2}}$ for all $\delta > 0$
  - proof idea : $\mathbb{P}[X \geq a] = \mathbb{P}[e^{sX} \geq e^{sa}] \leq \frac{\mathbb{E}[e^{sX}]}{e^{sa}}$, independence is used to expand $\mathbb{E}[e^{sX}] = \mathbb{E}[e^{s(X_1 + X_2 + \cdots + X_n)}] = \prod_{i=1}^{n} \mathbb{E}[e^{sX_i}]$, each factor can be bounded and $s$ optimized
- hashing : $h : U \rightarrow \{1,2,\dots,N\}$ with $N \ll |U|$ and a small subset $S\subset U$ to store, ideally a uniform and independent mapping
  - table : $T[h(x)]$, insert, delete, query, with linked lists for collisions
  - no worst case guarantee for a fixed hash : instead pick $h$ at random from a family $\mathcal H$
  - collision indicator : $I_y = 1$ if $h(y)=h(x)$ else $0$
  - chain length : $L_x = 1 + \sum_{y \in S : y \neq x} I_y$ with $\mathbb{E}[L_x] = 1 + \frac{|S|-1}{N}$ for a uniformly random function
  - fully random hash function : requires $|\mathcal{U}|\log N$ bits to store
- $2$-universal hash families : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = h(y)] \leq \frac{1}{N}$ for all $x\not=y\in U$, can be relaxed to a small constant times $\frac{1}{N}$
  - design (sketch below) : choose a prime $p \in \{|U|,\ldots,2|U|\}$, work over the field with $f_{a,b}(x) = ax + b \mod p \quad (a,b \in [p], a \neq 0)$ and $h_{a,b}(x) = f_{a,b}(x) \mod N$
    - lemma : $ax + b = s \mod p$ and $ay + b = t \mod p$ for $x\not=y$ and $s\not=t$ have exactly one solution $a = (x-y)^{-1}(s-t)$ and $b = s - ax$, hence $\mathbb P[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] = \frac{1}{p(p-1)}$
    - $2$-universal : $\mathcal{H} = \left\{h_{a,b} : a,b \in [p] \wedge a \neq 0\right\}$ as $\mathbb{P}[h_{a,b}(x) = h_{a,b}(y)] = \sum_{s,t \in [p] : s \neq t} 1_{s = t \bmod N} \cdot \mathbb{P}[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] \leq \frac{1}{p(p-1)}\cdot\frac{p(p-1)}{N}$
    - storage : $O(\log|U|)$ bits
  - expected number of collisions : $\sum_{x \neq y \in S} \mathbb{P}_{h \in \mathcal{H}}[h(x) = h(y)] \leq \binom{|S|}{2}/N$
  - two-layer tables : $s_i$ collisions in bucket $i$ of the 1st layer, 2nd layer of size $s_i^2$, $\mathbb{E}\left[\sum_i s_i^2\right] = \mathbb{E}\left[\sum_i s_i(s_i-1)\right] + \mathbb{E}\left[\sum_i s_i\right] \le \frac{|S|(|S|-1)}{N} + |S| \leq 2|S|$ for $N\ge|S|$
  - table size : adapted (rebuilt) as it grows
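A minimal sketch of sampling from this $2$-universal family; the naive trial-division prime search is for illustration only (Bertrand's postulate guarantees a prime in the range).

```python
import random

def make_2universal(universe_size, table_size, rng=random.Random(42)):
    """Sample h(x) = ((a*x + b) mod p) mod N from a 2-universal family."""
    p = next(q for q in range(universe_size, 2 * universe_size + 1)
             if all(q % d for d in range(2, int(q ** 0.5) + 1)))  # prime >= |U|
    a = rng.randrange(1, p)   # a != 0
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % table_size

h = make_2universal(universe_size=10**6, table_size=97)
print(h(12345), h(67890))
```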
- pairwise independence : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = s, h(y) = t] = \frac{1}{N^2}$ for $x\not=y$ and any $s, t$
  - $1$-wise : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = s] = \frac{1}{N}$
  - $k$-wise : $f_{a_0,\dots,a_{k-1}}(x) = a_{k-1}x^{k-1} + \ldots + a_1 x + a_0$, analysis via the Vandermonde matrix, stored in $O(k\log|U|)$ bits
- load balancing
  - balls and bins : $\mathbb{P}[\text{bin } i \text{ gets more than } k \text{ elements}] \leq \binom{n}{k} \cdot \frac{1}{n^k} \leq \frac{1}{k!}$
  - Stirling formula : $k! \approx \sqrt{2\pi k}\left(\frac{k}{e}\right)^k$
  - choose : $k = O\left(\frac{\log n}{\log\log n}\right)$ implies $\frac{1}{k!} \leq \frac{1}{n^2}$ and $\mathbb{P}[\exists \text{ bin with} \geq k \text{ balls}] \leq n \cdot \frac{1}{n^2} = \frac{1}{n}$, giving $\max \operatorname{load} \leq O\left(\frac{\log n}{\log\log n}\right)$ with probability $1-1/n$
  - power of two choices : pick two random bins, put the ball in the one with fewer balls, maximal load drops to $O(\log\log n)$
- streaming algorithms
  - stream : $\sigma = \langle a_1, a_2,\dots,a_m\rangle$ with values from universe $[n] = \{1,\dots,n\}$, read from left to right
  - small space : $s$ bits of random-access memory, want $s = o(\min\{m,n\})$, holy grail at $s = O(\log m + \log n)$
  - passes : $p=1$ ideally
- frequent items deterministically : frequencies satisfy $f_1+\cdots+f_n=m$
  - majority problem : output $j$ if $f_j>m/2$ exists, requires deterministic space $\Omega(\min\{m,n\})$ in one pass
  - frequent problem with parameter $k$ : output the set $\{j : f_j > m/k\}$, same $\Omega(\min\{m,n\})$ one-pass deterministic lower bound
  - Misra-Gries algorithm : $p=1$, parameter $k$ (sketch below)
    - initialize : empty associative array $A$
    - process $j$ : if $j \in keys(A)$, $A[j] = A[j] + 1$, else if $|keys(A)| < k-1$, $A[j] = 1$, else for each $\ell \in keys(A)$, $A[\ell] = A[\ell] - 1$ and remove $\ell$ if $A[\ell] = 0$
    - output : answer queries with $\hat{f}_a = A[a]$ (or $0$ if absent)
    - space : at most $k-1$ pairs, $O(k(\log n + \log m))$
    - analysis : $f_j - \frac{m}{k} \leq \hat{f}_j \leq f_j$ as there are at most $m/k$ decrements, a 2nd pass over the candidates gives the exact answer
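A direct sketch of Misra-Gries as described above; the toy stream is illustrative.

```python
def misra_gries(stream, k):
    """One-pass frequent-items sketch keeping at most k-1 counters."""
    A = {}
    for j in stream:
        if j in A:
            A[j] += 1
        elif len(A) < k - 1:
            A[j] = 1
        else:                      # decrement every counter, drop zeros
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A  # guarantees f_j - m/k <= A.get(j, 0) <= f_j

print(misra_gries([1, 1, 2, 1, 3, 1, 4], k=2))  # {1: 1}: the heavy hitter survives
```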
- number of distinct elements : $d(\sigma)=|\{j : f_j > 0\}|$, provably impossible to compute exactly in small space deterministically
  - guarantee sought : $\operatorname{Pr}[d(\sigma)/3 \leq A(\sigma) \leq 3d(\sigma)] \geq 1-\delta$ with space $O(\log(1/\delta)\log n)$
  - lemma : there exists a pairwise independent hash family with $h$ sampled using $O(\log n)$ random bits
  - zeros function : $\operatorname{zeros}(p) = \max\{i : 2^i \text{ divides } p\}$, the number of trailing zeros in the binary representation
  - algo (sketch below)
    - initialize : choose $h$ and set $z=0$
    - process $j$ : if $\operatorname{zeros}(h(j)) > z$ then $z = \operatorname{zeros}(h(j))$
    - output : $2^{z+1/2}$
  - analysis : $X_{r,j}$ indicator of "$\operatorname{zeros}(h(j)) \geq r$", $Y_r = \sum_{j : f_j > 0} X_{r,j}$, $t$ the end value of $z$
    - restate : $Y_r > 0 \Leftrightarrow t \geq r$, equivalently $Y_r = 0 \Leftrightarrow t \leq r-1$
    - uniformly distributed : $\mathbb{E}[X_{r,j}] = \operatorname{Pr}[\operatorname{zeros}(h(j)) \geq r] = \operatorname{Pr}[2^r \text{ divides } h(j)] = \frac{1}{2^r}$
    - estimate mean and variance : $\mathbb{E}[Y_r] = \frac{d}{2^r}$ and $\operatorname{Var}[Y_r] = \sum_{j : f_j > 0}\left(\mathbb{E}[X_{r,j}^2] - \mathbb{E}[X_{r,j}]^2\right) \leq \sum_{j : f_j > 0} \mathbb{E}[X_{r,j}^2] = \sum_{j : f_j > 0} \mathbb{E}[X_{r,j}] = \frac{d}{2^r}$ by pairwise independence
    - inequalities : $\operatorname{Pr}[Y_r > 0] = \operatorname{Pr}[Y_r \geq 1] \leq \frac{\mathbb{E}[Y_r]}{1} = \frac{d}{2^r}$ and $\operatorname{Pr}[Y_r = 0] \leq \operatorname{Pr}\left[|Y_r - \mathbb{E}[Y_r]| \geq \frac{d}{2^r}\right] \leq \frac{\operatorname{Var}[Y_r]}{(d/2^r)^2} \leq \frac{2^r}{d}$
    - estimate : $\hat{d} = 2^{t+1/2}$
      - upper : $a$ smallest integer s.t. $2^{a+1/2} \geq 3d$, $\operatorname{Pr}[\hat{d} \geq 3d] = \operatorname{Pr}[t \geq a] = \operatorname{Pr}[Y_a > 0] \leq \frac{d}{2^a} \leq \frac{\sqrt{2}}{3}$
      - lower : $b$ largest integer s.t. $2^{b+1/2} \leq d/3$, $\operatorname{Pr}[\hat{d} \leq d/3] = \operatorname{Pr}[t \leq b] = \operatorname{Pr}[Y_{b+1} = 0] \leq \frac{2^{b+1}}{d} \leq \frac{\sqrt{2}}{3}$
    - conclusion : constant failure probability, boost with the median trick
    - median trick : run $k$ copies in parallel with mutually independent random hash functions and output the median
      - Chernoff : expect $\le k\frac{\sqrt{2}}{3}$ copies to exceed $3d$, so the median exceeds $3d$ with probability $\leq 2^{-\Omega(k)}$, choose $k=\Theta(\log(1/\delta))$ for failure probability at most $\delta$
    - space : hashing $O(\log n)$, storing $z$ takes $O(\log\log n)$, thus $O(\log(1/\delta)\log n)$ overall
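A small sketch of a single copy of this estimator; the pairwise independent hash is a linear map mod the prime $2^{32}+15$, and the instance is illustrative.

```python
import random

def distinct_estimate(stream, n_bits=32, rng=random.Random(7)):
    """Track the max number of trailing zeros of h(j), output 2^(z + 1/2)."""
    p = (1 << n_bits) + 15            # 2^32 + 15 is prime
    a, b = rng.randrange(1, p), rng.randrange(p)

    def zeros(v):                     # number of trailing zero bits
        return (v & -v).bit_length() - 1 if v else n_bits

    z = 0
    for j in stream:
        z = max(z, zeros((a * j + b) % p))
    return 2 ** (z + 0.5)

print(distinct_estimate([1, 2, 3, 2, 1, 4, 5] * 100))  # rough estimate of 5
```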
- frequency vector : $\mathbf{f} = (f_1,\dots,f_n)$ where $f_i = |\{j : a_j = i\}|$ and $f_1 + f_2 + \dots + f_n = m$
  - second moment : $F_2 = \sum_{i=1}^{n} f_i^2$
  - downsampling the stream : to size $O(\sqrt{n})$, cannot expect an estimate better than $\sqrt{n}$ off the truth
  - $k$-wise independent : family of hash functions $H = \{h : [n] \rightarrow U\}$ s.t. for any $k$ distinct $(x_1,\cdots,x_k)$ and any $(u_1,\cdots,u_k) \in U^k$, $Pr_{h \in H}[h(x_1) = u_1 \wedge \cdots \wedge h(x_k) = u_k] = \left(\frac{1}{|U|}\right)^k$
$H = { h : [ n ] \rightarrow U }$ if for any$k$ distinct$\left( x _ { 1} ,\cdots ,x _ { k } \right) \in U ^ { k }$ and any$\left( u _ { 1} ,\cdots ,u _ { k } \right)$ ,$P r _ { h \in H } \left[ h \left( x _ { 1} \right) = u _ { 1} \wedge \cdots \wedge h \left( x _ { k } \right) = u _ { k } \right] = \left( \frac { 1} { | U | } \right) ^ { k }$ - AMS
$\ell_2$ frequency vector norm- algo
- init : pick 4-wise independent hash function
$h : [ n ] \rightarrow { - 1,+ 1}$ , let$\sigma_i=h(i)$ so$\sigma \in { - 1,1} ^ { n }$ and$Z=0$ - process : element of value
$i$ as$Z = Z + \sigma _ { i }$ - output :
$Z^2$ (we have$Z = \sum _ { i = 1} ^ { n } f _ { i } \sigma _ { i }$ )
- init : pick 4-wise independent hash function
- analysis
- unbiased :
$\mathbb { E } \left[ Z ^ { 2} \right] = \mathbb { E } \left[ \sum _ { i \in [ n ] } \sigma _ { i } ^ { 2} f _ { i } ^ { 2} \right] + \mathbb { E } \left[ \sum _ { i \neq j \in [ n ] } \sigma _ { i } \sigma _ { j } f _ { i } f _ { j } \right]= \mathbb { E } \left[ \sum _ { i \in [ n ] } f _ { i } ^ { 2} \right] + \sum _ { i \neq j \in [ n ] } \mathbb { E } \left[ \sigma _ { i } \right] \mathbb { E } \left[ \sigma _ { j } \right] f _ { i } f _ { j }= | f | _ { 2} ^ { 2}$ -
$Z ^ { 4} = \left( \sum _ { i \in [ n ] } \sigma _ { i } f _ { i } \right) \left( \sum _ { j \in [ n ] } \sigma _ { j } f _ { j } \right) \left( \sum _ { k \in [ n ] } \sigma _ { k } f _ { k } \right) \left( \sum _ { l \in [ n ] } \sigma _ { l } f _ { l } \right)= \sum _ { i \in [ n ] } f _ { i } ^ { 4} + 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}$ - all indexes equal : $\sum {i \in [ n ]}\sigma i ^4 f _ { i } ^ { 4} = \sum i \in [ n ] f _ { i } ^ { 4}$
- indexes matched 2 by 2 : $\left( \begin{array} { l } { 4} \ { 2} \end{array} \right) \sum _ { i < j } \left( \sigma _ { i } \sigma _ { j } f _ { i } f _ { j } \right) ^ { 2} = 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}$
- single unmatched multiplier : 4-wise independent give
$0$ as$\mathbb { E } \left[ \sigma _ { i } \right] = 0$ for any$1\le i\leq n$
- bounded variance :
$\operatorname{Var} \left[ Z ^ { 2} \right] = \sum _ { i } f _ { i } ^ { 4} + 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2} - \sum _ { i } f _ { i } ^ { 4} - 2\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2} = 4\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}\leq 2\left( \sum f _ { i } ^ { 2} \right) ^ { 2}= 2| \text{f} | _ { 2} ^ { 4}$
- unbiased :
- improve precision : repeat iid
$t = \frac { 6} { \epsilon ^ { 2} }$ copies with output$Z _ { 1} ^ { 2} ,Z _ { 2} ^ { 2} \cdots Z _ { t } ^ { 2}$ and average$\tilde { Z } ^ { 2} = \frac { 1} { t } \sum _ { i = 1} ^ { t } Z _ { i } ^ { 2}$ - same mean by linearity
- smaller variance :
$\frac { \operatorname{Var} \left( Z ^ { 2} \right) } { t } \leq \frac { 2} { t } | \mathbf { f } | | _ { 2} ^ { 4}$ , by chebyshev$P r \left[ | \tilde { Z } ^ { 2} - | \mathbf { f } | _ { 2} ^ { 2} | > \epsilon | | \mathbf { f } | | _ { 2} ^ { 2} \right]\leq \frac { \left( \frac { 2} { t } \right) \cdot | | \text{f} | | _ { 2} ^ { 4} } { \epsilon ^ { 2} | f | _ { 2} ^ { 4} }\leq \frac { 2} { t \epsilon ^ { 2} }\leq \frac{1}{3}$
- algo
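A sketch of the averaged AMS estimator; fully random signs stand in for a true 4-wise independent family (an assumption for brevity).

```python
import numpy as np

def ams_f2(stream, n, t=100, rng=np.random.default_rng(3)):
    """Average of t AMS sketches for F2 = ||f||_2^2."""
    signs = rng.choice([-1, 1], size=(t, n))  # sigma_c(i) for each copy c
    Z = np.zeros(t)
    for i in stream:
        Z += signs[:, i]          # Z_c = sum_i f_i * sigma_c(i)
    return (Z ** 2).mean()

stream = [0] * 3 + [1] * 4 + [2] * 5   # f = (3, 4, 5), F2 = 50
print(ams_f2(stream, n=3))             # concentrates near 50
```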
- nearest neighbor search problem : NNS, set of $n$ points $P \subset \mathbb{R}^d$, given a query $q \in \mathbb{R}^d$ find $\min_{p \in P} \operatorname{dist}(p,q)$
  - distance function : e.g. $\ell_2$
  - sublinear objective : $O(\log n)$ query time, $O(n)$ memory
  - $k$-d trees : partition space using coordinate-aligned planes, fail to beat naive looping when $d=\Omega(\log n)$
  - curse of dimensionality : a cube of side length $a$ intersects $2^d$ other cells
  - reduction to approximation : return $p$ with $\operatorname{dist}(p,q) \leq c \cdot \min_{s \in P} \operatorname{dist}(s,q)$ for some $c>1$
    - $ANNS(c,r)$ : build a data structure s.t. for any $q$, if $\exists p$ with $\operatorname{dist}(p,q) \leq r$ it returns $p'$ with $\operatorname{dist}(p',q) \leq c \cdot r$
    - solve $ANNS(c(1-\epsilon),r)$ for $r\in\{\delta, (1+\epsilon)\delta, (1+\epsilon)^2\delta,\dots,1\}$ with $\delta$ the minimum possible distance at bit precision $1/\delta$, report the minimal value, $O(\log\frac{1}{\delta})$ time and space overhead
- locality sensitive hashing (LSH) : hash functions sensitive to distance, high probability to map to the same bucket if $\operatorname{dist}(p,q) \leq r$ and to different buckets if $\operatorname{dist}(p,q) > c \cdot r$ (sketch below)
  - $\mathcal{H}$ called $(r, c \cdot r, p_1, p_2)$-LSH : if for all $p$, $q$, $\operatorname{dist}(p,q) \leq r \Rightarrow \mathbb{P}[h(p) = h(q)] \geq p_1$ and $\operatorname{dist}(p,q) \geq c \cdot r \Rightarrow \mathbb{P}[h(p) = h(q)] \leq p_2$, ideally $p_1 \gg p_2$
  - boost : bring $p_1$ close to $1$ and $p_2$ to $\frac{1}{n}$
  - example : $\ell_1$ over bit strings and $h_i(p) = p_i$ the $i$th bit of $p$, $\mathbb{P}[h(p) = h(q)] = \frac{\#\text{ bits in common}}{\text{total bits}} = \frac{d - |p-q|_1}{d} = 1 - \frac{|p-q|_1}{d}$, roughly an $(r, c \cdot r, e^{-r/d}, e^{-c \cdot r/d})$-LSH
  - make $p_2$ small : concatenate $k$ independent hash functions $h(p) = [h_1(p),\ldots,h_k(p)]$, so $\operatorname{dist}(p,q) \geq c \cdot r \Rightarrow \mathbb{P}[h(p) = h(q)] \leq p_2^k$, then take $\ell$ independent copies $f_i(p) = [h_{i,1}(p),\dots,h_{i,k}(p)]$ s.t. $\mathbb{P}[\exists i : f_i(p) = f_i(q)] \geq 1 - (1 - p_1^k)^\ell$, choose $k$ s.t. $p_2^k=1/n$ and $\ell \propto n^\rho\ln n$
    - time overhead : $O(\ell)$ far points collide in expectation, as $\mathbb{E}[\#p : \operatorname{dist}(p,q) > c \cdot r, f_i(p) = f_i(q)] \leq n \cdot p_2^k \leq 1$ per copy
  - make $p_1$ close to $1$ : if $\operatorname{dist}(p,q) \leq r$, $\mathbb{P}[\exists i : f_i(p) = f_i(q)] \geq 1 - (1 - p_1^k)^\ell = 1 - (1 - p_2^{\rho k})^\ell = 1 - (1 - n^{-\rho})^\ell \approx 1 - e^{-\ell n^{-\rho}} = 1 - 1/n$
    - notation : $p_1 = p_2^\rho$ with $\rho <1$ determines running time/memory
  - complexity : memory $O(n^{1+\rho})$ and query time $O(n^\rho)$
    - space : maintain $O(\ell)$ hash tables, each storing for the $n = |P|$ points the hash value of a $k$-dimensional vector, $O(\ell \cdot n \cdot k) = O\left(n^{1+\rho}\ln n\frac{\log n}{\log(1/p_2)}\right)$
    - time : compute the $f_i(q)$, then $dist(p,q)$ for candidates, $O(d(\ell + |O|) + \ell \cdot k) = O\left(n^\rho\ln(n)\left(d + \frac{\log n}{\log(1/p_2)}\right) + d\right)$ with $|O|$ the number of examined points at distance $\le c\cdot r$ from $q$
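A tiny sketch of the bit-sampling LSH for Hamming distance described above ($k$ sampled coordinates per table, $\ell$ tables); parameters are illustrative.

```python
import random

def bit_lsh(k, l, d, rng=random.Random(5)):
    """Bit-sampling LSH: l tables, each hashing k random coordinates."""
    tables = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
    def signatures(p):  # p is a bit string of length d
        return [tuple(p[i] for i in coords) for coords in tables]
    return signatures

sig = bit_lsh(k=3, l=4, d=8)
p, q = "10110100", "10110101"   # Hamming distance 1: likely collide in some table
print(any(a == b for a, b in zip(sig(p), sig(q))))
```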
- set function : $f : 2^N \rightarrow \mathbb{R}$ assigns a real value to every subset $S \subseteq N$
  - marginal contribution of $u$ to $S$ : $f(u|S) = f(S \cup \{u\}) - f(S)$
  - submodular : $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq N$, or equivalently $f(u|A) \geq f(u|B)$ for all $A \subseteq B \subseteq N$ and each $u \in N \setminus B$, diminishing returns, non-increasing discrete derivative, the analog of concavity
    - $\Rightarrow$ : $f(A \cup \{u\}) + f(B) \geq f(A \cup \{u\} \cup B) + f((A \cup \{u\}) \cap B) = f(\{u\} \cup B) + f(A)$ for $A\subseteq B$ and $u\notin B$
    - $\Leftarrow$ : $f(D) - f(C \cap D) = \sum_{i=1}^{h} f((C \cap D) \cup D_i) - f((C \cap D) \cup D_{i-1}) \geq \sum_{i=1}^{h} f(d_i|(C \cup D_{i-1})) = f(C \cup D) - f(C)$ with $C, D \subseteq N$, $D \setminus C = \{d_1, d_2,\dots,d_h\}$ and $D_i = \{d_j : 1\leq j \leq i\}$
  - examples
    - size of a graph cut : $\delta(S) = |\{(u,v) : u \in S, v \in V\setminus S\}|$ with $S\subseteq V$, $\delta(v|S) = E(v, V\setminus(S \cup \{v\})) - E(v,S)$
    - union count (coverage) : $c(S) = |\cup_{i \in S} T_i|$ with a collection of sets $T_1, T_2,\dots,T_n$ s.t. $T_i \subseteq \mathbb{N}$, $c(i|S) = |T_i \cup \cup_{j \in S} T_j| - |\cup_{j \in S} T_j| = |T_i \setminus \cup_{j \in S} T_j|$
    - matroid rank : with $C$ a maximal independent subset of $A\cap B$ extended to a maximal independent $D$ in $A\cup B$, $r(A \cup B) + r(A \cap B) = |D| + |C| \leq |D \cap (A \cup B)| + |D \cap (A \cap B)| = |D \cap B| + |D \cap A| \leq r(B) + r(A)$
    - text summaries
    - influence maximization
    - entropy : $H(A) = -\sum_a \mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\log\left(\mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\right)$
- submodular minimization : $\min_{S' \subseteq N} f(S')$
  - ellipsoid method : needs to decide in polytime whether $x$ is optimal or compute a subgradient $\nabla g(x)$, subject to polynomially many inequalities
  - Lovász extension : $\hat{f}(z) = \mathbb E_{\lambda \sim [0,1]}\left[f(\{i : z_i \geq \lambda\})\right]$ with $\lambda$ uniform, for every $z \in [0,1]^n$
    - explicit form : sort $z_0 = 1\geq z_1 \geq z_2 \geq \dots \geq z_n \geq 0 = z_{n+1}$, $S_0 = \emptyset \subseteq S_1 \subseteq S_2 \subseteq \cdots \subseteq S_n = [n]$, if $\lambda \in [z_{i+1}, z_i)$ (probability $z_i - z_{i+1}$) for $i \in [n] \cup \{0\}$ then $\{j : z_j \geq \lambda\} = S_i$, so $\hat{f}(z) = \sum_{i=0}^{n}(z_i - z_{i+1}) f(S_i)$
    - assuming a constant time evaluation oracle
-
$\hat f$ convex iff$f$ submodular : only "if" part by LP duality- assume :
$f ( 0) = 0$ or define$f ^ { \prime } = f - f ( 0 )$ - primal :
$\hat f(z)=g ( z ) = \max _ { x } z ^ { T } x$ s.t.$\sum _ { i \in S } x _ { i } \leq f ( S )$ $\forall S \subseteq N$ and$\sum _ { i \in N } x _ { i } = f ( N )$ with$g(z)$ convex - maximum of linear function over convex set always convex :
$g \left( \lambda z _ { 1} + ( 1- \lambda ) z _ { 2} \right) = \max _ { x } \left( \lambda z _ { 1} ^ { T } x + ( 1- \lambda ) z _ { 2} ^ { T } x \right) \leq \lambda \max _ { x } z _ { 1} ^ { T } x + ( 1- \lambda ) \max _ { x } z _ { 2} ^ { T } x =\lambda g \left( z _ { 1} \right) + ( 1- \lambda ) g \left( z _ { 2} \right)$ - dual :
$\lambda g \left( z _ { 1} \right) + ( 1- \lambda ) g \left( z _ { 2} \right)$ s.t.$\sum _ { S \geq i } y _ { S } = z _ { i }$ $\forall i \in N$ and$y _ { 5} \geq 0$ $\forall S \subset N$ - weak duality : enforce
$z ^ { T } x \leq \sum _ { S \subset N } y _ { S } f ( S )$ - want to guess optimal solutions : verify optimality as any feasible optimal guess satisfies
$z ^ { T } x = \sum _ { S \subset N } y _ { S } f ( S )$ - assume :
$z_0=1\geq z _ { 1} \geq z _ { 2} \geq \ldots \geq z _ { n } \geq 0= z _ { n + 1}$ ,$S_i={ 1,2,\dots ,i }$ - define :
$x _ { i } ^ { * } = f \left( S _ { i } \right) - f \left( S _ { i - 1} \right)$ and $y _ { S } ^ { * } = \left{ \begin{array} { l l } { z _ { i } - z _ { i + 1} } & { \text{ for } S = S _ { i } \text{ with } i \in [ n ] } \ { 0} & { \text{ otherwise } } \end{array} \right.$ - claim
$z ^ { T } x ^ { * } = \hat { f } ( z )= \sum _ { i = 1} ^ { n } z _ { i } \cdot \left( f \left( S _ { i } \right) - f \left( S _ { i - 1} \right) \right)= \sum _ { i = 0} ^ { n } \left( z _ { i } - z _ { i + 1} \right) f \left( S _ { i } \right) = \sum _ { S \subseteq N } y _ { S } ^ { * } f ( S )$ -
$y^\star$ dual feasible : all non-negative entries$\sum _ { S : S \geq i } y _ { S } ^ { * } = \sum _ { j \geq i } y _ { S _ { j } } ^ { * } = \left( z _ { i } - z _ { i + 1} \right) + \left( z _ { i + 1} - z _ { i + 2} \right) + \cdots + \left( z _ { n - 1} - z _ { n } \right) + \left( z _ { n } - z _ { n + 1} \right) = z _ { i } - z _ { n + 1} = z _ { i }$ -
$x^*$ dual feasible : need$\sum _ { i \in S } x _ { i } ^ { * } \leq f ( S )$ by induction rearrange$f ( S ) + f \left( S _ { i - 1} \right) \geq f \left( S \cup S _ { i - 1} \right) + f ( S \cap S _ { i - 1} ) = f \left( S _ { i } \right) + f ( S \setminus { i } )$ into$f ( S ) \geq f \left( S _ { i } \right) - f \left( S _ { i - 1} \right) + f ( S \backslash { i } ) = x _ { i } ^ { * } + f ( S \backslash { i } ) \geq \sum _ { i \in S } x _ { i } ^ { * }$
- assume :
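A minimal Python sketch of the explicit formula, assuming an evaluation oracle `f` on `frozenset`s with $f(\emptyset)=0$; the function name `lovasz_extension` is ours, not the course's.

```python
def lovasz_extension(f, z):
    """Evaluate f^(z) = sum_i (z_i - z_{i+1}) f(S_i), where coordinates are
    sorted decreasingly and S_i contains the i largest coordinates."""
    n = len(z)
    order = sorted(range(n), key=lambda i: -z[i])   # indices by decreasing z
    zs = [1.0] + [z[i] for i in order] + [0.0]      # z_0 = 1, z_{n+1} = 0
    total = (zs[0] - zs[1]) * f(frozenset())        # i = 0 term, S_0 = {}
    S = set()
    for i in range(1, n + 1):
        S.add(order[i - 1])                         # grow S_{i-1} into S_i
        total += (zs[i] - zs[i + 1]) * f(frozenset(S))
    return total

# modular f(S) = |S| gives back the linear function sum(z):
# lovasz_extension(lambda S: len(S), [0.5, 0.2])  # -> 0.7
```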
- convex closure : by strong duality, $f^-:[0,1]^n\to\mathbb R$ s.t. if $D^-(z)$ for $z\in[0,1]^n$ is a distribution on subsets with marginal probabilities given by $z$ ($\Pr_{S\sim D^-(z)}[u\in S]=z_u$) then $f^-(z)=\mathbb E_{S\sim D^-(z)}[f(S)]$, minimized over all such distributions
  - explicit : $f^-(z)=\min\left\{\sum_{S\subseteq N}\alpha_Sf(S) : \alpha\geq 0,\ \sum_S\alpha_S=1,\ \sum_S\alpha_S\mathbf 1_S=z\right\}$
- marginal contribution : $f(u\mid S)=f(S\cup\{u\})-f(S)$
- cardinality constrained monotone maximization
  - assume : all functions $f:2^N\to\mathbb R$ non-negative, monotone ($f(S)\leq f(T)$ $\forall S\subseteq T\subseteq N$) and normalized ($f(\emptyset)=0$)
  - constrained : find set $S\subseteq N$ of size at most $k$ maximizing $f(S)$
  - generalize greedy to submodular : repeatedly choose $u\not\in S$ with maximal marginal gain $f(u\mid S)$, suboptimal in general (sketch after this list)
    - if $f(S)=\sum_{e\in S}w_e$ : standard greedy, optimal
    - constant factor approximation : $f(S)\geq(1-1/e)f(O)\approx 0.632\,f(O)$ for an optimal solution $O$
    - claim : $\sum_{e\in T\setminus S}f(e\mid S)\geq f(T\cup S)-f(S)$ for $S,T\subseteq N$, as the elements of $T\setminus S$ can be ordered $e_1,e_2,\dots,e_{|T\setminus S|}$ and, letting $T_i$ be the set containing the first $i$ elements of this ordering ($T_0=\emptyset$, $T_{|T\setminus S|}=T\setminus S$), submodularity gives $\sum_{e\in T\setminus S}f(e\mid S)=\sum_{i=1}^{|T\setminus S|}f(e_i\mid S)\geq\sum_{i=1}^{|T\setminus S|}f(e_i\mid S\cup T_{i-1})=\sum_{i=1}^{|T\setminus S|}\left[f(T_i\cup S)-f(T_{i-1}\cup S)\right]=f(T\cup S)-f(S)$ (telescoping)
    - proof : since $f$ monotone, assume $|O|=k$ as we can add elements without decreasing the value, let $S_i=\{u_j : j\leq i\}$ contain the first $i$ elements selected ($S_0=\emptyset$, $S_k=S$), since greedy picks the element of maximal gain $k\cdot f(u_i\mid S_{i-1})\geq|O\setminus S_{i-1}|\cdot f(u_i\mid S_{i-1})\geq\sum_{e\in O\setminus S_{i-1}}f(e\mid S_{i-1})\geq f(S_{i-1}\cup O)-f(S_{i-1})\geq f(O)-f(S_{i-1})$, which gives $f(S_i)\geq\left(1-\frac1k\right)f(S_{i-1})+\frac1k f(O)$, unrolling by induction $f(S_k)\geq\left(1-\left(1-\frac1k\right)^k\right)f(O)\geq(1-1/e)f(O)$ using $\left(1-\frac1k\right)^k\leq e^{-1}$ since $1+x\leq e^x$
  - can be generalized to any matroid constraint
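A minimal sketch of the greedy rule just described, assuming `f` is an oracle for a monotone submodular function on `frozenset`s; `greedy_max` and the toy coverage instance are our own illustration.

```python
def greedy_max(f, N, k):
    """Greedy for max f(S) s.t. |S| <= k with f monotone submodular;
    guarantees f(S) >= (1 - 1/e) * f(O)."""
    S = set()
    for _ in range(k):
        base = f(frozenset(S))
        # element with the largest marginal gain f(u | S)
        u, gain = max(((u, f(frozenset(S | {u})) - base) for u in N - S),
                      key=lambda t: t[1], default=(None, 0.0))
        if u is None:
            break
        S.add(u)
    return S

# toy coverage instance: f(S) = |union of the sets indexed by S|
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
# greedy_max(f, {0, 1, 2}, 2) -> a pair of value 3 (optimal here)
```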
- unconstrained submodular maximization : greedy from two sides, start from $X_0=\emptyset$ and $Y_0=N$, scan elements $u_1,\dots,u_n$ comparing $a_i=f(X_{i-1}\cup\{u_i\})-f(X_{i-1})$ with $b_i=f(Y_{i-1}\setminus\{u_i\})-f(Y_{i-1})$ : if $a_i\geq b_i$ set $X_i=X_{i-1}\cup\{u_i\}$, $Y_i=Y_{i-1}$, else $X_i=X_{i-1}$, $Y_i=Y_{i-1}\setminus\{u_i\}$ (sketch after this list)
  - $1/3$-approximation
    - claim : $a_i+b_i\geq 0$ for every $1\leq i\leq n$, as $(X_{i-1}\cup\{u_i\})\cap(Y_{i-1}\setminus\{u_i\})=X_{i-1}$ and $(X_{i-1}\cup\{u_i\})\cup(Y_{i-1}\setminus\{u_i\})=Y_{i-1}$ for $X_{i-1}\subseteq Y_{i-1}$ and $u_i\in Y_{i-1}$, thus by submodularity $a_i+b_i=f(X_{i-1}\cup\{u_i\})+f(Y_{i-1}\setminus\{u_i\})-\left[f(X_{i-1})+f(Y_{i-1})\right]\geq 0$
    - claim : $[f(X_i)+f(Y_i)]-[f(X_{i-1})+f(Y_{i-1})]\geq f(OPT_{i-1})-f(OPT_i)$ for $OPT_i=(OPT\cup X_i)\cap Y_i$ ; in case $a_i\geq b_i$ we have $X_i=X_{i-1}\cup\{u_i\}$, $Y_i=Y_{i-1}$, so the left side equals $f(u_i\mid X_{i-1})=a_i\geq 0$ (as $a_i+b_i\geq 0$), while $f(OPT_i)-f(OPT_{i-1})=f(u_i\mid OPT_{i-1})\geq f(u_i\mid Y_{i-1}\setminus\{u_i\})=f(Y_{i-1})-f(Y_{i-1}\setminus\{u_i\})=-b_i\geq -a_i$ (the case $a_i<b_i$ is symmetric)
    - proof : summing $\sum_{i=1}^{n}\left[f(OPT_{i-1})-f(OPT_i)\right]\leq\sum_{i=1}^{n}\left[f(X_i)+f(Y_i)-f(X_{i-1})-f(Y_{i-1})\right]$ telescopes to $f(OPT_0)-f(OPT_n)\leq f(X_n)+f(Y_n)-f(X_0)-f(Y_0)\leq f(X_n)+f(Y_n)$, and since $OPT_0=OPT$ and $OPT_n=X_n=Y_n$ this results in $f(OPT)\leq 3\cdot f(X_n)$
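A compact Python sketch of the two-sided greedy above; `f` is assumed to be a non-negative submodular oracle on `frozenset`s and `double_greedy` is our naming.

```python
def double_greedy(f, elements):
    """Deterministic double greedy: scan elements once, growing X from the
    empty set and shrinking Y from the full set; returns X = Y with
    f(X) >= f(OPT) / 3 for non-negative submodular f."""
    X, Y = set(), set(elements)
    for u in elements:
        a = f(frozenset(X | {u})) - f(frozenset(X))  # gain of adding u to X
        b = f(frozenset(Y - {u})) - f(frozenset(Y))  # gain of dropping u from Y
        if a >= b:
            X.add(u)
        else:
            Y.discard(u)
    return X  # == Y after the scan
```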
- difference from streaming algos : an online algo must commit to each decision before seeing the rest of the input
- competitive ratio : ratio of the algorithm's cost to the best cost achievable with knowledge of everything in advance, $\max_I\frac{ALG(I)}{OPT(I)}$, for maximization $\min_I\frac{ALG(I)}{OPT(I)}$
  - alternative definition : $ALG(I)\leq r\cdot OPT(I)+c$ with $c$ a constant and $r$ the competitive ratio, when $c=0$, $r$ is called the strict competitive ratio
- rent or buy skis : renting costs $R$ per day, buying costs $B$, rent until the total rent paid reaches $B$, then buy, gives competitive ratio $2$ (sketch below)
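A tiny sketch of the break-even rule; `season_days` (unknown to the online algorithm in advance) and the helper name are our own illustration.

```python
import math

def ski_rental_cost(season_days, R, B):
    """Break-even strategy: rent for the first ceil(B/R) - 1 days, then buy
    on the next ski day; the offline optimum is min(season_days * R, B)
    and this strategy pays at most twice that."""
    rent_days = math.ceil(B / R) - 1
    if season_days <= rent_days:
        return season_days * R        # season ended before we bought
    return rent_days * R + B          # rented, then bought

# ski_rental_cost(100, R=1, B=10) -> 9 + 10 = 19 <= 2 * OPT = 20
```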
- monotone submodular function maximization with cardinality constraint : streaming with $poly(k)$ memory, independent of the stream length
  - examples : document summarization (pick $k$ out of many documents), influence maximization (give away $k$ products)
  - assuming : $\text{OPT}=\max_{S\subseteq N : |S|\leq k}f(S)$ is known, add a streamed element $e$ to $S$ whenever $|S|<k$ and $f(e\mid S)\geq\frac{\text{OPT}}{2k}$, then $\text{OPT}/2\leq f(S)$ (sketch after this list)
  - proof : ordered elements $e_1,e_2,\ldots,e_n$, intermediate sets $S_i$ and $S=S_n$
    - case $|S|=k$ : every selected item had marginal contribution at least $\frac{\text{OPT}}{2k}$, so $f(S)=f(\{e_{i_1},e_{i_2},\dots,e_{i_k}\})\geq k\cdot\frac{\text{OPT}}{2k}=\frac{\text{OPT}}{2}$
    - case $|S|<k$ : every item not selected has $f(e\mid S)<\text{OPT}/(2k)$ (by submodularity its gain w.r.t. the final $S$ is no larger than when it streamed past), order the missing optimal items $\{o_1,o_2,\ldots,o_{k'}\}=O\setminus S$, by monotonicity and submodularity $\text{OPT}\leq f(S\cup O)=f(S)+f(O\mid S)=f(S)+\sum_{i=1}^{k'}f(o_i\mid S\cup\{o_1,\ldots,o_{i-1}\})\leq f(S)+\sum_{i=1}^{k'}f(o_i\mid S)\leq f(S)+k'\cdot\frac{\text{OPT}}{2k}\leq f(S)+\frac{\text{OPT}}{2}$
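A single-pass sketch of this thresholding rule, assuming a value `opt_guess` for OPT is supplied (in practice one runs several guesses in parallel); `stream_threshold` is our naming.

```python
def stream_threshold(f, stream, k, opt_guess):
    """One pass over the stream: keep an element iff there is room and its
    marginal gain is at least opt_guess / (2k); if opt_guess = OPT this
    returns S with f(S) >= OPT / 2."""
    S = set()
    for e in stream:
        if len(S) < k:
            gain = f(frozenset(S | {e})) - f(frozenset(S))
            if gain >= opt_guess / (2 * k):
                S.add(e)
    return S
```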
- caching
  - assume : a cache miss brings the requested memory page into the cache, $k$ cache size
  - deterministic :
    - least recently used (LRU) : competitive ratio $k$ (sketch after this list), divide the request sequence into phases where phase $i$ begins the first time a $k$-th distinct page is seen after phase $i-1$ began, $OPT$ makes at least one cache miss per phase, LRU at most $k$ per phase
    - first in first out (FIFO) : competitive ratio $k$
    - least frequently used (LFU) : unbounded competitive ratio
    - last in first out (LIFO) : unbounded competitive ratio
    - no deterministic algo can do better than $k$ : the adversary looks at the cache content and requests a page not in it, so the algo misses every request, while $OPT$ misses on such a page no sooner than once every $k$ requests
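A minimal miss-counting LRU sketch (our own illustration), using an `OrderedDict` to keep recency order:

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Count the cache misses of LRU with capacity k on a request sequence."""
    cache = OrderedDict()            # keys ordered by recency, newest last
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: refresh recency
        else:
            misses += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

# adversarial cycle over k + 1 pages: LRU misses every single request
# lru_misses([i % 4 for i in range(100)], k=3)  # -> 100
```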
- randomized : marking algo, each page holds a 1-bit mark set when the page was recently used, a miss evicts a uniformly random unmarked page, unmarking everything when all cached pages are marked (sketch below)
  - competitive ratio : $2H_k=2\cdot\left(\frac11+\frac12+\ldots+\frac1k\right)=O(\log k)$
  - no randomized algo has a competitive ratio better than $H_k$
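A sketch of the marking algorithm as described above; the phase reset and random eviction follow the standard formulation, and the helper name is ours.

```python
import random

def marking_misses(requests, k):
    """Marking algorithm: mark pages on use; on a miss with a full cache,
    evict a uniformly random unmarked page, starting a new phase
    (unmarking everything) if no unmarked page remains."""
    cache, marked = set(), set()
    misses = 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                if cache <= marked:       # every cached page marked
                    marked.clear()        # new phase
                victim = random.choice(tuple(cache - marked))
                cache.remove(victim)
            cache.add(page)
        marked.add(page)                  # mark the requested page
    return misses
```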
- secretary problem : random arrival order, $n$ unique candidates, hire or not at arrival (1 position only)
  - good strategy : explore the first $n/2$ candidates, then take the first one better than the whole first half, $\Pr[\text{hiring best candidate}]\geq\Pr[\text{2nd best among first }n/2]\cdot\Pr[\text{best candidate in 2nd half}]\geq(1/2)(1/2)=1/4$
  - bad strategy : hire the first candidate, $\Pr[\text{hiring best candidate}]=1/n$
  - optimal strategy : start selecting after $r-1$ candidates (simulation sketch below), $\Pr[\text{hiring best candidate}]=\sum_{i=1}^{r-1}0+\frac1n\sum_{i=r}^{n}\Pr[\text{best of the first }i-1\text{ applicants is in first }r-1\mid i\text{-th is best}]=\frac1n\sum_{i=r}^{n}\frac{r-1}{i-1}$, optimizing $\frac{r-1}{n}\cdot\sum_{i=r}^{n}\frac1{i-1}\approx\frac rn\int_r^n\frac1x\,dx=\frac rn\cdot\ln(n/r)$ gives $r=n/e$ with success probability $1/e$
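A Monte Carlo check of the threshold strategy (our own illustration; parameter names are hypothetical):

```python
import random

def secretary_success_rate(n, r, trials=100_000):
    """Estimate Pr[hiring the best]: skip the first r - 1 candidates, then
    hire the first one better than everything seen so far."""
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))                # n - 1 is the best candidate
        random.shuffle(ranks)
        bar = max(ranks[:r - 1], default=-1)  # best of the observation phase
        hired = next((x for x in ranks[r - 1:] if x > bar), None)
        wins += (hired == n - 1)
    return wins / trials

# r ~ n/e maximizes the estimate at about 1/e:
# secretary_success_rate(100, 37)  # -> ~0.37
```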
- spectral graph theory : studies how eigenvalues and eigenvectors of the adjacency matrix relate to combinatorial properties
  - adjacency matrix : for $G=(V,E)$ with $|V|=n$, the matrix $A\in\mathbb{R}^{n\times n}$ with $A_{ij}=1$ iff $\{i,j\}\in E$
  - $d$-regular : degree of each vertex is $d$
  - eigenvector $v$ with eigenvalue $\lambda$ : $Mv=\lambda v$
  - normalized adjacency of $d$-regular : $M=\frac1d A$, called the random walk matrix
    - symmetric : $n$ real eigenvalues $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_n$ (with multiplicity) and an orthonormal basis of eigenvectors
    - signal $x$ on the graph : $y=Mx$ gives $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}$, the average value of $x$ over $i$'s neighbours
    - $\lambda_1\leq 1$ : taking $i$ with $x(i)$ maximal, $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}\leq\sum_{\{i,j\}\in E}\frac{x(i)}{d}=x(i)$
    - $\lambda_2=1\Leftrightarrow G$ is disconnected
      - suppose disconnected : for $S$ a connected component, define $v_2(i)=\begin{cases}1/|S| & \text{if } i\in S\\ -1/|V\setminus S| & \text{if } i\in V\setminus S\end{cases}$, perpendicular to the all-$1$'s vector $v_1$, all neighbours of $i$ carry the same value so $y(i)=\frac1d\sum_{\{j,i\}\in E}v_2(j)=\frac1d\sum_{\{j,i\}\in E}v_2(i)=v_2(i)$, hence $Mv_2=v_2$
      - suppose connected : $v_2\perp v_1$ forces $v_2(i)\neq v_2(j)$ for some $i,j$, so taking $i$ maximizing $v_2(i)$ there is a neighbour $j^*$ with $v_2(i)>v_2(j^*)$, and $y(i)=\frac1d\sum_{\{j,i\}\in E}v_2(j)\leq\frac1d\left(\sum_{\{j,i\}\in E : j\neq j^*}v_2(i)+v_2(j^*)\right)<v_2(i)$, so $\lambda_2<1$
    - $\lambda_n=-1\Leftrightarrow$ one component of $G$ is bipartite
    - number of connected components : $|\{i : \lambda_i=1\}|$ (numpy check below)
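A small numpy check of these facts (our illustration): the multiplicity of the eigenvalue $1$ of $M=A/d$ counts connected components.

```python
import numpy as np

def walk_spectrum(A, d):
    """Eigenvalues of the random walk matrix M = A/d of a d-regular graph,
    sorted decreasingly, plus the number of connected components
    (= multiplicity of the eigenvalue 1)."""
    M = np.asarray(A, dtype=float) / d
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]  # symmetric: real spectrum
    return eig, int(np.sum(np.isclose(eig, 1.0)))

# two disjoint triangles (2-regular): eigenvalue 1 appears twice
T = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
A = np.block([[T, np.zeros((3, 3))], [np.zeros((3, 3)), T]])
# walk_spectrum(A, 2) -> ([1, 1, -0.5, -0.5, -0.5, -0.5], 2)
```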
- mixing time : iterate $M^kp$ from an initial distribution $p$, if $G$ is disconnected the walk never reaches the uniform distribution, and a bipartite component causes the same issue (oscillation)
  - convergence : $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2\leq\frac1{n^c}$ if $\max(|\lambda_2|,|\lambda_n|)\leq 1-\epsilon$ after $k=\frac c\epsilon\log n$, i.e. $O(\frac1\epsilon\log n)$ steps, so the walk is at any vertex with probability $\approx\frac1n$ (simulation below)
  - proof : in the eigenbasis $p=\sum_{i=1}^{n}\alpha_iv_i$ with $\alpha_i=\langle p,v_i\rangle$ and $\alpha_1=\langle p,v_1\rangle=\frac1{\sqrt n}$, we have $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2=\|\sum_{i=2}^{n}\alpha_i\lambda_i^kv_i\|_2\leq(1-\epsilon)^k\|\sum_{i=2}^{n}\alpha_iv_i\|_2$ since $|\lambda_i|\leq 1-\epsilon$ for $i\geq 2$, taking $k=\frac c\epsilon\log n$ and $A=\|\sum_{i=2}^{n}\alpha_iv_i\|_2\leq 1$ gives $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2\leq A(1-\epsilon)^{\frac{c\log n}{\epsilon}}\leq A\left(\frac1e\right)^{c\log n}\leq\frac1{n^c}$
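A quick simulation of the walk's convergence (our illustration); for a connected non-bipartite regular graph the loop exits after $O(\frac1\epsilon\log n)$ steps.

```python
import numpy as np

def mixing_steps(A, d, tol=1e-6, max_steps=100_000):
    """Iterate p <- M p from a point mass until within tol of uniform (L2);
    returns the number of steps, or None if the walk does not mix
    (disconnected graph, or oscillation on a bipartite one)."""
    n = A.shape[0]
    M = np.asarray(A, dtype=float) / d
    p = np.zeros(n); p[0] = 1.0          # start at vertex 0
    uniform = np.full(n, 1.0 / n)
    for k in range(1, max_steps + 1):
        p = M @ p
        if np.linalg.norm(p - uniform) <= tol:
            return k
    return None
```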
- connectivity : a more balanced cut might be more useful than the optimal but silly one (e.g. one cutting off a single vertex)
- conductance
  - cut : for $S\subseteq V$, $h(S)=\frac{|\delta(S)|}{d\cdot\min\{|S|,|V\setminus S|\}}$ with $\delta(S)$ the set of edges crossing the cut, roughly measures the fraction of edges leaving $S$
  - graph : $h(G)=\min_{S\subset V}h(S)$
- Cheeger's inequalities : $\frac{1-\lambda_2}{2}\leq h(G)\leq\sqrt{2(1-\lambda_2)}$
  - algo spectral-partitioning : very fast, $O(|V|\log|V|+|E|)$, and $v_2$ can be approximated (sweep sketch after this list)
  - Rayleigh quotient : $\frac{x^TMx}{x^Tx}$
    - first eigenvalue : $\lambda_1=\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$
      - $\lambda_1\leq\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$ : $\frac{v_1^TMv_1}{v_1^Tv_1}=\frac{v_1^T\lambda_1v_1}{v_1^Tv_1}=\lambda_1\frac{v_1^Tv_1}{v_1^Tv_1}=\lambda_1$
      - $\lambda_1\geq\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$ : writing $y=\sum_{i=1}^{n}\alpha_iv_i$, $\frac{y^TMy}{y^Ty}=\frac{\sum_{i=1}^{n}\alpha_i^2\lambda_i}{\sum_{i=1}^{n}\alpha_i^2}\leq\lambda_1\frac{\sum_{i=1}^{n}\alpha_i^2}{\sum_{i=1}^{n}\alpha_i^2}=\lambda_1$
    - second eigenvalue : $\lambda_2=\max_{x\in\mathbb R^n : x\perp v_1}\frac{x^TMx}{x^Tx}$
  - proof of $\frac{1-\lambda_2}{2}\leq h(G)$ : $1-\lambda_2=\min_{x\in\mathbb R^n : x\perp v_1}\frac{x^T(I-M)x}{x^Tx}=\min_{x\in\mathbb R^n : x\perp v_1}\frac{\sum_{\{i,j\}\in E}(x(i)-x(j))^2}{d\cdot x^Tx}$ (the Laplacian quadratic form), which looks like a continuous relaxation of the cut problem, assume $|S|\leq|V|/2$ and define $y(i)=\begin{cases}1-|S|/|V| & \text{if } i\in S\\ -|S|/|V| & \text{if } i\notin S\end{cases}$ (note $y\perp v_1$), we have $\sum_{\{i,j\}\in E}(y(i)-y(j))^2=\sum_{\{i,j\}\in\delta(S)}(1-|S|/|V|+|S|/|V|)^2=|\delta(S)|$ and $y^Ty=\sum_{i\in V}y(i)^2=|S|(1-|S|/|V|)^2+(|V|-|S|)(|S|/|V|)^2=\frac{|S|(|V|-|S|)}{|V|}\geq|S|/2$, so $1-\lambda_2\leq\frac{\sum_{\{i,j\}\in E}(y(i)-y(j))^2}{d\cdot y^Ty}$ and $\frac{1-\lambda_2}{2}\leq\frac12\frac{\sum_{\{i,j\}\in E}(y(i)-y(j))^2}{d\cdot y^Ty}\leq\frac12\frac{|\delta(S)|}{d\cdot|S|/2}=\frac{|\delta(S)|}{d\cdot|S|}=h(S)$
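A naive sweep-cut version of spectral partitioning (our illustration, quadratic rather than the near-linear time quoted above): sort vertices by $v_2$ and keep the best prefix cut.

```python
import numpy as np

def sweep_cut(A, d):
    """Sort vertices by the second eigenvector of M = A/d and return the
    prefix cut S minimizing h(S); Cheeger's inequality guarantees the best
    sweep cut satisfies h(S) <= sqrt(2 (1 - lambda_2))."""
    n = A.shape[0]
    M = np.asarray(A, dtype=float) / d
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    v2 = vecs[:, -2]                      # eigenvector for lambda_2
    order = np.argsort(-v2)               # sweep by decreasing v2
    best_h, best_S, S = np.inf, None, set()
    for t in range(n - 1):                # proper, nonempty cuts only
        S.add(int(order[t]))
        cross = sum(A[i, j] for i in S for j in range(n) if j not in S)
        h = cross / (d * min(len(S), n - len(S)))
        if h < best_h:
            best_h, best_S = h, set(S)
    return best_S, best_h
```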