---
typora-copy-images-to: ./img
---

GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.

# Spring 2018: Advanced Algorithms
[TOC]
- theory of computation
  - computable/decidable : a TM halts with the answer on every input
  - NL : nondeterministic logarithmic-space
  - P : deterministic TM in polytime
  - NP : solution verified in polytime by a deterministic TM, or solved by a nondeterministic TM in polytime
  - NP-complete : hardest problems in NP (in NP and NP-hard)
  - NP-hard : at least as hard as the hardest problems of NP (everything in NP reduces to it in polytime), not required to be in NP, maybe not even decidable
  - PSPACE : TM in polyspace
  - EXPTIME : deterministic TM in exptime
  - EXPSPACE : deterministic TM in expspace
- minimum spanning tree
  - input : connected undirected graph $G=(V, E)$ with edge-weights $w:E\to\R$
  - output : spanning tree $T$ of maximum weight $\sum_{e\in T}w(e)$, i.e. $n-1$ edges
  - greedy algorithm : sort edges by decreasing weight and add an edge whenever it keeps the set acyclic, $\Theta(\abs{E}\log\abs{E})$ dominated by sorting, the acyclicity tests run in almost linear time using UNION-FIND (see the sketch below)
  - correctness
    - downward closed : $(V, Y)$ acyclic implies $(V, X)$ acyclic $\forall X\subseteq Y$
    - exchange : $X, Y$ acyclic edge sets with $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X+e$ acyclic
    - proof : $S=\{s_1,\ldots, s_{n-1}\}$ spanning tree returned, listed in decreasing weight order, suppose for contradiction a higher-weight $T=\{t_1,\ldots,t_{n-1}\}$ exists, also in decreasing order, let $p$ be the smallest index s.t. $w(t_p)>w(s_p)$, then there must exist $e\in\{t_1,\ldots, t_p\}\setminus\{s_1,\ldots, s_{p-1}\}$ s.t. $\{s_1,\ldots, s_{p-1},e\}$ is acyclic, so greedy should have added $e$ instead
    - acyclic graph with $k$ edges has $n-k$ components
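A minimal sketch of this greedy (Kruskal-style) maximum-weight spanning tree with union-find; the edge encoding and function names are illustrative, not from the notes.

```python
def max_weight_spanning_tree(n, edges):
    """Greedy/Kruskal: edges = [(w, u, v)] on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) keeps the edge set acyclic
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# example: triangle plus a pendant vertex
print(max_weight_spanning_tree(4, [(3, 0, 1), (2, 1, 2), (1, 0, 2), (5, 2, 3)]))
```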
- matroid : $M=(E,J)$ on finite ground set $E$ with family $J$ of subsets called independent sets
  - downward closed : $X\subseteq Y$ and $Y\in J$ implies $X\in J$
  - exchange : $X, Y \in J$ and $\abs{Y}>\abs{X}$ implies $\exists e\in Y\setminus X$ s.t. $X\cup \{e\} \in J$
  - the independent sets are often exponentially many, assume access to an independence oracle (answers whether $I\in J$ for $I\subseteq E$, a sketch follows the examples below)
  - base : all maximal independent sets have the same cardinality
  - Rado/Gale/Edmonds theorem : greedy finds a maximum weight base iff matroid
    - $\Leftarrow$ : greedy correctness
    - $\Rightarrow$ : suppose not a matroid, then at least one of the two axioms is violated
      - downward-closed violated : $S\subset T$ s.t. $S\not\in J$ and $T\in J$, consider $w_i=\begin{cases}2 & i \in S\\ 1 & i\in T\setminus S\\ 0 & o/w\end{cases}$, greedy first picks some $S_1\subset S$, then at most $T\setminus S$, and returns $2\abs{S_1}+\abs{T\setminus S}$ which is less than $w(T)$
      - exchange violated : $S, T\in J$ s.t. $\abs{S}<\abs{T}$ and $\forall i\in T\setminus S$, $S+i\not\in J$, consider $w_i=\begin{cases}1+\frac{1}{2\abs{S}} & i\in S\\ 1 & i\in T\setminus S\\ 0 & o/w\end{cases}$, greedy picks all of $S$ and returns $\abs{S}+1/2$ which is less than $\abs{T}$
  - graphic matroid : $E$ edges of a graph, $J=\{F\subseteq E : F \text{ acyclic}\}$
  - $k$-uniform matroid : $J=\{X\subseteq E : \abs{X}\le k\}$, rank $k$
  - partition matroid : $J=\{X\subseteq E : \abs{E_i\cap X}\le k_i \text{ for } i=1,\ldots, l\}$ with $E_1,\ldots,E_l$ disjoint sets
  - linear matroid : defined from a matrix $A$, $E$ index set of columns, for $X\subseteq E$ let $A_X$ be the matrix of columns indexed by $X$, $J=\{X\subseteq E : \operatorname{rank}(A_X)=\abs{X}\}$
  - truncated matroid : $M_k=(E,J_k)$ s.t. $J_k=\{X\in J:\abs{X}\le k\}$
  - matroid intersection : $M_1=(E,J_1)$ and $M_2=(E,J_2)$ give $M_1\cap M_2=(E,J_1\cap J_2)$, in general not a matroid (verify via cardinality of maximal solutions)
    - Edmonds/Lawler : efficient algorithm for finding a max-weight independent set in the intersection of two matroids
  - laminar matroid : $J=\{S\subseteq E : \abs{S\cap X}\le k_X~~\forall X\in F\}$ with $F$ a family of subsets of the ground set $E$ that are pairwise disjoint or nested (subset/superset of each other) and $\{k_X\}_{X\in F}$ positive integers
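A sketch of the Rado/Gale/Edmonds greedy over an independence oracle; here `is_independent` is an acyclicity check, making this the graphic-matroid instance (names are illustrative).

```python
def greedy_max_weight_base(ground, weight, is_independent):
    """Matroid greedy: scan in decreasing weight, keep if still independent."""
    base = []
    for e in sorted(ground, key=weight, reverse=True):
        if is_independent(base + [e]):
            base.append(e)
    return base

# graphic matroid oracle: an edge set is independent iff it is acyclic
def acyclic(edges):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

w = {(0, 1): 3, (1, 2): 2, (0, 2): 1}
print(greedy_max_weight_base(list(w), w.get, acyclic))  # [(0, 1), (1, 2)]
```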
- graph orientation : replace each undirected edge by an arc in either direction
- bipartite matching
  - matching : $M\subseteq E$ s.t. every vertex is incident to at most one edge of $M$, $\abs{\{e\in M : v\in e\}}\le 1~~\forall v\in V$
  - input : bipartite graph $G=(V, E)$
  - output : matching of maximum size
  - matroid intersection view : vertex bipartition $A$ and $B$, $E=\cup\{\delta(v):v\in A\}$ with $\delta(v)$ the edges incident to $v$, $M_A=(E, J_A)$ with $J_A=\{F\subseteq E : \abs{F\cap\delta(v)}\le 1~~\forall v\in A\}$ and similarly $M_B$
  - alternating path : alternates between edges in $M$ and $E\setminus M$
  - augmenting path : alternating path whose first and last vertices are unmatched (see the sketch below)
  - maximum matching : iff no augmenting path
    - $\Rightarrow$ : contradiction, $P$ augmenting path, $M'=M\Delta P$ has greater cardinality
    - $\Leftarrow$ : contradiction, $M^*$ maximum matching with $\abs{M^*}>\abs{M}$, $Q=M\Delta M^*$ contains more edges from $M^*$ than from $M$, each vertex is incident to at most one edge in $M\cap Q$ and at most one in $M^*\cap Q$, so an augmenting path must exist
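A compact augmenting-path maximum matching (DFS-based), assuming adjacency lists from $A$-vertices to $B$-vertices; a sketch, not the notes' pseudocode.

```python
def max_bipartite_matching(adj):
    """adj: dict a -> list of b. Repeatedly search for an augmenting path."""
    match_b = {}  # b -> a currently matched

    def augment(a, seen):
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            # b is free, or its current partner can be re-matched elsewhere
            if b not in match_b or augment(match_b[b], seen):
                match_b[b] = a
                return True
        return False

    return sum(augment(a, set()) for a in adj)

print(max_bipartite_matching({0: [10], 1: [10, 11], 2: [11, 12]}))  # 3
```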
- colorful spanning trees : intersection of graphic and partition matroids
  - input : undirected graph with colored edges
  - output : spanning tree with all edges of different colors
- arborescences : intersection of graphic and partition matroids
  - input : directed graph $D=(V, A)$, special root vertex $r\in V$
  - output : spanning tree rooted at $r$ with all arcs pointing away from it (each $v\neq r$ has in-degree one)
- linear programming : optimize a linear objective function over $n$ variables $x_1,\ldots,x_n\in\R$ subject to $m$ linear constraints
  - minimize $\sum_{i=1}^n c_i x_i$ s.t. $\sum_i e_{i,j}x_i=b_j$, $\sum_i d_{i,k} x_i\ge g_k$, $\sum_i f_{i,p}x_i\le l_p$
  - relaxation : approximation of the optimal result for integer programs
  - simplex method : start from an extreme point, move to better neighbors of the convex polytope, exponential time on bad examples
  - ellipsoid method : binary search for the optimal value, start with a surrounding ellipsoid and test its center, if infeasible cut the ellipsoid in two parts using a violated constraint, polytime but slow in practice
  - convex combination of points : $\sum_{i=1}^n\lambda_i x_i$ with $\sum_{i=1}^n \lambda_i = 1$ and $\lambda_i\in [0,1]$
  - extreme point : cannot be written as a convex combination of other feasible solutions
  - polyhedron : $P=\{x\in\R^n : Ax=b, x\ge 0\}$, called a polytope if bounded
  - optimum : if the feasible region is bounded, there exists an extreme point that is optimal
    - proof : $x^*$ optimal, any feasible point can be written as a convex combination of extreme points $x^{(i)}$, so $x^*=\sum_i\lambda_i x^{(i)}$, with $c$ the objective vector $c^\top x^*=c^\top(\sum_i\lambda_i x^{(i)})=\sum_i\lambda_i c^\top x^{(i)}$ implies the existence of an $x^{(i)}$ s.t. $c^\top x^{(i)}\ge c^\top x^*$
- maximum weight bipartite perfect matching
  - input : undirected graph $G=(V, E)$ partitioned into $A$ and $B$ with edge-weights $w:E\to\R$
  - output : perfect matching $M$ that maximizes $w(M)=\sum_{e\in M} w(e)$
  - LP : maximize $\sum_{e\in E} x_ew_e$ s.t. $\sum_{e=(a,b)\in E}x_e=1~~\forall a\in A$ and $\sum_{e=(a,b)\in E} x_e=1~~\forall b\in B$ with $x_e\ge 0~~\forall e\in E$
  - any extreme point is integral
    - $G=(V_1,V_2, E)$, suppose for contradiction $E_f=\{e\in E: 0< x_e^* < 1\}\not=\emptyset$, then $E_f$ must contain a cycle $e_1,\ldots,e_{2k}$ (of even length, by bipartiteness), let $y_e=\begin{cases}x_e^*+\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*-\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & o/w\end{cases}$ and $z_e=\begin{cases}x_e^*-\epsilon & e\in\{e_1,e_3,\ldots,e_{2k-1}\}\\ x_e^*+\epsilon & e\in\{e_2,e_4,\ldots,e_{2k}\}\\ x_e^* & o/w\end{cases}$ with $\epsilon =\min\{x_e^*,(1-x_e^*):e\in E_f\}$, then $x^*=\frac{1}{2}(y+z)$ contradicts extremality once we check that $y,z$ are feasible (the alternating $\pm\epsilon$ keeps every vertex constraint satisfied)
  - $k$-regular graph : degree of each vertex equals $k$ (a $k$-regular bipartite graph always has a perfect matching)
- bipartite vertex cover
  - input : undirected graph $G=(V, E)$ with node-weights $w:V\to\R$
  - output : cover $C$ s.t. $e\cap C\not=\emptyset~~\forall e\in E$ that minimizes $w(C)=\sum_{v\in C}w(v)$
  - LP : minimize $\sum_{v\in V}x_v w(v)$ s.t. $x_u+x_v\ge 1~~\forall\{u, v\}\in E$ and $0\le x_v\le 1~~\forall v\in V$
  - any extreme point is integral : take $A_f=V_f\cap A$ and $B_f=V_f\cap B$ with $V_f$ the fractionally-valued vertices, same cycle argument as above on those sets
- duality
  - primal : minimize $\sum_{i=1}^{n} c_i x_i$ s.t. $\sum_{i=1}^{n} A_{ji} x_i \geq b_j ~~\forall j = 1,\dots,m$, $x\ge 0$
  - dual : maximize $\sum_{j=1}^{m} b_j y_j$ s.t. $\sum_{j=1}^{m} A_{ji} y_j \leq c_i ~~\forall i = 1,\dots,n$, $y\ge 0$
  - maximization problem : replace $x_i$ with $-x_i$ in constraints and objective function
  - primal-feasible : $x$ feasible solution to the primal
  - weak duality : $x$ primal-feasible and $y$ dual-feasible implies $\sum_{i=1}^{n} c_i x_i \geq \sum_{j=1}^{m} b_j y_j$
  - strong duality : $x$ primal optimal and $y$ dual optimal implies $\sum_{i=1}^{n} c_i x_i = \sum_{j=1}^{m} b_j y_j$ (numerical check below)
  - unbounded primal/dual : dual/primal infeasible
  - example : max-flow = min-cut
  - complementary slackness : feasible $x$ and $y$ are both optimal iff $x_i > 0\Rightarrow c_i = \sum_{j=1}^{m} A_{ji} y_j \quad \forall i = 1,\dots,n$ and $y_j > 0\Rightarrow b_j = \sum_{i=1}^{n} A_{ji} x_i \quad \forall j = 1,\ldots,m$
  - short notation
    - primal : $\min \left\{ c^T x : Ax \geq b, x \geq 0\right\}$
    - dual : $\max \left\{ b^T y : A^T y \leq c, y \geq 0\right\}$
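A quick numerical illustration of strong duality with SciPy (not part of the notes; the instance is made up); `linprog` minimizes, so the dual max problem is negated.

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c^T x  s.t.  A x >= b, x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, 5.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# dual: max b^T y  s.t.  A^T y <= c, y >= 0  (negate for a min solver)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)  # both 9.0: equal at optimum (strong duality)
```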
- bipartite maximum cardinality matching
  - input : $G=(A\cup B, E)$
  - output : maximum matching $M$
  - primal : maximize $\sum_{e \in E} x_e$ s.t. $\sum_{e=(a,b) \in E} x_e \leq 1\quad \forall a \in A$, $\sum_{e=(a,b) \in E} x_e \leq 1\quad \forall b \in B$ and $x_e\ge 0$
  - dual : vertex cover relaxation $C$, minimize $\sum_{v \in A \cup B} y_v$ s.t. $y_a + y_b \geq 1\quad \text{for } (a,b) \in E$ and $y_v\ge 0$
  - weak duality : $|M| \leq |C|$
  - König theorem : both problems are integral, $|M^\star| = |C^\star|$
- minimum cost bipartite perfect matching
  - LP : minimize $\sum_{e \in E} c(e) x_e$ s.t. $\sum_{b \in B : (a,b) \in E} x_{ab} = 1\quad \forall a \in A$ and $\sum_{a \in A : (a,b) \in E} x_{ab} = 1\quad \forall b \in B$ with $x_e \geq 0\quad \forall e \in E$
  - solution : solving the LP directly is inefficient, take the dual to reduce to an unweighted problem solvable with augmenting paths
  - rewrite : minimize $\sum_{e \in E} c(e) x_e$ s.t. $\sum_{b \in B : (a,b) \in E} x_{ab} \geq 1\quad \forall a \in A$, $-\sum_{b \in B : (a,b) \in E} x_{ab} \geq -1\quad \forall a \in A$, $\sum_{a \in A : (a,b) \in E} x_{ab} \geq 1\quad \forall b \in B$ and $-\sum_{a \in A : (a,b) \in E} x_{ab} \geq -1\quad \forall b \in B$ with $x_e \geq 0\quad \forall e \in E$
  - dual : maximize $\sum_{a \in A}(u_a^+ - u_a^-) + \sum_{b \in B}(v_b^+ - v_b^-)$ s.t. $(u_a^+ - u_a^-) + (v_b^+ - v_b^-) \leq c(e) \quad \forall e = (a,b) \in E$ with $u_a^+, u_a^-, v_b^+, v_b^- \geq 0\quad \forall a \in A, b \in B$
  - simplified dual : replace $u_a^+ - u_a^-$ by $u_a$ and same for $v$ (now unconstrained in sign)
- Hall theorem : $n$-by-$n$ bipartite graph $G=(A\cup B, E')$ has a perfect matching iff $|S| \leq |N(S)|~~\forall S\subseteq A$, where $N(S)=\{b\in B : \exists a\in S, (a,b)\in E'\}$ denotes the neighborhood of $S$
- Hungarian algorithm
  - intuition : reduce the weighted problem to an unweighted one by maintaining a feasible dual solution at all times
  - minimum cost perfect matching : $M$ is optimal iff there is a feasible dual solution s.t. $u_a + v_b = c(e)~~\forall e=(a,b)\in M$
  - algorithm : $O(n^3)$ achievable but not trivial
    - initialization : $v_b = 0$, $u_a = \min_{b \in B} c_{ab}$
    - consider $G' = (A \cup B, E')$ with the set of tight edges $E' = \{e = (a,b) \in E : u_a + v_b = c(e)\}$
    - find a maximum cardinality matching in $G'$ : if it is perfect, stop, else find $S\subseteq A$ s.t. $\abs{S}>\abs{N(S)}$
    - choose $\epsilon > 0$ and improve $u_a' = \begin{cases} u_a + \epsilon & \text{if } a \in S \\ u_a & \text{if } a \notin S \end{cases}$ and $v_b' = \begin{cases} v_b - \epsilon & \text{if } b \in N(S) \\ v_b & \text{if } b \notin N(S) \end{cases}$ which remains dual feasible
      - edges in $S\times N(S)$ : unchanged, $+\epsilon-\epsilon$
      - edges in $(A\setminus S)\times(B\setminus N(S))$ : unchanged
      - edges in $(A\setminus S)\times N(S)$ : decreased by $\epsilon$, stop being tight
      - edges in $S\times (B\setminus N(S))$ : increased by $\epsilon$ but were not tight (not in $E'$)
    - dual value increase : by $(\abs{S} - \abs{N(S)})\epsilon > 0$
    - to create a new tight edge, choose $\epsilon$ as large as possible while the dual stays feasible : $\epsilon =\min_{e = (a,b) \in S \times (B\setminus N(S))} c(e) - u_a - v_b > 0$
- NP-hard : finding a solution is likely intractable, no algorithm satisfies all three of
  - efficient : polynomial time
  - reliable : works for any input instance
  - optimal : finds an optimal solution
- relax reliability : heuristics designed to work well in practice
- relax optimality : approximation algorithms
  - $\alpha$-approximation : runs in polytime and outputs a solution $S$ s.t.
    - minimization : $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \alpha$
    - maximization : $\frac{\operatorname{profit}(S)}{\operatorname{profit}(\text{optimal solution})} \geq \alpha$
  - lower bound : because the optimum is hard to compute, bound $\frac{\operatorname{cost}(S)}{\operatorname{cost}(\text{optimal solution})} \leq \frac{\operatorname{cost}(S)}{\text{lower bound on OPT}} \leq \alpha$
- LP as a popular framework : given a formulation as an integer LP (binary $x_i\in\{0,1\}$), relax to an LP ($x_i\in[0,1]$), solve the LP to get $x^*$ and round the solution without losing too much
- vertex cover
  - input : $G=(V,E)$, vertex weights $w:V\to\R_+$
  - output : set $C\subseteq V$ of minimum weight s.t. $u\in C$ or $v\in C$ $\forall \{u, v\}\in E$
  - ILP : minimize $\sum_{v \in V} w(v) x_v$ s.t. $x_u + x_v \geq 1\quad\forall\{u, v\}\in E$ with $x_v\in\{0,1\}$
  - relaxation : $x_v\in[0,1]$
  - $2$-approximation : solve the LP and round $C = \left\{ v \in V : x_v^* \geq \frac{1}{2} \right\}$, feasible since every edge has an endpoint with $x^*\ge\frac{1}{2}$ (sketch below)
    - weight of $C$ : at most twice the optimum, $\sum_{v \in C} w(v) = \sum_{v \in V : x_v^* \geq \frac{1}{2}} w(v) \leq \sum_{v \in V : x_v^* \geq \frac{1}{2}} 2x_v^* w(v) \leq 2\sum_{v \in V} x_v^* w(v) = 2LP_{OPT} \leq 2VC_{OPT}$
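A sketch of the LP-relax-and-round $2$-approximation using SciPy's `linprog`; the graph encoding is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_2approx(n, edges, w):
    """LP relaxation of vertex cover, then threshold rounding at 1/2."""
    A = np.zeros((len(edges), n))
    for row, (u, v) in enumerate(edges):
        A[row, u] = A[row, v] = 1.0
    # min w^T x  s.t.  x_u + x_v >= 1 per edge, 0 <= x <= 1
    res = linprog(w, A_ub=-A, b_ub=-np.ones(len(edges)), bounds=[(0, 1)] * n)
    return {v for v in range(n) if res.x[v] >= 0.5}

print(vertex_cover_2approx(3, [(0, 1), (1, 2)], [1.0, 1.0, 1.0]))  # {1}
```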
- integrality gap : $g = \max_{I \in \mathcal{I}} \frac{OPT(I)}{OPT_{LP}(I)}$ with $\mathcal{I}$ the set of all instances of a given problem, bounds the power of the linear relaxation
  - vertex cover : complete graph on $n$ vertices, $g \geq \frac{n-1}{n/2} = 2 - \frac{2}{n}$
- set cover
  - input : universe $\mathcal{U} =\{e_1,\ldots, e_n\}$, family of subsets $\mathcal T=\{S_1,\ldots, S_m\}$ and cost function $c:\mathcal T\to\R_+$
  - output : collection $C$ of subsets of minimum cost that covers all elements
  - ILP : minimize $\sum_{i=1}^{m} x_i \cdot c(S_i)$ s.t. $\sum_{S_i : e \in S_i} x_i \geq 1\quad\forall e\in\mathcal U$ with $x_i\in\{0,1\}$
  - LP rounding : $x_i\in[0,1]$ with $C=\{S_i:x_i^*\ge \frac{1}{f}\}$ where each element belongs to at most $f$ sets, gives an $f$-approximation
  - randomized rounding : better
    - algorithm (sketch below)
      - solve LP : get optimal solution $x^*$
      - choose a positive integer constant $d$ : start with an empty result set $C$, repeat $d\ln(n)$ times
      - for $i=1,\ldots, m$ : add set $S_i$ to the solution $C$ with probability $x_i^*$, choosing independently for each set
    - expected cost of all sets added in one execution : $E[\text{rounded cost}]= \sum_{i=1}^{m} c(S_i) Pr[S_i \text{ is added}] = \sum_{i=1}^{m} c(S_i) x_i^* = LP_{OPT}$
    - expected cost after $d\ln(n)$ executions : at most $d \cdot \ln(n) \cdot \sum_{i=1}^{m} c(S_i) x_i^* \leq d \cdot \ln(n) \cdot LP_{OPT} \leq d \cdot \ln(n) \cdot OPT$
    - probability that a given constraint is unsatisfied after a single execution : at most $\frac{1}{e}$ as $x_1+\cdots+x_k\ge 1$ and $1-x\le e^{-x}$
    - probability of a feasible solution : at least $1-\frac{1}{n^{d-1}}$, by a union bound over the $n$ elements
    - output a feasible solution of cost at most $4d\ln(n)OPT$ : with probability greater than $\frac{1}{2}$, use the expected cost and the Markov inequality to bound the cost $Pr[\text{cost} > 4\mu]\le\frac{1}{4}$, bound infeasibility by $\frac{1}{n^{d-1}}\le \frac{1}{n}$, giving success probability at least $1-\frac{1}{4}-\frac{1}{n}$ by a union bound
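A minimal sketch of the randomized rounding step, assuming the LP optimum $x^*$ is already computed; set encoding and seed are illustrative.

```python
import numpy as np

def round_set_cover(x_star, sets, n_elements, d=2, rng=np.random.default_rng(0)):
    """Randomized rounding: d*ln(n) independent passes, keep S_i w.p. x_i*."""
    chosen = set()
    for _ in range(int(np.ceil(d * np.log(n_elements)))):
        keep = rng.random(len(sets)) < x_star  # one independent coin per set
        chosen |= {i for i in range(len(sets)) if keep[i]}
    covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
    return chosen, covered == set(range(n_elements))

# toy instance: fractional solution covering 3 elements with 3 sets
sets = [{0, 1}, {1, 2}, {0, 2}]
print(round_set_cover(np.array([0.5, 0.5, 0.5]), sets, 3))
```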
- weighted majority algorithm : sequential game between an omniscient adversary and an aggregator advised by $N$ experts
  - algorithm with halving
    - assign each expert $i$ weight $w_i^{(1)}=1$
    - for every $t$ : predict yes/no based on the weighted majority
    - for every mistaken expert : $w_i^{(t+1)}=w_i^{(t)}/2$
    - any sequence of duration $T$ and expert $i$ : $\#\text{majority mistakes}\le 2.41 (\#\text{of } i\text{'s mistakes} + \log_2 N)$
  - potential function : $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
  - $\epsilon$ update rule : $w_i^{(t+1)}=w_i^{(t)}\cdot(1-\epsilon)$ achieves $\#\text{majority mistakes}\leq 2(1+\epsilon)(\#\text{of } i\text{'s mistakes})+O(\log N/\epsilon)$
    - lower bound : $\Phi^{(T+1)} = \sum_{j \in [N]} w_j^{(T+1)} \geq w_i^{(T+1)} = (1-\epsilon)^{\#\text{ of } i\text{'s mistakes}}$
    - upper bound : $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot (1-\epsilon/2)^{\#\text{ of WM mistakes}}$, since on each majority mistake at least half of the total weight gets scaled down
  - at worst : roughly twice as many mistakes as the best expert
- hedge
  - setup
    - each expert $i$ gives some advice
    - allocator picks a distribution $\vec{p}^{(t)} = (p_1^{(t)},\dots,p_N^{(t)})$ over the experts
    - adversary determines a cost vector $\vec{m}^{(t)} = (m_1^{(t)},\dots,m_N^{(t)}) \in [-1,1]^N$, a negative number means it is profitable to follow the specific expert
    - allocator suffers $\vec{p}^{(t)} \cdot \vec{m}^{(t)} = \sum_{i \in [N]} p_i^{(t)} m_i^{(t)}$
  - algorithm (sketch below)
    - assign each expert $i$ : $w_i^{(1)}=1$
    - pick distribution : $p_i^{(t)} = w_i^{(t)}/\Phi^{(t)}$ with $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
    - observe costs and update : $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\epsilon \cdot m_i^{(t)}}$
  - bound : for any expert $i$, $\sum_{t=1}^{T} \vec{p}^{(t)} \cdot \vec{m}^{(t)} \leq \sum_{t=1}^{T} m_i^{(t)} + \frac{\ln N}{\epsilon} + \epsilon T$
    - lower bound : $\Phi^{(T+1)} = \sum_{j \in [N]} w_j^{(T+1)} \geq w_i^{(T+1)} = \exp\left(-\epsilon \sum_{t=1}^{T} m_i^{(t)}\right)$
    - upper bound : $\Phi^{(t+1)} = \sum_{j \in [N]} w_j^{(t)} \cdot \exp\left(-\epsilon m_j^{(t)}\right) \leq \sum_{j \in [N]} w_j^{(t)} \left(1 - \epsilon m_j^{(t)} + \epsilon^2\right) \leq \Phi^{(t)} \cdot \exp\left(\epsilon^2 - \epsilon \vec{p}^{(t)} \vec{m}^{(t)}\right)$, so $\Phi^{(T+1)} \leq \Phi^{(1)} \cdot \exp\left(\epsilon^2 T - \epsilon \sum_{t=1}^{T} \vec{p}^{(t)} \vec{m}^{(t)}\right)$, using the Taylor bound $e^x \leq 1 + x + x^2$ for $x\in[-1,1]$ and $(m_j^{(t)})^2 \leq 1$
  - generalisation : costs in $[-\rho,\rho]^N$ with width $\rho$
    - if $T \geq (4\rho^2 \ln N)/\epsilon^2$ : for any expert $i$, $\frac{1}{T} \sum_{t=1}^{T} \vec{p}^{(t)} \cdot \vec{m}^{(t)} \leq \frac{1}{T} \sum_{t=1}^{T} m_i^{(t)} + 2\epsilon$
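A direct sketch of the hedge update over a precomputed cost matrix (the random instance below is only for a regret sanity check).

```python
import numpy as np

def hedge(costs, eps):
    """Hedge over a (T, N) cost matrix in [-1, 1]; returns total allocator cost."""
    T, N = costs.shape
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                # distribution over experts
        total += p @ costs[t]          # suffered cost
        w *= np.exp(-eps * costs[t])   # exponential update
    return total

# regret check on random costs: allocator vs best single expert
rng = np.random.default_rng(1)
m = rng.uniform(-1, 1, size=(1000, 5))
print(hedge(m, eps=0.05), m.sum(axis=0).min())
```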
- covering LP via hedge : associate an expert with each constraint of the LP and rewrite $\sum_{i=1}^{n} d_i x_i \geq b$ as the weighted sum of all constraints, assign maximum value to the variables with the highest ratio constraint coefficient/objective coefficient
  - reduced LP (oracle) : minimize $\sum_{j=1}^{n} c_j x_j$ s.t. $\left(\sum_{i=1}^{m} p_i A_i\right) \cdot x \geq \sum_{i=1}^{m} p_i b_i$ for $0 \leq x \leq 1$, a fractional knapsack
    - fractional knapsack
      - sort and relabel $x$ s.t. $d_1/c_1\ge\cdots\ge d_n/c_n$
      - let : $k = \min\left\{j : \sum_{i=1}^{j} d_i \geq b\right\}$ and $\ell = b - \sum_{i=1}^{k-1} d_i$
      - set : $x_i=1$ for $i=1,\ldots,k-1$, $x_k=\ell/d_k$, $x_i=0$ for $i=k+1,\ldots,n$
  - algorithm
    - assign each constraint $i$ : $w_i^{(1)}=1$
    - pick distribution : $p_i^{(t)} = w_i^{(t)}/\Phi^{(t)}$ with $\Phi^{(t)} = \sum_{i \in [N]} w_i^{(t)}$
    - cost vector
      - solution to the reduced LP : $x^{(t)}$, noting $c^T x^{(t)}$ is at most the cost of the optimal solution to the full LP
      - cost of constraint $i$ : $m_i^{(t)} = \sum_{j=1}^{n} A_{ij} x_j - b_i = A_i x - b_i$ (positive when the constraint is satisfied, so its weight decreases)
    - update : $w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\epsilon \cdot m_i^{(t)}}$
    - output : average of the constructed solutions $\overline{x} = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$, almost feasible
  - cost : $c^T \overline{x} = c^T \left(\frac{1}{T} \sum_{t \in [T]} x^{(t)}\right)$ at most the cost of the optimal solution, since each $x^{(t)}$ has no higher cost than the optimal solution (property of the reduced LP)
  - parameters : $T = (4n^2 \ln m)/\epsilon^2$ leads $\overline x$ to satisfy $\sum_{e \in S} x_e \geq 1-2\epsilon$ for every covering constraint, and scaling $x^* = \frac{1}{1-2\epsilon}\overline{x}$ gives a feasible approximately optimal solution
- simplex method : fast in practice, at worst exponential
  - slack variables : $s_1,\ldots,s_m$, non-negative, turn the inequalities into equalities
  - current objective function value : $z$
  - simplex tableau : all constraints plus the objective function
  - pivoting : increase one variable with a positive coefficient in the objective while respecting the constraints, e.g. pivot $s_1\leftrightarrow x_2$
  - each constraint can enforce an upper bound (without making a variable negative), give no upper bound, or say nothing (current variable missing)
  - repeat until no increase is possible : the final values of the slack variables are ignored
  - technicalities
    - unboundedness : feasible solutions with unbounded objective
    - degeneracy : might need to pivot two variables even though the objective does not increase, avoid cycling through such swaps by lexicographic ordering
    - initial vertex : phase I finds an initial feasible starting point, might require solving another LP
    - infeasibility : detected by phase I
- randomized algorithms : in addition to the input, take a random source of bits and produce an output
  - correctness probability : $p>0$, running $k$ times boosts success to $1-(1-p)^k$
  - approximation : for small $p$ consider $1-p \approx e^{-p}$
- Las Vegas algorithms : randomized, always correct, polytime in expectation
- minimum cut problem
  - input : $G = (V,E)$ with $n = |V|$
  - output : partition into two subsets joined by a minimum number of edges, $\min_{\emptyset \subset S \subset V} |E(S,\overline{S})|$ with $E(S,\overline{S}) := \{(u,v) : u \in S, v \notin S\}$
  - contraction procedure : pick $(u,v)$ in $G$, merge both endpoints into a new node $uv$, might create parallel edges
  - hand-shake lemma : in any graph $\sum_{v \in V} d(v) = 2|E|$ with $d(v)$ the degree of $v$
  - Karger : $O(n^4\log n)$ by boosting $O(n^2\log n)$ times the raw $O(n^2)$ contraction algorithm (sketch below) of success probability $\Omega(1/n^2)$, reaching success probability $1-\frac{1}{n}$
    - size of the minimum cut : $|E(S^*,\overline{S^*})|=k$
    - uniform random edge in the cut : $\mathbb{P}[e \in E(S^*,\overline{S^*})] = \frac{k}{|E|} \leq \frac{k}{nk/2} = \frac{2}{n}$ by hand-shake, as $d(v) = |E(\{v\},\overline{\{v\}})| \geq k$
    - idea : choose $n/4$ edges, with probability $1/2$ none of them is in the min-cut, contract them, the new graph has at most $3n/4$ vertices, recurse, after $O(\log n)$ steps two super nodes remain and with probability at least $(1/2)^{O(\log n)}$ no min-cut edge was contracted
    - for each contraction, the size of the minimum cut does not decrease : contract one edge at a time to avoid parallel contractions
    - probability of returning a min-cut : at least $\frac{2}{n(n-1)}$
      - $A_i$ : event that the edge picked in step $i$ is not in $E(S^*,\overline{S^*})$
      - by Bayes rule : $\mathbb{P}[A_1,\ldots,A_{n-2}] = \mathbb{P}[A_1]\mathbb{P}[A_2|A_1]\mathbb{P}[A_3|A_1,A_2]\cdots\mathbb{P}[A_{n-2}|A_1,\ldots,A_{n-3}]$ with $\mathbb{P}[A_i|A_1,\ldots,A_{i-1}] \geq 1 - \frac{2}{n-i+1}$
      - lower bound : $\mathbb{P}[A_1,\dots,A_{n-2}] \geq \frac{2}{n(n-1)} = 1/\binom{n}{2}$
    - corollary : any graph has at most $\binom{n}{2}$ min-cuts
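A minimal sketch of one contraction run (union-find stands in for explicit node merging); the toy instance and boosting loop are illustrative.

```python
import random

def karger_once(edges, n, rng=random.Random(0)):
    """One run of the contraction algorithm; edges is a list of pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    while components > 2:
        u, v = rng.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:            # skip self-loops of already-merged nodes
            parent[ru] = rv
            components -= 1
    # surviving edges across the two super nodes form the cut
    return sum(find(u) != find(v) for u, v in edges)

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # triangle plus pendant edge: min cut 1
print(min(karger_once(edges, 4, random.Random(s)) for s in range(20)))  # 1 whp
```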
  - Karger-Stein : $O(n^2\log^3 n)$ by boosting $\log^2 n$ times the raw $O(n^2\log n)$ algorithm of success probability $\Omega(1/\log n)$, reaching success probability $1-1/n$
    - idea : contract only down to $n/\sqrt{2}$ vertices, then recurse twice independently and keep the better cut
    - probability of not contracting any min-cut edge for $n-n/\sqrt{2}$ steps : at least $\frac{r-2}{r} \cdot \frac{r-3}{r-1} \cdots \frac{r/\sqrt{2}-2}{r/\sqrt{2}} \approx \frac{(r/\sqrt{2})^2}{r^2} = 1/2$
    - probability of returning a min-cut : at least $1/(2\log n)$, show $\frac{1}{2}(p + p - p^2) = p - p^2/2 \geq \frac{1}{2\log n}$ with $p$ the success probability of either of the two recursive copies, in the worst case $p = 1/(2\log(n/\sqrt{2}))$
- master method : $T(n)=aT(\frac{n}{b})+f(n)$
  - $f(n) = O(n^{\log_b a - \epsilon})$ : $T(n) = \Theta(n^{\log_b a})$
  - $f(n) = \Theta(n^{\log_b a})$ : $T(n) = \Theta(n^{\log_b a}\log n)$
  - $f(n) = \Omega(n^{\log_b a + \epsilon})$ and $a \cdot f(n/b) \leq c \cdot f(n)$ for some $c<1$ : $T(n) = \Theta(f(n))$
- polynomial identity testing : given polynomials $p$, $q$ defined over a common set of variables $x$, determine whether $p(x)-q(x)=0$ for all values, with oracle access (no individual term known, but can evaluate at specific inputs)
  - naive expansion : does not work for e.g. determinants, which have exponentially many terms
  - monomial : product of powers of variables with non-negative exponents
    - degree : sum of all exponents involved
  - polynomial : sum of monomials, each referred to as a term
    - degree : largest degree over all terms
  - determinant : $\operatorname{det}(A) = \sum_{\sigma : [n] \rightarrow [n]} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A_{i,\sigma(i)}$ over permutations $\sigma$
  - Schwartz-Zippel lemma : $p(x_1,\dots,x_n)$ nonzero polynomial of $n$ variables with degree $d$, let $S$ be a finite subset of $\mathbb R$ with at least $d$ elements, if we assign $x_1,\ldots,x_n$ values from $S$ independently and uniformly at random then $\mathbb{P}[p(x_1,\ldots,x_n) = 0] \leq \frac{d}{|S|}$
    - base case : $n=1$, a univariate polynomial of degree $d$ has at most $d$ roots
    - strong induction : assume the lemma holds for fewer than $n$ variables
      - fix $x_1,\dots,x_{n-1}$ arbitrarily : $p(x_n) = a_k x_n^k + a_{k-1} x_n^{k-1} + \ldots + a_1 x_n + a_0$ becomes a univariate polynomial in $x_n$ of degree $k\leq d$
      - thus : $\mathbb{P}[p(x_n) = 0] \leq \frac{k}{|S|} \leq \frac{d}{|S|}$, but we would need to argue that the probability is unaffected by the choice of $x_1,\ldots, x_{n-1}$, which is not immediate
      - argue instead that the adversarial scenario (all coefficients vanishing) is rare, via the long-division argument below
- matrix identity testing : whether $AB=C$ without matrix multiplication (sketch below)
  - randomized : build a random vector $x\in\mathbb R^n$ by choosing each entry independently and uniformly from $S$, $x_i \sim \text{Uniform}(S)$, test $ABx=Cx$ to conclude $AB=C$
    - detection probability if $AB \neq C$ : $\mathbb{P}_{x_i \sim S}[ABx \neq Cx] \geq 1 - \frac{1}{|S|}$, as in a row $i$ where $AB$ and $C$ differ only one $x_j$ value satisfies $x_j(a_{ij} - c_{ij}) = -\sum_{k \neq j}(a_{ik} - c_{ik}) x_k$, and by independence the other entries have no effect, so $\mathbb{P}_{x \sim S}[\langle a_i, x\rangle = \langle c_i, x\rangle] \leq \frac{1}{|S|}$
    - principle of deferred decision : random choices are made only when relevant, be careful when relaxing fixed assumptions, claims must hold regardless of their values
  - long division for polynomials : $p(x) = d(x)q(x) + r(x)$ with $p(x)$ of degree $d$, divisor $d(x)$ of degree $k\le d$, quotient $q(x)$ of degree at most $d-k$ and remainder $r(x)$ of degree at most $k-1$
    - take the largest power $k$ of $x_n$ : $p(x_1,\dots,x_n) = x_n^k q(x_1,\dots,x_{n-1}) + r(x_1,\dots,x_n)$
    - reuse the principle of deferred decision : $\mathbb{P}_{x_1,\ldots,x_{n-1} \sim S}[q(x_1,\ldots,x_{n-1}) = 0] \leq \frac{d-k}{|S|}$, and if $q\not=0$ then $p(x_1,\ldots, x_n)$ seen as univariate in $x_n$ has a nonzero coefficient on $x_n^k$, so $\mathbb{P}_{x_n \sim S}[p = 0|q \neq 0] \leq \frac{k}{|S|}$
    - finish : $\mathbb{P}[p = 0] = \mathbb{P}[p = 0|q = 0] \cdot \mathbb{P}[q = 0] + \mathbb{P}[p = 0|q \neq 0]\mathbb{P}[q \neq 0] \leq \mathbb{P}[q = 0] + \mathbb{P}[p = 0|q \neq 0] \leq \frac{d-k}{|S|} + \frac{k}{|S|} = \frac{d}{|S|}$
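A quick sketch of this randomized check (Freivalds-style) with entries drawn from $S=\{0,1\}$; purely illustrative.

```python
import numpy as np

def probably_equal(A, B, C, trials=20, rng=np.random.default_rng(0)):
    """Test AB == C via random vectors: O(trials * n^2) instead of O(n^3)."""
    n = C.shape[1]
    for _ in range(trials):
        x = rng.integers(0, 2, size=n)   # S = {0, 1}: error prob <= 1/2 per trial
        if not np.array_equal(A @ (B @ x), C @ x):
            return False                 # witness found: AB != C for sure
    return True                          # equal with prob >= 1 - 2^(-trials)

A = np.array([[1, 2], [3, 4]]); B = np.array([[0, 1], [1, 0]])
print(probably_equal(A, B, A @ B), probably_equal(A, B, A @ B + 1))  # True False
```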
- perfect bipartite matching existence
  - perfect matching : involves every vertex of the graph
  - deterministic algorithm : $O(|E|\sqrt{|X|+|Y|})$
  - randomized : build the symbolic adjacency matrix $A_{ij}=x_{ij}$ if $x_i$ and $y_j$ are connected (else $0$), a perfect matching exists iff the determinant of $A$ is not identically $0$
    - $\Rightarrow$ : a perfect matching is a bijection, hence a permutation $f$ on the integer set, and $\prod_{i=1}^{n} A_{i,f(i)} = \prod_{i=1}^{n} x_{i,f(i)}$ is a monomial of the determinant ($\sigma=f$) that cannot be cancelled by other permutations
    - $\Leftarrow$ : if the determinant is not identically $0$, at least one permutation has all its terms nonzero, which is a perfect matching
  - correctness probability : $1-1/n$ for $|S| \geq n^2$ by Schwartz-Zippel (degree $n$)
  - finding the matching : choose $|S| \gg n$, set $x_{ij} = 2^{w_{ij}}$ with $w_{ij}$ independently and uniformly at random from $S$, with high probability $\operatorname{det}(A) = 2^{w(M)}(\pm 1 + \text{even number})$ where $w(M)$ is the weight of the minimum weight perfect matching, for every $(x_i,y_j)$ delete the edge and test whether the weight decreases to $w(M)-w_{ij}$ (then the edge belongs to the matching)
- general graph matching : a perfect matching exists iff the determinant of the Tutte matrix $A$ is not identically zero
  - skew-symmetric Tutte matrix : $A_{ij}=x_{ij}$ if vertices $v_i$ and $v_j$ are connected and $i<j$, $-x_{ij}$ if connected and $i>j$, else $0$
- Markov inequality : $X$ non-negative, $\mathbb{P}[X \geq k \cdot \mathbb{E}[X]] \leq \frac{1}{k}$, equivalently $\mathbb{P}[X\ge k]\le\frac{\mathbb{E}[X]}{k}$, as $\mathbb{E}[X] = \sum_i i \cdot \mathbb{P}[X = i] \geq \sum_{i \geq k} i \cdot \mathbb{P}[X = i] \geq \sum_{i \geq k} k \cdot \mathbb{P}[X = i] = k \cdot \mathbb{P}[X \geq k]$
  - central limit theorem : $\sqrt{n}\left(\frac{1}{n} \sum_{i=1}^{n} X_i - \mu\right) \rightarrow \mathcal{N}(0,\sigma^2)$ as $n\to\infty$
- Chebyshev inequality : $\mathbb{P}[|X - \mathbb{E}X| > \epsilon] < \frac{\operatorname{Var}(X)}{\epsilon^2}$ or $\mathbb{P}[|X - \mathbb{E}[X]| > k\sigma] \leq \frac{1}{k^2}$, Markov applied to $(X-\mathbb{E}X)^2$
  - variance : $\operatorname{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$
  - 2nd moment at least 1st moment squared : as the variance is always non-negative
  - standard deviation : $\sigma = \sqrt{\operatorname{Var}(X)}$
  - pairwise independence suffices : $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\mathbb{E}[X_j]$ for all $i\neq j$ gives $\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)$
- Chernoff inequality : $\mathbb{P}[X\ge a]\le e^{-ta}\,\mathbb{E}[e^{tX}]$ for any $t>0$, gives sharp bounds for averages of mutually independent random variables
  - Bernoulli sum : $X = \sum_{i=1}^{n} X_i$, $\mu = \mathbb{E}[X] = \sum_{i=1}^{n} p_i$
    - upper tail : $\mathbb{P}[X \geq (1+\delta)\mu] \leq e^{-\frac{\delta^2}{2+\delta}\mu}$ for all $\delta > 0$
    - lower tail : $\mathbb{P}[X \leq (1-\delta)\mu] \leq e^{-\frac{\delta^2}{2}\mu}$ for all $\delta > 0$
    - application : $\mathbb{P}\left[|\frac{\sum X_i}{n} - p| \geq \epsilon\right] \leq \mathbb{P}\left[\sum X_i \geq (1+\epsilon/p)pn\right] + \mathbb{P}\left[\sum X_i \leq (1-\epsilon/p)pn\right] \leq e^{-\frac{\epsilon^2 n}{3}} + e^{-\frac{\epsilon^2 n}{2}}$
  - bounded sum : $a \leq X_i \leq b$ for all $i$, $X = \sum_{i=1}^{n} X_i$, $\mu = \mathbb{E}[X]$
    - upper tail : $\mathbb{P}[X \geq (1+\delta)\mu] \leq e^{-\frac{2\delta^2\mu^2}{n(b-a)^2}}$ for all $\delta > 0$
    - lower tail : $\mathbb{P}[X \leq (1-\delta)\mu] \leq e^{-\frac{\delta^2\mu^2}{n(b-a)^2}}$ for all $\delta > 0$
  - proof idea : $\mathbb{P}[X \geq a] = \mathbb{P}[e^{sX} \geq e^{sa}] \leq \frac{\mathbb{E}[e^{sX}]}{e^{sa}}$, independence is used to expand $\mathbb{E}[e^{sX}] = \mathbb{E}[e^{s(X_1 + X_2 + \cdots + X_n)}] = \prod_{i=1}^{n} \mathbb{E}[e^{sX_i}]$, each factor can be bounded and $s$ optimized
- hashing : $h : U \rightarrow \{1,2,\dots,N\}$ with $N \ll |U|$ and a small subset $S\subset U$ to store, ideally a uniform and independent mapping
  - table : $T[h(x)]$, insert, delete, query, with linked lists for collisions
  - no worst case guarantee for a fixed hash : instead pick $h$ at random from a family $\mathcal H$
  - collision indicator : $I_y = 1$ if $h(y)=h(x)$ else $0$
  - chain length : $L_x = 1 + \sum_{y \in S : y \neq x} I_y$ with $\mathbb{E}[L_x] = 1 + \frac{|S|-1}{N}$ for a uniformly random function
  - fully random hash function : requires $|\mathcal{U}|\log N$ bits to store
- $2$-universal hash families : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = h(y)] \leq \frac{1}{N}$ for all $x\not=y\in U$, can be relaxed to a small constant times $\frac{1}{N}$
  - design (sketch below) : choose a prime $p \in \{|U|,\ldots,2|U|\}$, work over the field with $f_{a,b}(x) = ax + b \mod p \quad (a,b \in [p], a \neq 0)$ and $h_{a,b}(x) = f_{a,b}(x) \mod N$
    - lemma : $ax + b = s \mod p$ and $ay + b = t \mod p$ for $x\not=y$ and $s\not=t$ have exactly one solution $a = (x-y)^{-1}(s-t)$ and $b = s - ax$, hence $\mathbb P[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] = \frac{1}{p(p-1)}$
    - $2$-universal : $\mathcal{H} = \left\{h_{a,b} : a,b \in [p] \wedge a \neq 0\right\}$ as $\mathbb{P}[h_{a,b}(x) = h_{a,b}(y)] = \sum_{s,t \in [p] : s \neq t} 1_{s = t \bmod N} \cdot \mathbb{P}[f_{a,b}(x) = s \wedge f_{a,b}(y) = t] \leq \frac{1}{p(p-1)}\cdot\frac{p(p-1)}{N}$
    - storage : $O(\log|U|)$ bits
  - expected number of collisions : $\sum_{x \neq y \in S} \mathbb{P}_{h \in \mathcal{H}}[h(x) = h(y)] \leq \binom{|S|}{2}/N$
  - two-layer tables : $s_i$ collisions in bucket $i$ of the 1st layer, 2nd layer of size $s_i^2$, $\mathbb{E}\left[\sum_i s_i^2\right] = \mathbb{E}\left[\sum_i s_i(s_i-1)\right] + \mathbb{E}\left[\sum_i s_i\right] \le \frac{|S|(|S|-1)}{N} + |S| \leq 2|S|$ for $N\ge|S|$
  - table size : adapted (rebuilt) as it grows
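A minimal sketch of sampling from this $2$-universal family; the naive trial-division prime search is for illustration only (Bertrand's postulate guarantees a prime in the range).

```python
import random

def make_2universal(universe_size, table_size, rng=random.Random(42)):
    """Sample h(x) = ((a*x + b) mod p) mod N from a 2-universal family."""
    p = next(q for q in range(universe_size, 2 * universe_size + 1)
             if all(q % d for d in range(2, int(q ** 0.5) + 1)))  # prime >= |U|
    a = rng.randrange(1, p)   # a != 0
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % table_size

h = make_2universal(universe_size=10**6, table_size=97)
print(h(12345), h(67890))
```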
- pairwise independence : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = s, h(y) = t] = \frac{1}{N^2}$ for $x\not=y$ and any $s, t$
  - $1$-wise : $\mathbb{P}_{h \in \mathcal{H}}[h(x) = s] = \frac{1}{N}$
  - $k$-wise : $f_{a_0,\dots,a_{k-1}}(x) = a_{k-1}x^{k-1} + \ldots + a_1 x + a_0$, analysis via the Vandermonde matrix, stored in $O(k\log|U|)$ bits
- load balancing
  - balls and bins : $\mathbb{P}[\text{bin } i \text{ gets more than } k \text{ elements}] \leq \binom{n}{k} \cdot \frac{1}{n^k} \leq \frac{1}{k!}$
  - Stirling formula : $k! \approx \sqrt{2\pi k}\left(\frac{k}{e}\right)^k$
  - choose : $k = O\left(\frac{\log n}{\log\log n}\right)$ implies $\frac{1}{k!} \leq \frac{1}{n^2}$ and $\mathbb{P}[\exists \text{ bin with} \geq k \text{ balls}] \leq n \cdot \frac{1}{n^2} = \frac{1}{n}$, giving $\max \operatorname{load} \leq O\left(\frac{\log n}{\log\log n}\right)$ with probability $1-1/n$
  - power of two choices : pick two random bins, put the ball in the one with fewer balls, maximal load drops to $O(\log\log n)$
- streaming algorithms
  - stream : $\sigma = \langle a_1, a_2,\dots,a_m\rangle$ with values from universe $[n] = \{1,\dots,n\}$, read from left to right
  - small space : $s$ bits of random-access memory, want $s = o(\min\{m,n\})$, holy grail at $s = O(\log m + \log n)$
  - passes : $p=1$ ideally
- frequent items deterministically : frequencies satisfy $f_1+\cdots+f_n=m$
  - majority problem : output $j$ if $f_j>m/2$ exists, requires deterministic space $\Omega(\min\{m,n\})$ in one pass
  - frequent problem with parameter $k$ : output the set $\{j : f_j > m/k\}$, same $\Omega(\min\{m,n\})$ one-pass deterministic lower bound
  - Misra-Gries algorithm : $p=1$, parameter $k$ (sketch below)
    - initialize : empty associative array $A$
    - process $j$ : if $j \in keys(A)$, $A[j] = A[j] + 1$, else if $|keys(A)| < k-1$, $A[j] = 1$, else for each $\ell \in keys(A)$, $A[\ell] = A[\ell] - 1$ and remove $\ell$ if $A[\ell] = 0$
    - output : answer queries with $\hat{f}_a = A[a]$ (or $0$ if absent)
    - space : at most $k-1$ pairs, $O(k(\log n + \log m))$
    - analysis : $f_j - \frac{m}{k} \leq \hat{f}_j \leq f_j$ as there are at most $m/k$ decrements, a 2nd pass over the candidates gives the exact answer
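A direct sketch of Misra-Gries as described above; the toy stream is illustrative.

```python
def misra_gries(stream, k):
    """One-pass frequent-items sketch keeping at most k-1 counters."""
    A = {}
    for j in stream:
        if j in A:
            A[j] += 1
        elif len(A) < k - 1:
            A[j] = 1
        else:                      # decrement every counter, drop zeros
            for key in list(A):
                A[key] -= 1
                if A[key] == 0:
                    del A[key]
    return A  # guarantees f_j - m/k <= A.get(j, 0) <= f_j

print(misra_gries([1, 1, 2, 1, 3, 1, 4], k=2))  # {1: 1}: the heavy hitter survives
```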
- number of distinct elements : $d(\sigma)=|\{j : f_j > 0\}|$, provably impossible to compute exactly in small space deterministically
  - guarantee sought : $\operatorname{Pr}[d(\sigma)/3 \leq A(\sigma) \leq 3d(\sigma)] \geq 1-\delta$ with space $O(\log(1/\delta)\log n)$
  - lemma : there exists a pairwise independent hash family with $h$ sampled using $O(\log n)$ random bits
  - zeros function : $\operatorname{zeros}(p) = \max\{i : 2^i \text{ divides } p\}$, the number of trailing zeros in the binary representation
  - algo (sketch below)
    - initialize : choose $h$ and set $z=0$
    - process $j$ : if $\operatorname{zeros}(h(j)) > z$ then $z = \operatorname{zeros}(h(j))$
    - output : $2^{z+1/2}$
  - analysis : $X_{r,j}$ indicator of "$\operatorname{zeros}(h(j)) \geq r$", $Y_r = \sum_{j : f_j > 0} X_{r,j}$, $t$ the end value of $z$
    - restate : $Y_r > 0 \Leftrightarrow t \geq r$, equivalently $Y_r = 0 \Leftrightarrow t \leq r-1$
    - uniformly distributed : $\mathbb{E}[X_{r,j}] = \operatorname{Pr}[\operatorname{zeros}(h(j)) \geq r] = \operatorname{Pr}[2^r \text{ divides } h(j)] = \frac{1}{2^r}$
    - estimate mean and variance : $\mathbb{E}[Y_r] = \frac{d}{2^r}$ and $\operatorname{Var}[Y_r] = \sum_{j : f_j > 0}\left(\mathbb{E}[X_{r,j}^2] - \mathbb{E}[X_{r,j}]^2\right) \leq \sum_{j : f_j > 0} \mathbb{E}[X_{r,j}^2] = \sum_{j : f_j > 0} \mathbb{E}[X_{r,j}] = \frac{d}{2^r}$ by pairwise independence
    - inequalities : $\operatorname{Pr}[Y_r > 0] = \operatorname{Pr}[Y_r \geq 1] \leq \frac{\mathbb{E}[Y_r]}{1} = \frac{d}{2^r}$ and $\operatorname{Pr}[Y_r = 0] \leq \operatorname{Pr}\left[|Y_r - \mathbb{E}[Y_r]| \geq \frac{d}{2^r}\right] \leq \frac{\operatorname{Var}[Y_r]}{(d/2^r)^2} \leq \frac{2^r}{d}$
    - estimate : $\hat{d} = 2^{t+1/2}$
      - upper : $a$ smallest integer s.t. $2^{a+1/2} \geq 3d$, $\operatorname{Pr}[\hat{d} \geq 3d] = \operatorname{Pr}[t \geq a] = \operatorname{Pr}[Y_a > 0] \leq \frac{d}{2^a} \leq \frac{\sqrt{2}}{3}$
      - lower : $b$ largest integer s.t. $2^{b+1/2} \leq d/3$, $\operatorname{Pr}[\hat{d} \leq d/3] = \operatorname{Pr}[t \leq b] = \operatorname{Pr}[Y_{b+1} = 0] \leq \frac{2^{b+1}}{d} \leq \frac{\sqrt{2}}{3}$
    - conclusion : constant failure probability, boost with the median trick
    - median trick : run $k$ copies in parallel with mutually independent random hash functions and output the median
      - Chernoff : expect $\le k\frac{\sqrt{2}}{3}$ copies to exceed $3d$, so the median exceeds $3d$ with probability $\leq 2^{-\Omega(k)}$, choose $k=\Theta(\log(1/\delta))$ for failure probability at most $\delta$
    - space : hashing $O(\log n)$, storing $z$ takes $O(\log\log n)$, thus $O(\log(1/\delta)\log n)$ overall
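A small sketch of a single copy of this estimator; the pairwise independent hash is a linear map mod the prime $2^{32}+15$, and the instance is illustrative.

```python
import random

def distinct_estimate(stream, n_bits=32, rng=random.Random(7)):
    """Track the max number of trailing zeros of h(j), output 2^(z + 1/2)."""
    p = (1 << n_bits) + 15            # 2^32 + 15 is prime
    a, b = rng.randrange(1, p), rng.randrange(p)

    def zeros(v):                     # number of trailing zero bits
        return (v & -v).bit_length() - 1 if v else n_bits

    z = 0
    for j in stream:
        z = max(z, zeros((a * j + b) % p))
    return 2 ** (z + 0.5)

print(distinct_estimate([1, 2, 3, 2, 1, 4, 5] * 100))  # rough estimate of 5
```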
- frequency vector : $\mathbf{f} = (f_1,\dots,f_n)$ where $f_i = |\{j : a_j = i\}|$ and $f_1 + f_2 + \dots + f_n = m$
  - second moment : $F_2 = \sum_{i=1}^{n} f_i^2$
  - downsampling the stream : to size $O(\sqrt{n})$, cannot expect an estimate better than $\sqrt{n}$ off the truth
  - $k$-wise independent : family of hash functions $H = \{h : [n] \rightarrow U\}$ s.t. for any $k$ distinct $(x_1,\cdots,x_k)$ and any $(u_1,\cdots,u_k) \in U^k$, $Pr_{h \in H}[h(x_1) = u_1 \wedge \cdots \wedge h(x_k) = u_k] = \left(\frac{1}{|U|}\right)^k$
$H = { h : [ n ] \rightarrow U }$ if for any$k$ distinct$\left( x _ { 1} ,\cdots ,x _ { k } \right) \in U ^ { k }$ and any$\left( u _ { 1} ,\cdots ,u _ { k } \right)$ ,$P r _ { h \in H } \left[ h \left( x _ { 1} \right) = u _ { 1} \wedge \cdots \wedge h \left( x _ { k } \right) = u _ { k } \right] = \left( \frac { 1} { | U | } \right) ^ { k }$ - AMS
$\ell_2$ frequency vector norm- algo
- init : pick 4-wise independent hash function
$h : [ n ] \rightarrow { - 1,+ 1}$ , let$\sigma_i=h(i)$ so$\sigma \in { - 1,1} ^ { n }$ and$Z=0$ - process : element of value
$i$ as$Z = Z + \sigma _ { i }$ - output :
$Z^2$ (we have$Z = \sum _ { i = 1} ^ { n } f _ { i } \sigma _ { i }$ )
- init : pick 4-wise independent hash function
- analysis
- unbiased :
$\mathbb { E } \left[ Z ^ { 2} \right] = \mathbb { E } \left[ \sum _ { i \in [ n ] } \sigma _ { i } ^ { 2} f _ { i } ^ { 2} \right] + \mathbb { E } \left[ \sum _ { i \neq j \in [ n ] } \sigma _ { i } \sigma _ { j } f _ { i } f _ { j } \right]= \mathbb { E } \left[ \sum _ { i \in [ n ] } f _ { i } ^ { 2} \right] + \sum _ { i \neq j \in [ n ] } \mathbb { E } \left[ \sigma _ { i } \right] \mathbb { E } \left[ \sigma _ { j } \right] f _ { i } f _ { j }= | f | _ { 2} ^ { 2}$ -
$Z ^ { 4} = \left( \sum _ { i \in [ n ] } \sigma _ { i } f _ { i } \right) \left( \sum _ { j \in [ n ] } \sigma _ { j } f _ { j } \right) \left( \sum _ { k \in [ n ] } \sigma _ { k } f _ { k } \right) \left( \sum _ { l \in [ n ] } \sigma _ { l } f _ { l } \right)= \sum _ { i \in [ n ] } f _ { i } ^ { 4} + 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}$ - all indexes equal : $\sum {i \in [ n ]}\sigma i ^4 f _ { i } ^ { 4} = \sum i \in [ n ] f _ { i } ^ { 4}$
- indexes matched 2 by 2 : $\left( \begin{array} { l } { 4} \ { 2} \end{array} \right) \sum _ { i < j } \left( \sigma _ { i } \sigma _ { j } f _ { i } f _ { j } \right) ^ { 2} = 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}$
- single unmatched multiplier : 4-wise independent give
$0$ as$\mathbb { E } \left[ \sigma _ { i } \right] = 0$ for any$1\le i\leq n$
- bounded variance :
$\operatorname{Var} \left[ Z ^ { 2} \right] = \sum _ { i } f _ { i } ^ { 4} + 6\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2} - \sum _ { i } f _ { i } ^ { 4} - 2\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2} = 4\sum _ { i < j } f _ { i } ^ { 2} f _ { j } ^ { 2}\leq 2\left( \sum f _ { i } ^ { 2} \right) ^ { 2}= 2| \text{f} | _ { 2} ^ { 4}$
- unbiased :
- improve precision : repeat iid
$t = \frac { 6} { \epsilon ^ { 2} }$ copies with output$Z _ { 1} ^ { 2} ,Z _ { 2} ^ { 2} \cdots Z _ { t } ^ { 2}$ and average$\tilde { Z } ^ { 2} = \frac { 1} { t } \sum _ { i = 1} ^ { t } Z _ { i } ^ { 2}$ - same mean by linearity
- smaller variance :
$\frac { \operatorname{Var} \left( Z ^ { 2} \right) } { t } \leq \frac { 2} { t } | \mathbf { f } | | _ { 2} ^ { 4}$ , by chebyshev$P r \left[ | \tilde { Z } ^ { 2} - | \mathbf { f } | _ { 2} ^ { 2} | > \epsilon | | \mathbf { f } | | _ { 2} ^ { 2} \right]\leq \frac { \left( \frac { 2} { t } \right) \cdot | | \text{f} | | _ { 2} ^ { 4} } { \epsilon ^ { 2} | f | _ { 2} ^ { 4} }\leq \frac { 2} { t \epsilon ^ { 2} }\leq \frac{1}{3}$
- algo
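A sketch of the averaged AMS estimator; fully random signs stand in for a true 4-wise independent family (an assumption for brevity).

```python
import numpy as np

def ams_f2(stream, n, t=100, rng=np.random.default_rng(3)):
    """Average of t AMS sketches for F2 = ||f||_2^2."""
    signs = rng.choice([-1, 1], size=(t, n))  # sigma_c(i) for each copy c
    Z = np.zeros(t)
    for i in stream:
        Z += signs[:, i]          # Z_c = sum_i f_i * sigma_c(i)
    return (Z ** 2).mean()

stream = [0] * 3 + [1] * 4 + [2] * 5   # f = (3, 4, 5), F2 = 50
print(ams_f2(stream, n=3))             # concentrates near 50
```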
- nearest neighbor search problem : NNS, set of $n$ points $P \subset \mathbb{R}^d$, given a query $q \in \mathbb{R}^d$ find $\min_{p \in P} \operatorname{dist}(p,q)$
  - distance function : e.g. $\ell_2$
  - sublinear objective : $O(\log n)$ query time, $O(n)$ memory
  - $k$-d trees : partition space using coordinate-aligned planes, fail to beat naive looping when $d=\Omega(\log n)$
  - curse of dimensionality : a cube of side length $a$ intersects $2^d$ other cells
  - reduction to approximation : return $p$ with $\operatorname{dist}(p,q) \leq c \cdot \min_{s \in P} \operatorname{dist}(s,q)$ for some $c>1$
    - $ANNS(c,r)$ : build a data structure s.t. for any $q$, if $\exists p$ with $\operatorname{dist}(p,q) \leq r$ it returns $p'$ with $\operatorname{dist}(p',q) \leq c \cdot r$
    - solve $ANNS(c(1-\epsilon),r)$ for $r\in\{\delta, (1+\epsilon)\delta, (1+\epsilon)^2\delta,\dots,1\}$ with $\delta$ the minimum possible distance at bit precision $1/\delta$, report the minimal value, $O(\log\frac{1}{\delta})$ time and space overhead
- locality sensitive hashing (LSH) : hash functions sensitive to distance, high probability to map to the same bucket if $\operatorname{dist}(p,q) \leq r$ and to different buckets if $\operatorname{dist}(p,q) > c \cdot r$ (sketch below)
  - $\mathcal{H}$ called $(r, c \cdot r, p_1, p_2)$-LSH : if for all $p$, $q$, $\operatorname{dist}(p,q) \leq r \Rightarrow \mathbb{P}[h(p) = h(q)] \geq p_1$ and $\operatorname{dist}(p,q) \geq c \cdot r \Rightarrow \mathbb{P}[h(p) = h(q)] \leq p_2$, ideally $p_1 \gg p_2$
  - boost : bring $p_1$ close to $1$ and $p_2$ to $\frac{1}{n}$
  - example : $\ell_1$ over bit strings and $h_i(p) = p_i$ the $i$th bit of $p$, $\mathbb{P}[h(p) = h(q)] = \frac{\#\text{ bits in common}}{\text{total bits}} = \frac{d - |p-q|_1}{d} = 1 - \frac{|p-q|_1}{d}$, roughly an $(r, c \cdot r, e^{-r/d}, e^{-c \cdot r/d})$-LSH
  - make $p_2$ small : concatenate $k$ independent hash functions $h(p) = [h_1(p),\ldots,h_k(p)]$, so $\operatorname{dist}(p,q) \geq c \cdot r \Rightarrow \mathbb{P}[h(p) = h(q)] \leq p_2^k$, then take $\ell$ independent copies $f_i(p) = [h_{i,1}(p),\dots,h_{i,k}(p)]$ s.t. $\mathbb{P}[\exists i : f_i(p) = f_i(q)] \geq 1 - (1 - p_1^k)^\ell$, choose $k$ s.t. $p_2^k=1/n$ and $\ell \propto n^\rho\ln n$
    - time overhead : $O(\ell)$ far points collide in expectation, as $\mathbb{E}[\#p : \operatorname{dist}(p,q) > c \cdot r, f_i(p) = f_i(q)] \leq n \cdot p_2^k \leq 1$ per copy
  - make $p_1$ close to $1$ : if $\operatorname{dist}(p,q) \leq r$, $\mathbb{P}[\exists i : f_i(p) = f_i(q)] \geq 1 - (1 - p_1^k)^\ell = 1 - (1 - p_2^{\rho k})^\ell = 1 - (1 - n^{-\rho})^\ell \approx 1 - e^{-\ell n^{-\rho}} = 1 - 1/n$
    - notation : $p_1 = p_2^\rho$ with $\rho <1$ determines running time/memory
  - complexity : memory $O(n^{1+\rho})$ and query time $O(n^\rho)$
    - space : maintain $O(\ell)$ hash tables, each storing for the $n = |P|$ points the hash value of a $k$-dimensional vector, $O(\ell \cdot n \cdot k) = O\left(n^{1+\rho}\ln n\frac{\log n}{\log(1/p_2)}\right)$
    - time : compute the $f_i(q)$, then $dist(p,q)$ for candidates, $O(d(\ell + |O|) + \ell \cdot k) = O\left(n^\rho\ln(n)\left(d + \frac{\log n}{\log(1/p_2)}\right) + d\right)$ with $|O|$ the number of examined points at distance $\le c\cdot r$ from $q$
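A tiny sketch of the bit-sampling LSH for Hamming distance described above ($k$ sampled coordinates per table, $\ell$ tables); parameters are illustrative.

```python
import random

def bit_lsh(k, l, d, rng=random.Random(5)):
    """Bit-sampling LSH: l tables, each hashing k random coordinates."""
    tables = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
    def signatures(p):  # p is a bit string of length d
        return [tuple(p[i] for i in coords) for coords in tables]
    return signatures

sig = bit_lsh(k=3, l=4, d=8)
p, q = "10110100", "10110101"   # Hamming distance 1: likely collide in some table
print(any(a == b for a, b in zip(sig(p), sig(q))))
```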
- set function : $f : 2^N \rightarrow \mathbb{R}$ assigns a real value to every subset $S \subseteq N$
  - marginal contribution of $u$ to $S$ : $f(u|S) = f(S \cup \{u\}) - f(S)$
  - submodular : $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq N$, or equivalently $f(u|A) \geq f(u|B)$ for all $A \subseteq B \subseteq N$ and each $u \in N \setminus B$, diminishing returns, non-increasing discrete derivative, the analog of concavity
    - $\Rightarrow$ : $f(A \cup \{u\}) + f(B) \geq f(A \cup \{u\} \cup B) + f((A \cup \{u\}) \cap B) = f(\{u\} \cup B) + f(A)$ for $A\subseteq B$ and $u\notin B$
    - $\Leftarrow$ : $f(D) - f(C \cap D) = \sum_{i=1}^{h} f((C \cap D) \cup D_i) - f((C \cap D) \cup D_{i-1}) \geq \sum_{i=1}^{h} f(d_i|(C \cup D_{i-1})) = f(C \cup D) - f(C)$ with $C, D \subseteq N$, $D \setminus C = \{d_1, d_2,\dots,d_h\}$ and $D_i = \{d_j : 1\leq j \leq i\}$
  - examples
    - size of a graph cut : $\delta(S) = |\{(u,v) : u \in S, v \in V\setminus S\}|$ with $S\subseteq V$, $\delta(v|S) = E(v, V\setminus(S \cup \{v\})) - E(v,S)$
    - union count (coverage) : $c(S) = |\cup_{i \in S} T_i|$ with a collection of sets $T_1, T_2,\dots,T_n$ s.t. $T_i \subseteq \mathbb{N}$, $c(i|S) = |T_i \cup \cup_{j \in S} T_j| - |\cup_{j \in S} T_j| = |T_i \setminus \cup_{j \in S} T_j|$
    - matroid rank : with $C$ a maximal independent subset of $A\cap B$ extended to a maximal independent $D$ in $A\cup B$, $r(A \cup B) + r(A \cap B) = |D| + |C| \leq |D \cap (A \cup B)| + |D \cap (A \cap B)| = |D \cap B| + |D \cap A| \leq r(B) + r(A)$
    - text summaries
    - influence maximization
    - entropy : $H(A) = -\sum_a \mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\log\left(\mathbb{P}\left[\{x_i\}_{i \in A} = a\right]\right)$
- submodular minimization : $\min_{S' \subseteq N} f(S')$
  - ellipsoid method : needs to decide in polytime whether $x$ is optimal or compute a subgradient $\nabla g(x)$, subject to polynomially many inequalities
  - Lovász extension : $\hat{f}(z) = \mathbb E_{\lambda \sim [0,1]}\left[f(\{i : z_i \geq \lambda\})\right]$ with $\lambda$ uniform, for every $z \in [0,1]^n$
    - explicit form : sort $z_0 = 1\geq z_1 \geq z_2 \geq \dots \geq z_n \geq 0 = z_{n+1}$, $S_0 = \emptyset \subseteq S_1 \subseteq S_2 \subseteq \cdots \subseteq S_n = [n]$, if $\lambda \in [z_{i+1}, z_i)$ (probability $z_i - z_{i+1}$) for $i \in [n] \cup \{0\}$ then $\{j : z_j \geq \lambda\} = S_i$, so $\hat{f}(z) = \sum_{i=0}^{n}(z_i - z_{i+1}) f(S_i)$
    - assuming a constant time evaluation oracle
-
$\hat f$ convex iff$f$ submodular : only "if" part by LP duality- assume :
$f ( 0) = 0$ or define$f ^ { \prime } = f - f ( 0 )$ - primal :
$\hat f(z)=g ( z ) = \max _ { x } z ^ { T } x$ s.t.$\sum _ { i \in S } x _ { i } \leq f ( S )$ $\forall S \subseteq N$ and$\sum _ { i \in N } x _ { i } = f ( N )$ with$g(z)$ convex - maximum of linear function over convex set always convex :
$g \left( \lambda z _ { 1} + ( 1- \lambda ) z _ { 2} \right) = \max _ { x } \left( \lambda z _ { 1} ^ { T } x + ( 1- \lambda ) z _ { 2} ^ { T } x \right) \leq \lambda \max _ { x } z _ { 1} ^ { T } x + ( 1- \lambda ) \max _ { x } z _ { 2} ^ { T } x =\lambda g \left( z _ { 1} \right) + ( 1- \lambda ) g \left( z _ { 2} \right)$ - dual :
$\lambda g \left( z _ { 1} \right) + ( 1- \lambda ) g \left( z _ { 2} \right)$ s.t.$\sum _ { S \geq i } y _ { S } = z _ { i }$ $\forall i \in N$ and$y _ { 5} \geq 0$ $\forall S \subset N$ - weak duality : enforce
$z ^ { T } x \leq \sum _ { S \subset N } y _ { S } f ( S )$ - want to guess optimal solutions : verify optimality as any feasible optimal guess satisfies
$z ^ { T } x = \sum _ { S \subset N } y _ { S } f ( S )$ - assume :
$z_0=1\geq z _ { 1} \geq z _ { 2} \geq \ldots \geq z _ { n } \geq 0= z _ { n + 1}$ ,$S_i={ 1,2,\dots ,i }$ - define :
$x _ { i } ^ { * } = f \left( S _ { i } \right) - f \left( S _ { i - 1} \right)$ and $y _ { S } ^ { * } = \left{ \begin{array} { l l } { z _ { i } - z _ { i + 1} } & { \text{ for } S = S _ { i } \text{ with } i \in [ n ] } \ { 0} & { \text{ otherwise } } \end{array} \right.$ - claim
$z ^ { T } x ^ { * } = \hat { f } ( z )= \sum _ { i = 1} ^ { n } z _ { i } \cdot \left( f \left( S _ { i } \right) - f \left( S _ { i - 1} \right) \right)= \sum _ { i = 0} ^ { n } \left( z _ { i } - z _ { i + 1} \right) f \left( S _ { i } \right) = \sum _ { S \subseteq N } y _ { S } ^ { * } f ( S )$ -
$y^\star$ dual feasible : all non-negative entries$\sum _ { S : S \geq i } y _ { S } ^ { * } = \sum _ { j \geq i } y _ { S _ { j } } ^ { * } = \left( z _ { i } - z _ { i + 1} \right) + \left( z _ { i + 1} - z _ { i + 2} \right) + \cdots + \left( z _ { n - 1} - z _ { n } \right) + \left( z _ { n } - z _ { n + 1} \right) = z _ { i } - z _ { n + 1} = z _ { i }$ -
$x^*$ dual feasible : need$\sum _ { i \in S } x _ { i } ^ { * } \leq f ( S )$ by induction rearrange$f ( S ) + f \left( S _ { i - 1} \right) \geq f \left( S \cup S _ { i - 1} \right) + f ( S \cap S _ { i - 1} ) = f \left( S _ { i } \right) + f ( S \setminus { i } )$ into$f ( S ) \geq f \left( S _ { i } \right) - f \left( S _ { i - 1} \right) + f ( S \backslash { i } ) = x _ { i } ^ { * } + f ( S \backslash { i } ) \geq \sum _ { i \in S } x _ { i } ^ { * }$
- assume :
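A minimal Python sketch of the explicit formula, assuming an evaluation oracle `f` on `frozenset`s with $f(\emptyset)=0$; the function name `lovasz_extension` is ours, not the course's.

```python
def lovasz_extension(f, z):
    """Evaluate f^(z) = sum_i (z_i - z_{i+1}) f(S_i), where coordinates are
    sorted decreasingly and S_i contains the i largest coordinates."""
    n = len(z)
    order = sorted(range(n), key=lambda i: -z[i])   # indices by decreasing z
    zs = [1.0] + [z[i] for i in order] + [0.0]      # z_0 = 1, z_{n+1} = 0
    total = (zs[0] - zs[1]) * f(frozenset())        # i = 0 term, S_0 = {}
    S = set()
    for i in range(1, n + 1):
        S.add(order[i - 1])                         # grow S_{i-1} into S_i
        total += (zs[i] - zs[i + 1]) * f(frozenset(S))
    return total

# modular f(S) = |S| gives back the linear function sum(z):
# lovasz_extension(lambda S: len(S), [0.5, 0.2])  # -> 0.7
```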
- convex closure : by strong duality, $f^-:[0,1]^n\to\mathbb R$ s.t. if $D^-(z)$ for $z\in[0,1]^n$ is a distribution on subsets with marginal probabilities given by $z$ ($\Pr_{S\sim D^-(z)}[u\in S]=z_u$) then $f^-(z)=\mathbb E_{S\sim D^-(z)}[f(S)]$, minimized over all such distributions
  - explicit : $f^-(z)=\min\left\{\sum_{S\subseteq N}\alpha_Sf(S) : \alpha\geq 0,\ \sum_S\alpha_S=1,\ \sum_S\alpha_S\mathbf 1_S=z\right\}$
- marginal contribution : $f(u\mid S)=f(S\cup\{u\})-f(S)$
- cardinality constrained monotone maximization
  - assume : all functions $f:2^N\to\mathbb R$ non-negative, monotone ($f(S)\leq f(T)$ $\forall S\subseteq T\subseteq N$) and normalized ($f(\emptyset)=0$)
  - constrained : find set $S\subseteq N$ of size at most $k$ maximizing $f(S)$
  - generalize greedy to submodular : repeatedly choose $u\not\in S$ with maximal marginal gain $f(u\mid S)$, suboptimal in general (sketch after this list)
    - if $f(S)=\sum_{e\in S}w_e$ : standard greedy, optimal
    - constant factor approximation : $f(S)\geq(1-1/e)f(O)\approx 0.632\,f(O)$ for an optimal solution $O$
    - claim : $\sum_{e\in T\setminus S}f(e\mid S)\geq f(T\cup S)-f(S)$ for $S,T\subseteq N$, as the elements of $T\setminus S$ can be ordered $e_1,e_2,\dots,e_{|T\setminus S|}$ and, letting $T_i$ be the set containing the first $i$ elements of this ordering ($T_0=\emptyset$, $T_{|T\setminus S|}=T\setminus S$), submodularity gives $\sum_{e\in T\setminus S}f(e\mid S)=\sum_{i=1}^{|T\setminus S|}f(e_i\mid S)\geq\sum_{i=1}^{|T\setminus S|}f(e_i\mid S\cup T_{i-1})=\sum_{i=1}^{|T\setminus S|}\left[f(T_i\cup S)-f(T_{i-1}\cup S)\right]=f(T\cup S)-f(S)$ (telescoping)
    - proof : since $f$ monotone, assume $|O|=k$ as we can add elements without decreasing the value, let $S_i=\{u_j : j\leq i\}$ contain the first $i$ elements selected ($S_0=\emptyset$, $S_k=S$), since greedy picks the element of maximal gain $k\cdot f(u_i\mid S_{i-1})\geq|O\setminus S_{i-1}|\cdot f(u_i\mid S_{i-1})\geq\sum_{e\in O\setminus S_{i-1}}f(e\mid S_{i-1})\geq f(S_{i-1}\cup O)-f(S_{i-1})\geq f(O)-f(S_{i-1})$, which gives $f(S_i)\geq\left(1-\frac1k\right)f(S_{i-1})+\frac1k f(O)$, unrolling by induction $f(S_k)\geq\left(1-\left(1-\frac1k\right)^k\right)f(O)\geq(1-1/e)f(O)$ using $\left(1-\frac1k\right)^k\leq e^{-1}$ since $1+x\leq e^x$
  - can be generalized to any matroid constraint
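A minimal sketch of the greedy rule just described, assuming `f` is an oracle for a monotone submodular function on `frozenset`s; `greedy_max` and the toy coverage instance are our own illustration.

```python
def greedy_max(f, N, k):
    """Greedy for max f(S) s.t. |S| <= k with f monotone submodular;
    guarantees f(S) >= (1 - 1/e) * f(O)."""
    S = set()
    for _ in range(k):
        base = f(frozenset(S))
        # element with the largest marginal gain f(u | S)
        u, gain = max(((u, f(frozenset(S | {u})) - base) for u in N - S),
                      key=lambda t: t[1], default=(None, 0.0))
        if u is None:
            break
        S.add(u)
    return S

# toy coverage instance: f(S) = |union of the sets indexed by S|
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
# greedy_max(f, {0, 1, 2}, 2) -> a pair of value 3 (optimal here)
```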
- unconstrained submodular maximization : greedy from two sides, start from $X_0=\emptyset$ and $Y_0=N$, scan elements $u_1,\dots,u_n$ comparing $a_i=f(X_{i-1}\cup\{u_i\})-f(X_{i-1})$ with $b_i=f(Y_{i-1}\setminus\{u_i\})-f(Y_{i-1})$ : if $a_i\geq b_i$ set $X_i=X_{i-1}\cup\{u_i\}$, $Y_i=Y_{i-1}$, else $X_i=X_{i-1}$, $Y_i=Y_{i-1}\setminus\{u_i\}$ (sketch after this list)
  - $1/3$-approximation
    - claim : $a_i+b_i\geq 0$ for every $1\leq i\leq n$, as $(X_{i-1}\cup\{u_i\})\cap(Y_{i-1}\setminus\{u_i\})=X_{i-1}$ and $(X_{i-1}\cup\{u_i\})\cup(Y_{i-1}\setminus\{u_i\})=Y_{i-1}$ for $X_{i-1}\subseteq Y_{i-1}$ and $u_i\in Y_{i-1}$, thus by submodularity $a_i+b_i=f(X_{i-1}\cup\{u_i\})+f(Y_{i-1}\setminus\{u_i\})-\left[f(X_{i-1})+f(Y_{i-1})\right]\geq 0$
    - claim : $[f(X_i)+f(Y_i)]-[f(X_{i-1})+f(Y_{i-1})]\geq f(OPT_{i-1})-f(OPT_i)$ for $OPT_i=(OPT\cup X_i)\cap Y_i$ ; in case $a_i\geq b_i$ we have $X_i=X_{i-1}\cup\{u_i\}$, $Y_i=Y_{i-1}$, so the left side equals $f(u_i\mid X_{i-1})=a_i\geq 0$ (as $a_i+b_i\geq 0$), while $f(OPT_i)-f(OPT_{i-1})=f(u_i\mid OPT_{i-1})\geq f(u_i\mid Y_{i-1}\setminus\{u_i\})=f(Y_{i-1})-f(Y_{i-1}\setminus\{u_i\})=-b_i\geq -a_i$ (the case $a_i<b_i$ is symmetric)
    - proof : summing $\sum_{i=1}^{n}\left[f(OPT_{i-1})-f(OPT_i)\right]\leq\sum_{i=1}^{n}\left[f(X_i)+f(Y_i)-f(X_{i-1})-f(Y_{i-1})\right]$ telescopes to $f(OPT_0)-f(OPT_n)\leq f(X_n)+f(Y_n)-f(X_0)-f(Y_0)\leq f(X_n)+f(Y_n)$, and since $OPT_0=OPT$ and $OPT_n=X_n=Y_n$ this results in $f(OPT)\leq 3\cdot f(X_n)$
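A compact Python sketch of the two-sided greedy above; `f` is assumed to be a non-negative submodular oracle on `frozenset`s and `double_greedy` is our naming.

```python
def double_greedy(f, elements):
    """Deterministic double greedy: scan elements once, growing X from the
    empty set and shrinking Y from the full set; returns X = Y with
    f(X) >= f(OPT) / 3 for non-negative submodular f."""
    X, Y = set(), set(elements)
    for u in elements:
        a = f(frozenset(X | {u})) - f(frozenset(X))  # gain of adding u to X
        b = f(frozenset(Y - {u})) - f(frozenset(Y))  # gain of dropping u from Y
        if a >= b:
            X.add(u)
        else:
            Y.discard(u)
    return X  # == Y after the scan
```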
- difference from streaming algos : an online algo must commit to each decision before seeing the rest of the input
- competitive ratio : ratio of the algorithm's cost to the best cost achievable with knowledge of everything in advance, $\max_I\frac{ALG(I)}{OPT(I)}$, for maximization $\min_I\frac{ALG(I)}{OPT(I)}$
  - alternative definition : $ALG(I)\leq r\cdot OPT(I)+c$ with $c$ a constant and $r$ the competitive ratio, when $c=0$, $r$ is called the strict competitive ratio
- rent or buy skis : renting costs $R$ per day, buying costs $B$, rent until the total rent paid reaches $B$, then buy, gives competitive ratio $2$ (sketch below)
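A tiny sketch of the break-even rule; `season_days` (unknown to the online algorithm in advance) and the helper name are our own illustration.

```python
import math

def ski_rental_cost(season_days, R, B):
    """Break-even strategy: rent for the first ceil(B/R) - 1 days, then buy
    on the next ski day; the offline optimum is min(season_days * R, B)
    and this strategy pays at most twice that."""
    rent_days = math.ceil(B / R) - 1
    if season_days <= rent_days:
        return season_days * R        # season ended before we bought
    return rent_days * R + B          # rented, then bought

# ski_rental_cost(100, R=1, B=10) -> 9 + 10 = 19 <= 2 * OPT = 20
```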
- monotone submodular function maximization with cardinality constraint : streaming with $poly(k)$ memory, independent of the stream length
  - examples : document summarization (pick $k$ out of many documents), influence maximization (give away $k$ products)
  - assuming : $\text{OPT}=\max_{S\subseteq N : |S|\leq k}f(S)$ is known, add a streamed element $e$ to $S$ whenever $|S|<k$ and $f(e\mid S)\geq\frac{\text{OPT}}{2k}$, then $\text{OPT}/2\leq f(S)$ (sketch after this list)
  - proof : ordered elements $e_1,e_2,\ldots,e_n$, intermediate sets $S_i$ and $S=S_n$
    - case $|S|=k$ : every selected item had marginal contribution at least $\frac{\text{OPT}}{2k}$, so $f(S)=f(\{e_{i_1},e_{i_2},\dots,e_{i_k}\})\geq k\cdot\frac{\text{OPT}}{2k}=\frac{\text{OPT}}{2}$
    - case $|S|<k$ : every item not selected has $f(e\mid S)<\text{OPT}/(2k)$ (by submodularity its gain w.r.t. the final $S$ is no larger than when it streamed past), order the missing optimal items $\{o_1,o_2,\ldots,o_{k'}\}=O\setminus S$, by monotonicity and submodularity $\text{OPT}\leq f(S\cup O)=f(S)+f(O\mid S)=f(S)+\sum_{i=1}^{k'}f(o_i\mid S\cup\{o_1,\ldots,o_{i-1}\})\leq f(S)+\sum_{i=1}^{k'}f(o_i\mid S)\leq f(S)+k'\cdot\frac{\text{OPT}}{2k}\leq f(S)+\frac{\text{OPT}}{2}$
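A single-pass sketch of this thresholding rule, assuming a value `opt_guess` for OPT is supplied (in practice one runs several guesses in parallel); `stream_threshold` is our naming.

```python
def stream_threshold(f, stream, k, opt_guess):
    """One pass over the stream: keep an element iff there is room and its
    marginal gain is at least opt_guess / (2k); if opt_guess = OPT this
    returns S with f(S) >= OPT / 2."""
    S = set()
    for e in stream:
        if len(S) < k:
            gain = f(frozenset(S | {e})) - f(frozenset(S))
            if gain >= opt_guess / (2 * k):
                S.add(e)
    return S
```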
- caching
  - assume : a cache miss brings the requested memory page into the cache, $k$ cache size
  - deterministic :
    - least recently used (LRU) : competitive ratio $k$ (sketch after this list), divide the request sequence into phases where phase $i$ begins the first time a $k$-th distinct page is seen after phase $i-1$ began, $OPT$ makes at least one cache miss per phase, LRU at most $k$ per phase
    - first in first out (FIFO) : competitive ratio $k$
    - least frequently used (LFU) : unbounded competitive ratio
    - last in first out (LIFO) : unbounded competitive ratio
    - no deterministic algo can do better than $k$ : the adversary looks at the cache content and requests a page not in it, so the algo misses every request, while $OPT$ misses on such a page no sooner than once every $k$ requests
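A minimal miss-counting LRU sketch (our own illustration), using an `OrderedDict` to keep recency order:

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Count the cache misses of LRU with capacity k on a request sequence."""
    cache = OrderedDict()            # keys ordered by recency, newest last
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: refresh recency
        else:
            misses += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

# adversarial cycle over k + 1 pages: LRU misses every single request
# lru_misses([i % 4 for i in range(100)], k=3)  # -> 100
```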
- randomized : marking algo, each page holds a 1-bit mark set when the page was recently used, a miss evicts a uniformly random unmarked page, unmarking everything when all cached pages are marked (sketch below)
  - competitive ratio : $2H_k=2\cdot\left(\frac11+\frac12+\ldots+\frac1k\right)=O(\log k)$
  - no randomized algo has a competitive ratio better than $H_k$
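A sketch of the marking algorithm as described above; the phase reset and random eviction follow the standard formulation, and the helper name is ours.

```python
import random

def marking_misses(requests, k):
    """Marking algorithm: mark pages on use; on a miss with a full cache,
    evict a uniformly random unmarked page, starting a new phase
    (unmarking everything) if no unmarked page remains."""
    cache, marked = set(), set()
    misses = 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                if cache <= marked:       # every cached page marked
                    marked.clear()        # new phase
                victim = random.choice(tuple(cache - marked))
                cache.remove(victim)
            cache.add(page)
        marked.add(page)                  # mark the requested page
    return misses
```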
- secretary problem : random arrival order, $n$ unique candidates, hire or not at arrival (1 position only)
  - good strategy : explore the first $n/2$ candidates, then take the first one better than the whole first half, $\Pr[\text{hiring best candidate}]\geq\Pr[\text{2nd best among first }n/2]\cdot\Pr[\text{best candidate in 2nd half}]\geq(1/2)(1/2)=1/4$
  - bad strategy : hire the first candidate, $\Pr[\text{hiring best candidate}]=1/n$
  - optimal strategy : start selecting after $r-1$ candidates (simulation sketch below), $\Pr[\text{hiring best candidate}]=\sum_{i=1}^{r-1}0+\frac1n\sum_{i=r}^{n}\Pr[\text{best of the first }i-1\text{ applicants is in first }r-1\mid i\text{-th is best}]=\frac1n\sum_{i=r}^{n}\frac{r-1}{i-1}$, optimizing $\frac{r-1}{n}\cdot\sum_{i=r}^{n}\frac1{i-1}\approx\frac rn\int_r^n\frac1x\,dx=\frac rn\cdot\ln(n/r)$ gives $r=n/e$ with success probability $1/e$
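A Monte Carlo check of the threshold strategy (our own illustration; parameter names are hypothetical):

```python
import random

def secretary_success_rate(n, r, trials=100_000):
    """Estimate Pr[hiring the best]: skip the first r - 1 candidates, then
    hire the first one better than everything seen so far."""
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))                # n - 1 is the best candidate
        random.shuffle(ranks)
        bar = max(ranks[:r - 1], default=-1)  # best of the observation phase
        hired = next((x for x in ranks[r - 1:] if x > bar), None)
        wins += (hired == n - 1)
    return wins / trials

# r ~ n/e maximizes the estimate at about 1/e:
# secretary_success_rate(100, 37)  # -> ~0.37
```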
- spectral graph theory : studies how eigenvalues and eigenvectors of the adjacency matrix relate to combinatorial properties
  - adjacency matrix : for $G=(V,E)$ with $|V|=n$, the matrix $A\in\mathbb{R}^{n\times n}$ with $A_{ij}=1$ iff $\{i,j\}\in E$
  - $d$-regular : degree of each vertex is $d$
  - eigenvector $v$ with eigenvalue $\lambda$ : $Mv=\lambda v$
  - normalized adjacency of $d$-regular : $M=\frac1d A$, called the random walk matrix
    - symmetric : $n$ real eigenvalues $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_n$ (with multiplicity) and an orthonormal basis of eigenvectors
    - signal $x$ on the graph : $y=Mx$ gives $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}$, the average value of $x$ over $i$'s neighbours
    - $\lambda_1\leq 1$ : taking $i$ with $x(i)$ maximal, $y(i)=\sum_{\{i,j\}\in E}\frac{x(j)}{d}\leq\sum_{\{i,j\}\in E}\frac{x(i)}{d}=x(i)$
    - $\lambda_2=1\Leftrightarrow G$ is disconnected
      - suppose disconnected : for $S$ a connected component, define $v_2(i)=\begin{cases}1/|S| & \text{if } i\in S\\ -1/|V\setminus S| & \text{if } i\in V\setminus S\end{cases}$, perpendicular to the all-$1$'s vector $v_1$, all neighbours of $i$ carry the same value so $y(i)=\frac1d\sum_{\{j,i\}\in E}v_2(j)=\frac1d\sum_{\{j,i\}\in E}v_2(i)=v_2(i)$, hence $Mv_2=v_2$
      - suppose connected : $v_2\perp v_1$ forces $v_2(i)\neq v_2(j)$ for some $i,j$, so taking $i$ maximizing $v_2(i)$ there is a neighbour $j^*$ with $v_2(i)>v_2(j^*)$, and $y(i)=\frac1d\sum_{\{j,i\}\in E}v_2(j)\leq\frac1d\left(\sum_{\{j,i\}\in E : j\neq j^*}v_2(i)+v_2(j^*)\right)<v_2(i)$, so $\lambda_2<1$
    - $\lambda_n=-1\Leftrightarrow$ one component of $G$ is bipartite
    - number of connected components : $|\{i : \lambda_i=1\}|$ (numpy check below)
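A small numpy check of these facts (our illustration): the multiplicity of the eigenvalue $1$ of $M=A/d$ counts connected components.

```python
import numpy as np

def walk_spectrum(A, d):
    """Eigenvalues of the random walk matrix M = A/d of a d-regular graph,
    sorted decreasingly, plus the number of connected components
    (= multiplicity of the eigenvalue 1)."""
    M = np.asarray(A, dtype=float) / d
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]  # symmetric: real spectrum
    return eig, int(np.sum(np.isclose(eig, 1.0)))

# two disjoint triangles (2-regular): eigenvalue 1 appears twice
T = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
A = np.block([[T, np.zeros((3, 3))], [np.zeros((3, 3)), T]])
# walk_spectrum(A, 2) -> ([1, 1, -0.5, -0.5, -0.5, -0.5], 2)
```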
- mixing time : iterate $M^kp$ from an initial distribution $p$, if $G$ is disconnected the walk never reaches the uniform distribution, and a bipartite component causes the same issue (oscillation)
  - convergence : $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2\leq\frac1{n^c}$ if $\max(|\lambda_2|,|\lambda_n|)\leq 1-\epsilon$ after $k=\frac c\epsilon\log n$, i.e. $O(\frac1\epsilon\log n)$ steps, so the walk is at any vertex with probability $\approx\frac1n$ (simulation below)
  - proof : in the eigenbasis $p=\sum_{i=1}^{n}\alpha_iv_i$ with $\alpha_i=\langle p,v_i\rangle$ and $\alpha_1=\langle p,v_1\rangle=\frac1{\sqrt n}$, we have $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2=\|\sum_{i=2}^{n}\alpha_i\lambda_i^kv_i\|_2\leq(1-\epsilon)^k\|\sum_{i=2}^{n}\alpha_iv_i\|_2$ since $|\lambda_i|\leq 1-\epsilon$ for $i\geq 2$, taking $k=\frac c\epsilon\log n$ and $A=\|\sum_{i=2}^{n}\alpha_iv_i\|_2\leq 1$ gives $\|M^kp-(\frac1n,\ldots,\frac1n)\|_2\leq A(1-\epsilon)^{\frac{c\log n}{\epsilon}}\leq A\left(\frac1e\right)^{c\log n}\leq\frac1{n^c}$
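A quick simulation of the walk's convergence (our illustration); for a connected non-bipartite regular graph the loop exits after $O(\frac1\epsilon\log n)$ steps.

```python
import numpy as np

def mixing_steps(A, d, tol=1e-6, max_steps=100_000):
    """Iterate p <- M p from a point mass until within tol of uniform (L2);
    returns the number of steps, or None if the walk does not mix
    (disconnected graph, or oscillation on a bipartite one)."""
    n = A.shape[0]
    M = np.asarray(A, dtype=float) / d
    p = np.zeros(n); p[0] = 1.0          # start at vertex 0
    uniform = np.full(n, 1.0 / n)
    for k in range(1, max_steps + 1):
        p = M @ p
        if np.linalg.norm(p - uniform) <= tol:
            return k
    return None
```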
- connectivity : a more balanced cut might be more useful than the optimal but silly one (e.g. one cutting off a single vertex)
- conductance
  - cut : for $S\subseteq V$, $h(S)=\frac{|\delta(S)|}{d\cdot\min\{|S|,|V\setminus S|\}}$ with $\delta(S)$ the set of edges crossing the cut, roughly measures the fraction of edges leaving $S$
  - graph : $h(G)=\min_{S\subset V}h(S)$
- Cheeger's inequalities : $\frac{1-\lambda_2}{2}\leq h(G)\leq\sqrt{2(1-\lambda_2)}$
  - algo spectral-partitioning : very fast, $O(|V|\log|V|+|E|)$, and $v_2$ can be approximated (sweep sketch after this list)
  - Rayleigh quotient : $\frac{x^TMx}{x^Tx}$
    - first eigenvalue : $\lambda_1=\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$
      - $\lambda_1\leq\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$ : $\frac{v_1^TMv_1}{v_1^Tv_1}=\frac{v_1^T\lambda_1v_1}{v_1^Tv_1}=\lambda_1\frac{v_1^Tv_1}{v_1^Tv_1}=\lambda_1$
      - $\lambda_1\geq\max_{x\in\mathbb R^n}\frac{x^TMx}{x^Tx}$ : writing $y=\sum_{i=1}^{n}\alpha_iv_i$, $\frac{y^TMy}{y^Ty}=\frac{\sum_{i=1}^{n}\alpha_i^2\lambda_i}{\sum_{i=1}^{n}\alpha_i^2}\leq\lambda_1\frac{\sum_{i=1}^{n}\alpha_i^2}{\sum_{i=1}^{n}\alpha_i^2}=\lambda_1$
    - second eigenvalue : $\lambda_2=\max_{x\in\mathbb R^n : x\perp v_1}\frac{x^TMx}{x^Tx}$
  - proof of $\frac{1-\lambda_2}{2}\leq h(G)$ : $1-\lambda_2=\min_{x\in\mathbb R^n : x\perp v_1}\frac{x^T(I-M)x}{x^Tx}=\min_{x\in\mathbb R^n : x\perp v_1}\frac{\sum_{\{i,j\}\in E}(x(i)-x(j))^2}{d\cdot x^Tx}$ (the Laplacian quadratic form), which looks like a continuous relaxation of the cut problem, assume $|S|\leq|V|/2$ and define $y(i)=\begin{cases}1-|S|/|V| & \text{if } i\in S\\ -|S|/|V| & \text{if } i\notin S\end{cases}$ (note $y\perp v_1$), we have $\sum_{\{i,j\}\in E}(y(i)-y(j))^2=\sum_{\{i,j\}\in\delta(S)}(1-|S|/|V|+|S|/|V|)^2=|\delta(S)|$ and $y^Ty=\sum_{i\in V}y(i)^2=|S|(1-|S|/|V|)^2+(|V|-|S|)(|S|/|V|)^2=\frac{|S|(|V|-|S|)}{|V|}\geq|S|/2$, so $1-\lambda_2\leq\frac{\sum_{\{i,j\}\in E}(y(i)-y(j))^2}{d\cdot y^Ty}$ and $\frac{1-\lambda_2}{2}\leq\frac12\frac{\sum_{\{i,j\}\in E}(y(i)-y(j))^2}{d\cdot y^Ty}\leq\frac12\frac{|\delta(S)|}{d\cdot|S|/2}=\frac{|\delta(S)|}{d\cdot|S|}=h(S)$
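A naive sweep-cut version of spectral partitioning (our illustration, quadratic rather than the near-linear time quoted above): sort vertices by $v_2$ and keep the best prefix cut.

```python
import numpy as np

def sweep_cut(A, d):
    """Sort vertices by the second eigenvector of M = A/d and return the
    prefix cut S minimizing h(S); Cheeger's inequality guarantees the best
    sweep cut satisfies h(S) <= sqrt(2 (1 - lambda_2))."""
    n = A.shape[0]
    M = np.asarray(A, dtype=float) / d
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    v2 = vecs[:, -2]                      # eigenvector for lambda_2
    order = np.argsort(-v2)               # sweep by decreasing v2
    best_h, best_S, S = np.inf, None, set()
    for t in range(n - 1):                # proper, nonempty cuts only
        S.add(int(order[t]))
        cross = sum(A[i, j] for i in S for j in range(n) if j not in S)
        h = cross / (d * min(len(S), n - len(S)))
        if h < best_h:
            best_h, best_S = h, set(S)
    return best_S, best_h
```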