Add explanation of convergence
calvinytong committed Sep 24, 2018
1 parent 39b7951 commit b520a10
Showing 2 changed files with 51 additions and 0 deletions.
51 changes: 51 additions & 0 deletions 2018-09-17-metalearning.ipynb
@@ -52,6 +52,57 @@
" * At a high level: add some randomess to your actions, if your result was better than expected do more in the future repeat\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why This Works?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Convergence](images/convergence.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a somewhat informal argument on why this method works. The argument is that the algorithms converge to a vector of parameters that is close in Euclidean distance to each task's manifold of optimal solutions. As such we define the problem as \n",
"\n",
"$$\n",
"\\begin{equation*}\n",
"\\begin{aligned}\n",
"\\underset{\\phi}{\\text{minimize}} && \\mathbb{E_\\tau} [\\frac{1}{2} D (\\phi, W_{\\tau})]\n",
"\\end{aligned}\n",
"\\end{equation*}\n",
"$$\n",
"\n",
"Where $W_{\\tau}$ is the set of optimal paramaters for a task $\\tau$ and D is the euclidean distance function. We introduce the $\\frac{1}{2}$ to make the math easier later. \n",
"\n",
"In order to deal with this, we have to review some math. Let's define a non-pathological set $S \\subset R^d$ and $\\phi \\in \\mathbb{R}^d$. Given that we are working with an appropriatly well behaved subset of $\\mathbb{R}^d$, the gradient of the squared distance function $D(\\phi, S)^2$ can be well approximated by $2(\\phi - proj_S(\\phi))$ (TODO: Flush this fact out more intuitivly, something having to do with distance being the min over all points in the set of absolute distances). Recall that this projection is just the closest value (in the euclidean sense) to the vector in the set. Once we have this approximation, it becomes clear that we can rewrite the gradient of our objective function as\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\nabla_\\phi \\mathbb{E_\\tau} [\\frac{1}{2} D (\\phi, W_{\\tau})] &= \\mathbb{E_\\tau} [\\frac{1}{2} \\nabla_\\phi D (\\phi, W_{\\tau})] \\\\\n",
"&= \\mathbb{E_\\tau} [\\phi - proj_{W_{\\tau}}(\\phi)]\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"So we can rewrite our gradient update as\n",
"$$\n",
"\\begin{aligned}\n",
"\\phi &\\leftarrow \\phi - \\alpha \\nabla_\\phi \\frac{1}{2} D (\\phi, W_{\\tau})^2 \\\\\n",
"&\\leftarrow \\phi - \\alpha (\\phi - proj_{W_{\\tau}}(\\phi)) \\\\\n",
"&\\leftarrow (1-\\alpha) \\phi - \\alpha proj_{W_{\\tau}}(\\phi)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Even though we can't compute $proj_{W_{\\tau}}(\\phi)$ because it requires us to find the set of minimizers for the given task, we can approximate it with gradient decent. So, for each interation of reptile we see that we sample a task and replace $W_{\\tau}$ with the result of running k steps of gradient decent."
]
},
{
"cell_type": "markdown",
"metadata": {},
Binary file added images/convergence.png
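
The projection argument in the added notebook cell can be checked numerically on a toy problem. The sketch below is not part of the commit: the two "tasks", whose optimal sets are taken to be the x-axis and the y-axis in $\mathbb{R}^2$, and all helper names (`project_onto_line`, `half_sq_dist`, `numerical_grad`) are assumptions made for illustration. It uses the exact projection, which Reptile itself cannot compute and approximates with $k$ steps of gradient descent. It first compares $\phi - proj_W(\phi)$ with a finite-difference gradient of $\frac{1}{2} D(\phi, W)^2$, then iterates the update $\phi \leftarrow (1-\alpha)\phi + \alpha \, proj_{W_\tau}(\phi)$ over randomly sampled tasks.

```python
import math
import random


def project_onto_line(phi, point, direction):
    """Euclidean projection of phi onto the line {point + t * direction}."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    t = sum((x - p) * u for x, p, u in zip(phi, point, unit))
    return [p + t * u for p, u in zip(point, unit)]


def half_sq_dist(phi, line):
    """0.5 * D(phi, W)^2 for a line W given as (point, direction)."""
    proj = project_onto_line(phi, *line)
    return 0.5 * sum((x - q) ** 2 for x, q in zip(phi, proj))


def numerical_grad(f, phi, eps=1e-6):
    """Central-difference gradient of f at phi."""
    grad = []
    for i in range(len(phi)):
        up, down = list(phi), list(phi)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Hypothetical tasks (an assumption): W_1 is the x-axis, W_2 is the y-axis.
tasks = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0], [0.0, 1.0])]

# Check that the gradient of 0.5 * D^2 matches phi - proj_W(phi).
phi0 = [3.0, -2.0]
analytic = [x - q for x, q in zip(phi0, project_onto_line(phi0, *tasks[0]))]
numeric = numerical_grad(lambda p: half_sq_dist(p, tasks[0]), phi0)

# Stochastic update: phi <- (1 - alpha) * phi + alpha * proj_{W_tau}(phi).
rng = random.Random(0)
alpha = 0.1
phi = list(phi0)
for _ in range(2000):
    proj = project_onto_line(phi, *rng.choice(tasks))
    phi = [(1 - alpha) * x + alpha * q for x, q in zip(phi, proj)]

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("final phi:", phi)
```

In this toy setup the two sets intersect at the origin, so the iterates shrink toward the single point that lies on both task manifolds. With sets that do not intersect, one would instead expect the iterates to hover around a point that balances the squared distances.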
