restore softmax reg
astonzhang committed Sep 26, 2018
1 parent 63cfc93 commit 13b84bb
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions chapter_deep-learning-basics/softmax-regression.md
@@ -123,8 +123,7 @@ $$H\left(\boldsymbol y^{(i)}, \boldsymbol {\hat y}^{(i)}\right ) = -\sum_{j=1}^q y_j^{(i)} \log \hat y_j^{(i)},$$
Assume the number of examples in the training data set is $n$. The cross-entropy loss function is defined as
$$\ell(\boldsymbol{\Theta}) = \frac{1}{n} \sum_{i=1}^n H\left(\boldsymbol y^{(i)}, \boldsymbol {\hat y}^{(i)}\right ),$$

- where $\boldsymbol{\Theta}$ denotes the model parameters. Likewise, if each example has only one label, the cross-entropy loss can be abbreviated as $\ell(\boldsymbol{\Theta}) = -\frac 1n \sum_{i=1}^n \log (\hat{y}^{(i)})^{y^{(i)}}$. From another perspective, we know that minimizing $\ell(\boldsymbol{\Theta})$ is equivalent to maximizing $-e^{\ell(\boldsymbol{\Theta})}=\prod_{i=1}^n (\hat{y}^{(i)})^{y^{(i)}}$, that is, minimizing the cross-entropy loss function is equivalent to maximizing the joint predicted probability over all label categories of the training data set.

+ where $\boldsymbol{\Theta}$ denotes the model parameters. Likewise, if each example has only one label, the cross-entropy loss can be simplified to $\ell(\boldsymbol{\Theta}) = -(1/n) \sum_{i=1}^n \log \hat y_{y^{(i)}}^{(i)}$. From another perspective, minimizing $\ell(\boldsymbol{\Theta})$ is equivalent to maximizing $\exp(-n\ell(\boldsymbol{\Theta}))=\prod_{i=1}^n \hat y_{y^{(i)}}^{(i)}$, that is, minimizing the cross-entropy loss function is equivalent to maximizing the joint predicted probability over all label categories of the training data set.
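A minimal NumPy sketch of the two forms of the loss above and of the equivalence between minimizing $\ell(\boldsymbol{\Theta})$ and maximizing $\exp(-n\ell(\boldsymbol{\Theta}))$; the toy batch (`y_hat`, `y`) is made up for illustration, and this is plain NumPy rather than the book's MXNet code:

```python
import numpy as np

# Hypothetical toy batch: n = 3 examples, q = 4 classes.
# Each row of y_hat is a predicted probability distribution; y holds the
# single true label index of each example.
y_hat = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.8, 0.1, 0.05, 0.05],
                  [0.3, 0.3, 0.2, 0.2]])
y = np.array([1, 0, 3])
n = len(y)

# Full definition: average of H(y^(i), y_hat^(i)) using one-hot label vectors.
y_onehot = np.eye(y_hat.shape[1])[y]
loss_full = -(y_onehot * np.log(y_hat)).sum(axis=1).mean()

# Simplified single-label form: -(1/n) * sum_i log y_hat[i, y_i].
loss_simple = -np.log(y_hat[np.arange(n), y]).mean()
print(np.isclose(loss_full, loss_simple))         # True

# exp(-n * loss) equals the joint predicted probability of the true labels.
print(np.isclose(np.exp(-n * loss_simple),
                 y_hat[np.arange(n), y].prod()))  # True
```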


## Model Prediction and Evaluation
