ann.loss
Related to the module `require("aprilann.ann.loss")`.
This package defines the loss functions included in APRIL-ANN. All loss functions share the same interface. Normally, they are implemented in C++ and bound to Lua.
The interface of loss functions is the following:
- `loss,loss_matrix = loss:compute_loss(input,target)`: this method computes the loss between two tokens, the `input` and the `target`. Normally they are bi-dimensional `matrix` instances with size NxM, where N is the number of patterns in the bunch (mini-batch) and M is the number of outputs of the ANN component. The method returns two values: `loss`, a number with the mean loss over the given bunch of patterns, and `loss_matrix`, a one-dimensional `matrix` of size N containing the loss of every pattern. In some cases, as for example in F-Measure based loss functions, this `loss_matrix` has size 1, because the loss function is computed over the whole bunch of patterns and is not separable. A usage sketch follows this list.
- `gradient = loss:gradient(input,target)`: this method computes the gradient of the loss with respect to the `input`, given the `target`. It returns a bi-dimensional `matrix` with size NxM. Each component of this `matrix` is the partial derivative of the loss function with respect to the corresponding ANN output.
- `loss,loss_matrix = loss:accum_loss(loss,loss_matrix)`: this method receives the output of the `compute_loss` method and accumulates the given loss in its internal state. It is useful to compute the loss over a large number of patterns.
- `loss_matrix = loss:accum_loss(loss_matrix)`: this method is a specialization of the previous one, receiving only the `loss_matrix`.
- `mean,variance = loss:get_accum_loss()`: this method returns two numbers, the `mean` and the `variance` of the loss accumulated in the internal state of the loss function object.
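The following sketch shows the whole interface in use, accumulating the loss over several bunches of patterns. It is illustrative only, using the MSE loss described below and assuming the `matrix(N,M)` constructor and its `uniformf` method for building random data:
> loss = ann.loss.mse()
> loss:reset() -- clear any previously accumulated state
> for i=1,10 do
    local input  = matrix(4,2):uniformf(0,1) -- bunch of 4 patterns, 2 outputs
    local target = matrix(4,2):uniformf(0,1)
    -- compute the loss of the bunch and accumulate it into the internal state
    loss:accum_loss( loss:compute_loss(input, target) )
  end
> mean,variance = loss:get_accum_loss()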
It is possible to develop new loss functions by implementing Lua classes derived from the `ann.loss` class, following this example:
> myloss,myloss_methods = class("myloss", ann.loss)
> function myloss:constructor()
    -- YOUR CODE to initialize the self reference
  end
> function myloss_methods:compute_loss(input,target)
    -- YOUR CODE: compute the mean loss and the per-pattern loss matrix
    return loss,loss_matrix
  end
> function myloss_methods:gradient(input,target)
    -- YOUR CODE: compute the gradient of the loss w.r.t. the input
    return gradient_token
  end
> function myloss_methods:accum_loss(loss,loss_matrix)
    local loss_matrix = loss_matrix or loss
    -- YOUR CODE: accumulate loss_matrix into the internal state
    return loss or loss_matrix, loss_matrix
  end
> function myloss_methods:get_accum_loss()
    -- YOUR CODE: return mean and variance of the accumulated loss
    return loss_mean,loss_variance
  end
> function myloss_methods:reset()
    -- YOUR CODE: clear the accumulated internal state
  end
> function myloss_methods:clone()
    -- YOUR CODE: return a deep copy of this object
    return cloned_obj
  end
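Once the methods are implemented, the new class is instantiated and used like any built-in loss function (a minimal sketch, assuming `input` and `target` matrices as described above):
> l = myloss()
> loss,loss_matrix = l:compute_loss(input, target)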
The mean squared error (MSE) loss function is defined by the object `ann.loss.mse`:
> loss = ann.loss.mse()
The constructor can receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
This loss function computes the mean squared error between the given input/target patterns. It computes the following expression:

$$\mathrm{MSE}(h,t) = \frac{1}{2N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( h_i^j - t_i^j \right)^2$$

where N is the number of patterns, M is the number of outputs, h_i^j is the position (i,j) in the input matrix (pattern i, component j), and t_i^j is the same position in the target matrix.
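As a quick check of the expression above (the numbers are illustrative, and the value follows the 1/(2N) normalization given in the formula):
> loss = ann.loss.mse()
> input  = matrix(2,2,{ 0.9, 0.1,
                        0.2, 0.8 })
> target = matrix(2,2,{ 1, 0,
                        0, 1 })
> mse = loss:compute_loss(input, target)
> -- squared errors: 0.01 + 0.01 + 0.04 + 0.04 = 0.10
> -- 0.10 / (2*2) = 0.025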
The mean absolute error (MAE) loss function is defined by the object `ann.loss.mae`:
> loss = ann.loss.mae()
The constructor can receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
This loss function computes the mean absolute error between the given input/target patterns. It computes the following expression:

$$\mathrm{MAE}(h,t) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{j=1}^{M} \left| h_i^j - t_i^j \right|$$

where N is the number of patterns, M is the number of outputs, h_i^j is the position (i,j) in the input matrix (pattern i, component j), and t_i^j is the same position in the target matrix.
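The `gradient` method works the same way for every derivable loss; a short sketch reusing the illustrative matrices from the MSE example above:
> loss = ann.loss.mae()
> g = loss:gradient(input, target) -- NxM matrix of partial derivatives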
The cross-entropy loss function is defined by the object `ann.loss.cross_entropy`:
> loss = ann.loss.cross_entropy()
The constructor can receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
This object is implemented to work only with the `log_logistic` activation function. This loss function computes the cross entropy between the given input/target patterns, interpreting the ANN component output as a binomial distribution. It computes the following expression:

$$\mathrm{CE}(h,t) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ t_i^j \log h_i^j + (1 - t_i^j) \log (1 - h_i^j) \right]$$

where N is the number of patterns, h_i^j is the position (i,j) in the input matrix (pattern i, component j, in natural scale), and t_i^j is the same position in the target matrix.
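Because the loss expects `log_logistic` outputs, the input comes in logarithmic scale while the target stays in natural scale. A minimal illustrative sketch for one pattern with one output:
> loss = ann.loss.cross_entropy()
> input  = matrix(1,1,{ math.log(0.9) }) -- log scale, as produced by log_logistic
> target = matrix(1,1,{ 1 })             -- natural scale
> ce = loss:compute_loss(input, target)
> -- the expression above gives -log(0.9), approximately 0.105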
The multi-class cross-entropy loss function is defined by the object `ann.loss.multi_class_cross_entropy`:
> loss = ann.loss.multi_class_cross_entropy()
The constructor can receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
This object is implemented to work only with the `log_softmax` activation function. This loss function computes the cross entropy between the given input/target patterns, interpreting the ANN component output as a multinomial distribution. It computes the following expression:

$$\mathrm{CE}(h,t) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} t_i^j \log h_i^j$$

where N is the number of patterns, h_i^j is the position (i,j) in the input matrix (pattern i, component j, in natural scale), and t_i^j is the same position in the target matrix.
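A minimal illustrative sketch with a three-class `log_softmax` output and a one-hot target:
> loss = ann.loss.multi_class_cross_entropy()
> input  = matrix(1,3,{ math.log(0.7), math.log(0.2), math.log(0.1) })
> target = matrix(1,3,{ 1, 0, 0 })
> ce = loss:compute_loss(input, target)
> -- the expression above gives -log(0.7), approximately 0.357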
The macro-averaged F-Measure loss function is defined by the object `ann.loss.batch_fmeasure_macro_avg`:
> loss = ann.loss.batch_fmeasure_macro_avg{ beta=0.5 }
The constructor can receive an optional table parameter with the following fields:

- `size=0`: expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
- `beta=1`: the `b` parameter in the F-Measure expression below. By default it is set to 1.
- `complement=false`: a boolean indicating whether the input/target values must be complemented (1 - value), swapping the positive and negative classes.
This object is implemented to work with the `logistic` or `softmax` activation functions. This loss function computes the F-Measure between the given input/target patterns, interpreting the ANN component output as a multinomial distribution. It computes the following expression:

$$\mathrm{FM}(h,t) = \frac{1}{M} \sum_{j=1}^{M} \frac{(1+b^2)\,(h_j \cdot t_j)}{b^2\,\mathrm{sum}(t_j) + \mathrm{sum}(h_j)}$$

where M is the number of outputs, b is the beta parameter of the F-Measure, h_j · t_j is the dot product between the column vectors with the input/target values of class j, and sum(h_j) and sum(t_j) are the sums of all the elements in the corresponding column vectors.
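An illustrative sketch on a two-class bunch; because the F-Measure is computed over the whole bunch and is not separable, the returned loss matrix has size 1:
> loss = ann.loss.batch_fmeasure_macro_avg{ beta=1 }
> input  = matrix(4,2,{ 0.9, 0.1,
                        0.8, 0.2,
                        0.3, 0.7,
                        0.2, 0.8 })
> target = matrix(4,2,{ 1, 0,
                        1, 0,
                        0, 1,
                        0, 1 })
> fm,fm_matrix = loss:compute_loss(input, target)
> print( fm_matrix:dim(1) ) -- 1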
The micro-averaged F-Measure loss function is defined by the object `ann.loss.batch_fmeasure_micro_avg`:
> loss = ann.loss.batch_fmeasure_micro_avg{ beta=0.5 }
The constructor can receive an optional table parameter with the following fields:

- `size=0`: expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.
- `beta=1`: the `b` parameter in the F-Measure expression below. By default it is set to 1.
- `complement=false`: a boolean indicating whether the input/target values must be complemented (1 - value), swapping the positive and negative classes.
This object is implemented to work with the `logistic` activation function. This loss function computes the F-Measure between the given input/target patterns, interpreting the ANN component output as a binomial distribution. If it is used with `softmax` (multinomial distribution), then this function computes accuracy. It follows this expression:

$$\mathrm{FM}(h,t) = \frac{(1+b^2)\,\mathrm{dot}(h,t)}{b^2\,\mathrm{sum}(t) + \mathrm{sum}(h)}$$

where b is the beta parameter of the F-Measure, dot(h,t) is the dot product between the input/target matrices re-interpreted as two column vectors, and sum(h) and sum(t) are the sums of all the elements in the corresponding matrices.
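An illustrative sketch for a single-output (two-class) model, where the whole bunch yields one F-Measure value:
> loss = ann.loss.batch_fmeasure_micro_avg{ size=1, beta=1 }
> input  = matrix(4,1,{ 0.9, 0.8, 0.3, 0.2 })
> target = matrix(4,1,{ 1, 1, 0, 1 })
> fm = loss:compute_loss(input, target)
> -- the expression above gives 2*(0.9+0.8+0.2) / (3 + 2.2), approximately 0.73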
The zero-one loss function is defined by the object `ann.loss.zero_one`:
> loss = ann.loss.zero_one([ size [, threshold=0.5 ] ])
The constructor can receive an optional first parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

It can receive an optional second parameter, which by default is 0.5. This second parameter is the threshold above which the output is taken as 1. NOTE that if you are using `log_logistic` outputs, this threshold must be set to `math.log(0.5)`. This parameter is only useful when the model has one output, that is, for two-class problems.
This object is not derivable, so the `gradient` method is forbidden. The loss function can be used to compute validation error, but not for training. It computes the zero-one (classification) error of the model, taking the class with maximum probability as the prediction.
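A sketch for a two-class model with `log_logistic` output; note the threshold is given in log scale, as explained above:
> loss = ann.loss.zero_one(1, math.log(0.5))
> loss:reset()
> loss:accum_loss( loss:compute_loss(input, target) )
> err = loss:get_accum_loss() -- mean zero-one loss over the accumulated patterns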