Paco Zamora Martinez edited this page Nov 10, 2014 · 19 revisions

Related to the module require("aprilann.ann.loss").

This package defines the loss functions included in APRIL-ANN. All loss functions share the same interface. Normally they are implemented in C++ and bound to Lua.

The interface of loss functions is the following:

  • loss,loss_matrix = loss:compute_loss(input,target): this method computes the loss between two tokens, the input and the target. Normally they are bi-dimensional matrix instances of size NxM, where N is the number of patterns in the bunch (mini-batch) and M is the number of outputs of the ANN component. The method returns two values: loss, a number with the mean loss over the given bunch of patterns, and loss_matrix, a one-dimensional matrix of size N containing the loss of every pattern. In some cases, as in the FMeasure-based loss functions, this loss matrix has size 1, because the loss function is computed over the whole bunch of patterns and is not separable.

  • gradient = loss:gradient(input,target): this method computes the gradient of the loss between the input and the target. It returns a bi-dimensional matrix of size NxM, where each component is the partial derivative of the loss function with respect to the corresponding ANN output.

  • loss,loss_matrix = loss:accum_loss(loss,loss_matrix): this method receives the output of the compute_loss method and accumulates the given loss in its internal state. It is useful to compute the loss over a large number of patterns.

  • loss_matrix = loss:accum_loss(loss_matrix): this method is a specialization of the previous one, but receiving only the loss_matrix.

  • mean,variance = loss:get_accum_loss(): this method returns two numbers, the mean and the variance of the accumulated loss in the internal state of the loss function object.
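As an orientation, this interface can be mimicked in plain Lua with a toy MSE-like loss over patterns stored as nested tables. The `toy_mse` object below only illustrates the call/return conventions; it is not the APRIL-ANN implementation, which works on matrix tokens in C++:

```lua
-- Toy object following the loss interface described above, using plain
-- Lua tables { {p1_out1, p1_out2, ...}, ... } instead of matrix tokens.
local toy_mse = {}
toy_mse.__index = toy_mse

function toy_mse.new()
  return setmetatable({ acc = {} }, toy_mse)
end

-- compute_loss: returns the mean loss of the bunch and a table with the
-- loss of every pattern (the role of the one-dimensional loss_matrix).
function toy_mse:compute_loss(input, target)
  local loss_matrix, total = {}, 0
  for i = 1, #input do
    local s = 0
    for j = 1, #input[i] do
      local d = input[i][j] - target[i][j]
      s = s + d * d
    end
    loss_matrix[i] = 0.5 * s
    total = total + loss_matrix[i]
  end
  return total / #input, loss_matrix
end

-- accum_loss: stores per-pattern losses in the internal state.
function toy_mse:accum_loss(loss, loss_matrix)
  loss_matrix = loss_matrix or loss -- one-argument specialization
  for _, v in ipairs(loss_matrix) do self.acc[#self.acc + 1] = v end
  return loss, loss_matrix
end

-- get_accum_loss: mean and variance of the accumulated losses.
function toy_mse:get_accum_loss()
  local n, mean, var = #self.acc, 0, 0
  for _, v in ipairs(self.acc) do mean = mean + v end
  mean = mean / n
  for _, v in ipairs(self.acc) do var = var + (v - mean)^2 end
  return mean, var / n
end
```

For instance, with the bunch { {1,0}, {0,2} } against an all-zero target, compute_loss returns the mean loss 1.25 and the per-pattern losses { 0.5, 2 }.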

It is possible to develop new loss functions by implementing Lua classes derived from the ann.loss class, following this example:

> myloss,myloss_methods = class("myloss",ann.loss)
> function myloss:constructor()
    -- Your code to initialize self reference
  end
> function myloss_methods:compute_loss(input,target)
    -- YOUR CODE
    return loss,loss_matrix
  end
> function myloss_methods:gradient(input,target)
    -- YOUR CODE
    return gradient_token
  end
> function myloss_methods:accum_loss(loss,loss_matrix)
    local loss_matrix = loss_matrix or loss
    -- YOUR CODE
    return loss or loss_matrix, loss_matrix
  end
> function myloss_methods:get_accum_loss()
    -- YOUR CODE
    return loss_mean,loss_variance
  end
> function myloss_methods:reset()
    -- YOUR CODE
  end
> function myloss_methods:clone()
    -- YOUR CODE
    return cloned_obj
  end

Mean squared error (MSE)

This loss function is defined at the object ann.loss.mse:

> loss = ann.loss.mse()

The constructor may receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

This loss function computes the mean squared error between the given input/target patterns. It computes the following expression:

MSE(h,t) = 1/(2N) · sum_i sum_j ( h_i^j - t_i^j )^2

Where N is the number of patterns, h_i^j is the position (i,j) in the input matrix (pattern i, component j), and t_i^j is the same position at the target matrix.
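The MSE expression can be sketched in plain Lua over nested tables. The `mse` function below is only illustrative, not the library implementation, and assumes the conventional 1/(2N) scaling:

```lua
-- Mean squared error over a bunch of N patterns; input and target are
-- tables of N rows, each row a table of M output values.
local function mse(input, target)
  local N, acc = #input, 0
  for i = 1, N do
    for j = 1, #input[i] do
      local d = input[i][j] - target[i][j]
      acc = acc + d * d
    end
  end
  return acc / (2 * N)
end
```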

Mean absolute error (MAE)

This loss function is defined at the object ann.loss.mae:

> loss = ann.loss.mae()

The constructor may receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

This loss function computes the mean absolute error between the given input/target patterns. It computes the following expression:

MAE(h,t) = 1/N · sum_i [ 1/M · sum_j | h_i^j - t_i^j | ]

Where N is the number of patterns, M is the number of outputs, h_i^j is the position (i,j) in the input matrix (pattern i, component j), and t_i^j is the same position at the target matrix.
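As with MSE, the MAE expression can be sketched in plain Lua over nested tables; the `mae` function here is illustrative, not the library implementation:

```lua
-- Mean absolute error: per-pattern mean over the M outputs, then mean
-- over the N patterns of the bunch.
local function mae(input, target)
  local N, acc = #input, 0
  for i = 1, N do
    local M, row = #input[i], 0
    for j = 1, M do
      row = row + math.abs(input[i][j] - target[i][j])
    end
    acc = acc + row / M
  end
  return acc / N
end
```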

Cross entropy

This loss function is defined at the object ann.loss.cross_entropy:

> loss = ann.loss.cross_entropy()

The constructor may receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

This object is implemented to work only with the log_logistic activation function. This loss function computes the cross entropy between the given input/target patterns, interpreting the ANN component output as a binomial (two-class) distribution. It computes the following expression:

CE(h,t) = -1/N · sum_i sum_j [ t_i^j · log(h_i^j) + (1 - t_i^j) · log(1 - h_i^j) ]

Where N is the number of patterns, h_i^j is the position (i,j) in the input matrix (pattern i, component j, in natural scale), and t_i^j is the same position at the target matrix.
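The binomial cross entropy can be sketched in plain Lua as follows. Note one assumption: the `cross_entropy` function below takes natural-scale values in (0,1), whereas the library object expects log-scale (log_logistic) outputs; the function name and table representation are illustrative:

```lua
-- Binomial cross entropy over a bunch of N patterns; input holds
-- natural-scale probabilities, target holds the desired 0/1 values.
local function cross_entropy(input, target)
  local N, acc = #input, 0
  for i = 1, N do
    for j = 1, #input[i] do
      local h, t = input[i][j], target[i][j]
      acc = acc - (t * math.log(h) + (1 - t) * math.log(1 - h))
    end
  end
  return acc / N
end
```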

Multi-class cross entropy

This loss function is defined at the object ann.loss.multi_class_cross_entropy:

> loss = ann.loss.multi_class_cross_entropy()

The constructor may receive an optional parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

This object is implemented to work only with the log_softmax activation function. This loss function computes the cross entropy between the given input/target patterns, interpreting the ANN component output as a multinomial distribution. It computes the following expression:

CE(h,t) = -1/N · sum_i sum_j t_i^j · log(h_i^j)

Where N is the number of patterns, h_i^j is the position (i,j) in the input matrix (pattern i, component j, in natural scale), and t_i^j is the same position at the target matrix.
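A plain-Lua sketch of the multi-class expression follows; as before, the `mc_cross_entropy` function takes natural-scale values (the library object expects log_softmax, i.e. log-scale, outputs) and is only illustrative:

```lua
-- Multi-class cross entropy; each input row is a natural-scale
-- probability distribution, each target row a (possibly soft) one.
local function mc_cross_entropy(input, target)
  local N, acc = #input, 0
  for i = 1, N do
    for j = 1, #input[i] do
      if target[i][j] > 0 then
        acc = acc - target[i][j] * math.log(input[i][j])
      end
    end
  end
  return acc / N
end
```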

Macro averaging multi-class F-Measure

This loss function is defined at the object ann.loss.batch_fmeasure_macro_avg:

> loss = ann.loss.batch_fmeasure_macro_avg{ beta=0.5 }

The constructor could receive an optional table parameter with the following fields:

  • size=0: expected number of outputs of the ANN component. If given a value greater than 0, it is used as a sanity check, forcing the input/target sizes to be equal to it. Otherwise the size check is skipped.

  • beta=1: the b parameter of the F-Measure expression below (1 by default).

  • complement=false: a boolean indicating if the input/target values must be computed complemented (1 - value), swapping positive and negative classes.

This object is implemented to work with the logistic or softmax activation functions. This loss function computes the F-Measure between the given input/target patterns, interpreting the ANN component output as a multinomial distribution. It computes the following expression:

FM(h,t) = 1/M · sum_j [ (1 + b^2) · (h_j · t_j) / ( b^2 · sum(t_j) + sum(h_j) ) ]

Where M is the number of outputs, b is the beta parameter of the F-Measure, h_j · t_j is the dot product of the column vectors holding the input/target values of class j, and sum(h_j) and sum(t_j) are the sums of all the elements of those column vectors.
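The macro-averaged expression can be sketched in plain Lua over nested tables; the `fmeasure_macro` function is only an illustration of the soft (real-valued) per-class F computation, not the library implementation:

```lua
-- Macro-averaged soft F-measure: one F value per output column (class),
-- then averaged over the M columns.
local function fmeasure_macro(input, target, beta)
  local b2 = beta * beta
  local N, M, acc = #input, #input[1], 0
  for j = 1, M do
    local dot, sh, st = 0, 0, 0
    for i = 1, N do
      dot = dot + input[i][j] * target[i][j] -- h_j · t_j
      sh  = sh + input[i][j]                 -- sum(h_j)
      st  = st + target[i][j]                -- sum(t_j)
    end
    acc = acc + (1 + b2) * dot / (b2 * st + sh)
  end
  return acc / M
end
```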

Micro averaging multi-class F-Measure

This loss function is defined at the object ann.loss.batch_fmeasure_micro_avg:

> loss = ann.loss.batch_fmeasure_micro_avg{ beta=0.5 }

The constructor could receive an optional table parameter with the following fields:

  • size=0: expected number of outputs of the ANN component. If given a value greater than 0, it is used as a sanity check, forcing the input/target sizes to be equal to it. Otherwise the size check is skipped.

  • beta=1: the b parameter of the F-Measure expression below (1 by default).

  • complement=false: a boolean indicating if the input/target values must be computed complemented (1 - value), swapping positive and negative classes.

This object is implemented to work with the logistic activation function. This loss function computes the F-Measure between the given input/target patterns, interpreting the ANN component output as a binomial distribution. If it is used with softmax (multinomial distribution), this function computes accuracy instead. It follows this expression:

FM(h,t) = (1 + b^2) · dot(h,t) / ( b^2 · sum(t) + sum(h) )

Where b is the beta parameter of the F-Measure, dot(h,t) is the dot product of the input/target matrices re-interpreted as two column vectors, and sum(h) and sum(t) are the sums of all the elements of those matrices.
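The micro-averaged variant flattens the whole bunch before computing a single F value; the `fmeasure_micro` function below is an illustrative plain-Lua sketch, not the library implementation:

```lua
-- Micro-averaged soft F-measure: input/target treated as two flat
-- vectors for the dot product and the sums.
local function fmeasure_micro(input, target, beta)
  local b2 = beta * beta
  local dot, sh, st = 0, 0, 0
  for i = 1, #input do
    for j = 1, #input[i] do
      dot = dot + input[i][j] * target[i][j] -- dot(h,t)
      sh  = sh + input[i][j]                 -- sum(h)
      st  = st + target[i][j]                -- sum(t)
    end
  end
  return (1 + b2) * dot / (b2 * st + sh)
end
```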

Zero-one loss function

This loss function is defined at the object ann.loss.zero_one:

> loss = ann.loss.zero_one([ size [, threshold=0.5 ] ])

The constructor may receive an optional first parameter with the expected number of outputs of the ANN component. If given, it is used as a sanity check, forcing the input/target sizes to be equal to it. If not given, the size check is skipped.

It may receive an optional second parameter, by default 0.5, which is the threshold that defines when an output is taken as class 1. NOTE that if you are using log_logistic outputs, this threshold must be set to math.log(0.5). This parameter is only useful when the model has one output, that is, for two-class problems.

This object is not differentiable, so its gradient method is forbidden. This loss function can be used to compute a validation error, but not for training. It measures the classification error of the model, assigning each pattern to the class with maximum probability.
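The multi-class behavior can be sketched in plain Lua as follows; the `zero_one` function is only an illustration (the one-output threshold variant is analogous), not the library implementation:

```lua
-- Position of the maximum value in a row of outputs.
local function argmax(row)
  local best, pos = row[1], 1
  for j = 2, #row do
    if row[j] > best then best, pos = row[j], j end
  end
  return pos
end

-- Zero-one loss: fraction of patterns whose arg-max output differs
-- from the arg-max target.
local function zero_one(input, target)
  local errors = 0
  for i = 1, #input do
    if argmax(input[i]) ~= argmax(target[i]) then errors = errors + 1 end
  end
  return errors / #input
end
```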
