Paco Zamora Martinez edited this page Dec 18, 2014 · 5 revisions

Introduction

This page documents the package loaded with require("aprilann.autodiff").

The autodiff package is an in-development package which adds automatic differentiation to the APRIL-ANN toolkit. It is inspired by Theano, but uses Lua and the APRIL-ANN matrix library instead of Python tensors.

This package works with three main data types: constants, scalars, and matrices. For example, the addition of two scalars needs the following code:

> AD = autodiff -- for simplicity
> a,b = AD.scalar('a b')
> c = a + b
> = c
(+ a b)

Every expression is stored as a graph of dependencies between operations and their arguments. Every symbol (like a, b or c) is a special Lua table. Symbols have a name; in the case of ground symbols (like a or b), the name is the symbol itself. In the case of operations (like c), the name is what you see when you do print(c).

> = a.name,b.name,c.name
a   b   (+ a b)

A symbol only exists once, so if you declare another symbol with the same name, or the same operation, it will be a reference to the first symbol declaration. They are exactly the same variable. In this way, the computation is performed on a symbolic graph, and the use of memoization ensures that every operation is computed only once, even if it is repeated several times.
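For example, re-declaring a symbol with the same name returns a reference to the first declaration, so Lua table equality holds (a short sketch based on the behavior described above):

```lua
> AD = autodiff
> a1 = AD.scalar('a')
> a2 = AD.scalar('a')  -- same name: a reference to the first declaration
> = a1 == a2           -- both names point at the same Lua table
true
```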

However, depending on the declaration order, the same operation could be represented in different ways. It is possible to optimize the graph by applying basic arithmetic properties.

> d = c + b + a
> = d
(+ (+ (+ a b) b) a)
> e = AD.optimize(d)
> = e
(+ (* 2 a) (* 2 b))

It is possible to operate with Lua numbers, which are automatically coerced into autodiff.constant symbols:

> f = 4*b*a/b + 2
> = f
(+ (* (* (* 4 b) a) (^ b -1)) 2)
> = AD.optimize(f)
(+ (* 4 a) 2)

Math operators are overloaded using Lua metamethods, so it is possible to use math operators in the standard way: addition, subtraction, multiplication, division, power, and unary minus. These operations are also available as functions in the autodiff package.

> op = AD.op    -- for simplicity
> c = op.div( op.add(a,b), 4)
> = c
(* (+ a b) 0.25)

Constants are automatically folded, performing constant operations at graph-construction time. Subtraction is transformed into addition plus multiplication by -1. Division is transformed into multiplication by the power -1. This greatly reduces the simplification effort.

> c = a/b - a
> = c
(+ (* (^ b -1) a) (* -1 a))

But these transformations make the printed string difficult to follow. It is possible to produce a graph in dot format, which can be rendered with the Graphviz dot tool.

> AD.dot_graph(c, "mygraph.dot")

The most interesting feature is the automatic differentiation procedure, which receives a list of variables with respect to which to differentiate. The function returns a variable number of results, one graph for each differentiated variable.

> c = a * b
> da,db = AD.diff(c, {a,b})
> = da
b
> = db
a

Any graph can be evaluated given an instantiation of its internal ground variables.

> c = a * b
> = c:eval{ a=5, b=2 }
10
> = da:eval{ a=3, b=2 }
2
> = db:eval{ a=3, b=2 }
3

Additionally, it is possible to compile several graphs together, sharing the computation of common operations. The compilation procedure automatically applies the optimization function to the given graphs. It receives the list of graphs (symbols) which you want to compile together, and an ordered list of input variables. Optionally, a dictionary Lua table can be given whose values will be shared between the compiled function and the caller Lua virtual machine.

> -- AD.func does the compilation
> my_func1 = AD.func({ c, da, db }, { a,b })
> = my_func1(2,5)
10    5     2
> shared = { b=4 }
> my_func2 = AD.func({ c, da, db }, { a }, shared)
> = my_func2(2) -- 2 * 4
8     4     2
> shared.b = 10
> = my_func2(2) -- 2 * 10
20    10    2

Basic functionality

The most important concept is the symbol, which can be a ground variable (with a declared type) or a computational graph. Every symbol has a name which identifies it univocally. Every symbol overloads the basic math operations (addition, subtraction, multiplication, division, unary minus, power), allowing the use of standard math operators to build complex operations. Computational graph symbols represent operations. Every symbol implements a generic interface. Depending on the type, symbols are constructed in different ways:

> AD  = autodiff -- for simplicity
> a,b = AD.scalar('a b') -- scalar constructor
> c   = a + 4*b          -- 4 is cast to the AD.constant type
> m   = AD.matrix('m')   -- matrix constructor
> d   = m * c

The type of a computational graph symbol (operation) depends on the arguments of the operation. Inference rules are declared in order to decide when an operation is over matrices, scalars, constants, or any of the other possible symbol types.

eval

value = s:eval( table [, cache] )

Evaluates the caller symbol using the given table of ground variable values. All the ground variables need to be defined (except constants).

> = d:eval{ m=matrix(3,3):linear(), a=10, b=2 }
 0           18          36        
 54          72          90        
 108         126         144       
# Matrix of size [3,3] in row_major [0x2b6cd60 data= 0x293e4a0]

An optional table can be given as second argument, which will be used as a cache table (for memoization), allowing computations to be shared between evaluations of different symbols. In this case, it is not necessary to give all the ground variables, as long as all the needed partial operations were computed and memoized in the given cache table.

> cache = {}
> = c:eval({ a=10, b=2 }, cache) -- computes c=a + 4*b
18
> = d:eval({ m=matrix(3,3):linear() }, cache) -- computes d = m * 18
 0           18          36        
 54          72          90        
 108         126         144       
# Matrix of size [3,3] in row_major [0x2b6cd60 data= 0x293e4a0]

func

function = autodiff.func(symbols, args, shared [, optimize=true])

This function compiles together the given set of symbols, producing as result a Lua function which receives as arguments the ground variables indicated in the args table (in order), and shares the given table of shared ground variables. Additionally, an optional fourth argument can be given, indicating with a boolean whether optimization should be performed. If not given, this fourth argument is considered true.

The symbols list (first argument) can be a table of symbols or a single symbol. The returned Lua function will receive as many arguments as the size of the args table array, and will produce as many values as the size of the symbols list.

> f = AD.func(d, {a,b}, {m = matrix(3,3):linear()})
> = f(10,2)
 0           18          36        
 54          72          90        
 108         126         144       
# Matrix of size [3,3] in row_major [0x1cddcd0 data= 0x1c86b20]

diff

ds1,ds2,... = autodiff.diff(symbol, { s1, s2, ... } [, seed] )

This function differentiates the given symbol with respect to all the ground variables in the given array (second argument). It returns multiple outputs, as many as the size of the ground variables array.

> dm,da = autodiff.diff(d, { m, a })
> = dm
(.* (+ (* 4 b) a) (fill (.* m (+ (* 4 b) a)) 1))
> = da
(.* m (fill (.* m (+ (* 4 b) a)) 1))
> = d
(.* m (+ (* 4 b) a))

Looking at the produced computations, they have one thing in common: both use a fill operation which receives as input the original d computational graph and the number 1. The fill operation produces a symbol with the same shape and type as its first argument, but with all its components replaced by the second argument, 1 in this case. This is the seed needed to implement the reverse accumulation algorithm for automatic differentiation. This seed forces the computation of the output produced by the original graph. It is possible to avoid this computation by using the optional third argument of the diff function, giving a seed symbol which will be treated as a new ground variable. When compiling or evaluating the differentiated symbols, you will need to give the value of the seed. The shape and type of the seed must be exactly the same as those of the non-differentiated symbol (in this case d), and it is normally filled with 1s, but it could also be a seed produced as a derivative by another function.

> seed  = AD.matrix('seed')
> dm,da = AD.diff(d, { m, a }, seed)
> = dm
(.* (+ (* 4 b) a) seed)
> = da
(.* m seed)

Symbol advanced methods

diff

table = s:diff(seed, table)

This low-level method appears to implement one step of reverse accumulation for the caller symbol; it is used internally by autodiff.diff.

compile

s:compile(compiler)

This low-level method emits the caller symbol's computation into the given compiler object; it is used internally by autodiff.func.

arg_ipairs

Lua iterator = s:arg_ipairs()

This method returns a Lua iterator which traverses, in ipairs fashion, all the arguments received by the given symbol. If it is a ground variable, this method returns an empty iterator.
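A minimal sketch of its use, following the iterator contract described above (the loop body is illustrative):

```lua
> c = a + 4*b
> -- visit every direct argument of the operation symbol c
> for i, arg in c:arg_ipairs() do print(i, arg.name) end
```

Each iteration yields the argument position and the argument symbol, whose name field can be printed as shown in the introduction.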

replace

symbol = s:replace(new_symbol)

This method is used in optimization procedures. It replaces the caller symbol with the given symbol. It returns the caller symbol.

clear_var_name

symbol = s:clear_var_name()

This method removes the variable name associated with the symbol during compilation. It returns the caller symbol.

set_dims

symbol = s:set_dims(d1, d2, ...)

This method declares the shape of the caller symbol. It is useful with the matrix type, allowing checks that matrix shapes fit the declared operations. It returns the caller symbol.

> a,b = AD.matrix('a b')
> a:set_dims(2,4)
> b:set_dims(4,3)
> c = a * b
> = table.unpack( c.dims )
2    3
> a:set_dims(2,2)
> c = a * b
[string "luaopen_aprilann_autodiff"]:5: Incorrect matrix dims for multiplication: 2x2 * 4x3
stack traceback:
    [C]: in function 'assert'
    [string "luaopen_aprilann_autodiff"]:5: in function <[string "luaopen_aprilann_autodiff"]:5>
    (...tail calls...)
    stdin:1: in main chunk
    [C]: in ?

set_broadcast

symbol = s:set_broadcast(b1, b2, ...)

The term broadcasting describes how matrices with different shapes are treated during operations. Some mathematical operators allow working with matrices of mismatched shapes, replicating one of the matrices over the broadcast dimensions. This concept is similar to NumPy broadcasting. As an example, broadcasting is used in ANNs when adding the bias vector, allowing operations to be computed for several samples at the same time (mini-batch or bunch mode):

> x,w,b = AD.matrix('x w b')
> x:set_dims(20, 4)  -- 20 features, 4 samples
> w:set_dims(10, 20) -- 10 outputs, 20 inputs
> b:set_dims(10, 1)  -- 10 bias components
> b:set_broadcast(false, true)
> output = w*x + b   -- bias is broadcasted to fit the addition operation
> = output:eval{ x=matrix(20,4):uniformf(),
                 w=matrix(10,20):uniformf(),
                 b=matrix(10,1):uniformf() }
 4.92572     6.4316      5.24994     5.73787   
 5.13588     5.06926     4.93599     5.89694   
 5.82963     6.1323      6.2065      7.06305   
 4.51586     5.76797     5.27805     6.3306    
 4.17794     5.31377     4.05488     4.4576    
 4.20612     5.04744     4.77355     5.41917   
 5.74113     6.56931     5.7724      6.52165   
 6.00033     6.34304     5.77687     7.40072   
 7.1957      8.11042     7.12002     8.2262    
 5.53413     6.22189     5.72926     6.46156   
# Matrix of size [10,4] in row_major [0x29c1750 data= 0x2979e00]

If you don't set true in the corresponding broadcasting dimension, an error will occur. By default, all the dimensions have a false broadcasting property.

> b:set_broadcast(false, false)
> output = w*x + b   -- broadcasting is disabled, so the addition fails at eval
> = output:eval{ x=matrix(20,4):uniformf(),
                 w=matrix(10,20):uniformf(),
                 b=matrix(10,1):uniformf() }
[string "luaopen_aprilann_autodiff"]:5: Not broadcasted dimensions must be equal, found broadcasted_dims[2]=4 and dims[2]=1
stack traceback:
	[C]: in function 'assert'
	[string "luaopen_aprilann_autodiff"]:5: in function 'eval_func'
	[string "luaopen_aprilann_autodiff"]:1: in function <[string "luaopen_aprilann_autodiff"]:1>
	(...tail calls...)
	[C]: in ?

Symbols

Constants

Scalars

Matrices

Other data types

Gradient descent learning
