-
Notifications
You must be signed in to change notification settings - Fork 12
30 autodiff
Related with package require("aprilann.autodiff")
.
The autodiff
package is an in-development package, which adds
automatic differentiation
to APRIL-ANN toolkit. It is inspired into
Theano, but, in this case, using
Lua and APRIL-ANN matrix library instead of tensors in Python.
This packages works over three main data types: constants, scalars, matrices. For example, the addition of two scalars will need the following code:
> AD = autodiff -- for simplicity
> a,b = AD.scalar('a b')
> c = a + b
> = c
(+ a b)
Every expression is stored as a graph of dependencies between operations and its
arguments. Every symbol (like a
, b
or c
) is a special Lua table. Symbols
has a name, in the case of ground symbols (like a
or b
), its name is the
symbol itself. In the case of operations (like c
), the name is what you
see when you do print(c)
.
> = a.name,b.name,c.name
a b (+ a b)
A symbol only exists once, so, if you declare another symbol with the same name, or the same operation, it will be a reference to the first symbol declaration. They are exactly the same variable. In this way, the computation is performed in a symbolic graph, and the use of memoization allows to ensures that every operation is only computed once, even if it is repeated several times.
However, depending in the declaration order, the same operation could be represented in different ways. It is possible to optimize the graph applying basic arithmetic properties.
> d = c + b + a
> = d
(+ (+ (+ a b) b) a)
> e = AD.optimize(d)
> = e
(+ (* 2 a) (* 2 b))
It is possible to operate with Lua numbers, which are automatically coerced into
autodiff.constant
symbol:
> f = 4*b*a/b + 2
> = f
(+ (* (* (* 4 b) a) (^ b -1)) 2)
> = AD.optimize(f)
(+ (* 4 a) 2)
Math operators are overloaded using Lua metamethods, so it is possible to do
use math operators in a standard way: addition, subtraction, multiplication,
division, power, unary minus. Nevertheless, this operations are defined as
functions in autodiff
package.
> op = AD.op -- for simplicity
> c = AD.op.div( AD.op.add(a,b), 4)
> = c
(* (+ a b) 0.25)
Constants are automatically folded, performing constant operations. Subtraction is transformed in addition with multiplication by -1. Division is transformed into multiplication with power by -1. This reduces a lot the simplification effort.
> c = a/b - a
(+ (* (^ b -1) a) (* -1 a))
But this transformations makes difficult to follow the printed string. It is possible to produce a graph in dot format.
> AD.dot_graph(c, "mygraph.dot")
The most interesting thing is the automatic differentiation procedure, which receives a list of variables w.r.t differentiate. The function returns a variable number of arguments, one graph for each differentiated variable.
> c = a * b
> da,db = AD.diff(c, {a,b})
> = da
b
> = db
a
Any graph could be evaluated with a given instantiation of its internal ground variables.
> c = a * b
> = c:eval{ a=5, b=2 }
10
> = da:eval{ a=3, b=2 }
2
> = db:eval{ a=3, b=2 }
3
Additionally, it is possible to compile several graphs together, sharing the computation of equal operations. The compilation procedure automatically applies the optimization function to the given graphs. It receives the list of graphs (symbols) which you want to compile together, and a list of input variables (in order). Optionally it is possible to give a dictionary Lua table with values which will be shared between the compiled function and the caller Lua virtual machine.
> -- AD.func does the compilation
> my_func1 = AD.func({ c, da, db }, { a,b })
> = my_func1(2,5)
10 5 2
> shared = { b=4 }
> my_func2 = AD.func({ c, da, db }, { a }, shared)
> = my_func2(2) -- 2 * 4
8 4 2
> shared.b = 10
> = my_func2(2) -- 2 * 10
20 10 2
The most important concept is the symbol, which could be a ground variable (with a declared type) or a computational graph. Every symbol has a name which identifies it univocally. Every symbol has overloaded basic math operations (addition, subtraction, multiplication, division, unary minus, power), allowing to use standard math symbols to produce complex operations. Computational graph symbols represent operations. Every symbol implements a generic interface. Depending in the type, symbols are constructed in different ways:
> AD = autodiff -- for simplicity
> a,b = AD.scalar('a b') -- scalar constructor
> c = a + 4*b -- 4 is casted to AD.constant type
> m = AD.matrix('m') -- matrix constructor
> d = m * c
The type of computational graph symbols (operations) depends in the arguments of the operation. Inference rules are declared in order to decide when an operation is over matrices, scalar, constants, or any of the possible symbol types.
value = s:eval( table [, cache] )
Evaluates the caller symbol by using the given table of ground variables. All the ground variables need to be defined (except constants).
> = d:eval{ m=matrix(3,3):linear(), a=10, b=2 }
0 18 36
54 72 90
108 126 144
# Matrix of size [3,3] in row_major [0x2b6cd60 data= 0x293e4a0]
An optional table could be given as second argument, which will be used as cache table (for memoization), allowing to share computations between different symbols evaluation. In this case, it is not necessary to give all the ground variables, as far as all the needed partial operations where computed and moemoized in the given cache table.
> cache = {}
> = c:eval({ a=10, b=2 }, cache) -- computes c=a + 4*b
18
> = d:eval({ m=matrix(3,3):linear() }, cache) -- computes d = m * 18
0 18 36
54 72 90
108 126 144
# Matrix of size [3,3] in row_major [0x2b6cd60 data= 0x293e4a0]
function = autodiff.func(symbols, args, shared [, optimize=true])
This function compiles together the given set of symbols, producing as result a
Lua function which receives as arguments the ground variables indicated in the
args
table (ordered), and shares the given table of shared
ground variables.
Additionally, an optional fourth argument could be given, indicating with
a boolean
if optimization needs to be performed or not. If not given, this
fourth argument will be considered true
.
The symbols list (first argument) could be a table of symbols or a symbol. The
returned Lua function will receive as many arguments as the size of args
table
array, and will produce as many values as size of symbols
list.
> f = AD.func(d, {a,b}, {m = matrix(3,3):linear()})
> = f(10,2)
0 18 36
54 72 90
108 126 144
# Matrix of size [3,3] in row_major [0x1cddcd0 data= 0x1c86b20]
ds1,ds2,... = autodiff.diff(symbol, { s1, s2, ... } [, seed] )
This function differentiates the given symbol w.r.t. all the given array of ground variables (second argument). It returns multiple outputs, as many as the size of the ground variables array.
> dm,da = autodiff.diff(d, { m, a })
> = dm
(.* (+ (* 4 b) a) (fill (.* m (+ (* 4 b) a)) 1))
> = da
(.* m (fill (.* m (+ (* 4 b) a)) 1))
> = d
(.* m (+ (* 4 b) a))
Looking to the produced computations, they have one thing in common: both uses a
fill operation which receives as input the original d
computational graph
and the number 1
. The fill operations produces a symbol with the same shape
and type of the given first argument, but replacing all its components with the
given second argument, 1
in this case. This is the seed needed to
implement reverse accumulation algorithm for automatic differentiation. This
seed forces to compute the output produced by the original computation. It is
possible to avoid this computation using the optional third argument of the
diff
function, given a seed symbol which will be treated as a new ground
variable. When compiling or evaluation the differentiated symbols, you will need
to give the value of the seed. The shape and type of the seed must be exactly
the same as the non-differentiated symbol (in this case d
), and normally
filled with 1s, but it could be a seed produced as derivative by other function.
> seed = AD.matrix('seed')
> dm,da = AD.diff(d, { m, a })
> = dm
(.* (+ (* 4 b) a) seed)
> = da
(.* m seed)
table = s:diff(seed, table)
s:compile(compiler)
Lua iterator = s:arg_ipairs()
This method returns a Lua iterator which traverses using ipairs
all the
arguments received by the given symbol. If it is a ground variable, this
method returns an empty iterator.
symbol = s:replace(new_symbol)
This method is used in optimization procedures. It allows to replace the caller symbol by the given symbol. It returns the caller symbol.
symbol = s:clear_var_name()
This method removes the variable name associated with compilation. It returns the caller symbol.
symbol = s:set_dims(d1, d2, ...)
This method indicates the shape of the caller symbol. It is useful with
matrix
type, allowing to check if matrices fit the declared operations.
It returns the caller symbol.
> a,b = AD.matrix('a b')
> a:set_dims(2,4)
> b:set_dims(4,3)
> c = a * b
> = table.unpack( c.dims )
2 3
> a:set_dims(2,2)
> c = a * b
[string "luaopen_aprilann_autodiff"]:5: Incorrect matrix dims for multiplication: 2x2 * 4x3
stack traceback:
[C]: in function 'assert'
[string "luaopen_aprilann_autodiff"]:5: in function <[string "luaopen_aprilann_autodiff"]:5>
(...tail calls...)
stdin:1: in main chunk
[C]: in ?
symbol = s:set_broadcast(b1, b2, ...)
The term broadcasting describes how matrices with different shapes are treated during operations. Some mathematical operators allow to work with matrices with not fitted shapes, replicating one of the matrices over broadcasted dimensions. This concept is similar to numpy broadcast. As example, broadcasting is used in ANNs when you add bias vector, allowing to compute operations with several samples at the same time (mini-batch or bunch-mode):
> x,w,b = AD.matrix('x w b')
> x:set_dims(20, 4) -- 20 features, 4 samples
> w:set_dims(10, 20) -- 10 outputs, 20 inputs
> b:set_dims(10, 1) -- 10 bias components
> b:set_broadcast(false, true)
> output = w*x + b -- bias is broadcasted to fit the addition operation
> = output:eval{ x=matrix(20,4):uniformf(),
w=matrix(10,20):uniformf(),
b=matrix(10,1):uniformf() }
4.92572 6.4316 5.24994 5.73787
5.13588 5.06926 4.93599 5.89694
5.82963 6.1323 6.2065 7.06305
4.51586 5.76797 5.27805 6.3306
4.17794 5.31377 4.05488 4.4576
4.20612 5.04744 4.77355 5.41917
5.74113 6.56931 5.7724 6.52165
6.00033 6.34304 5.77687 7.40072
7.1957 8.11042 7.12002 8.2262
5.53413 6.22189 5.72926 6.46156
# Matrix of size [10,4] in row_major [0x29c1750 data= 0x2979e00]
If you don't indicate true
in the corresponding broadcasting dimension, an
error will occur. By default, all the dimensions has a false
broadcasting
property.
> b:set_broadcast(false, false)
> output = w*x + b -- bias is broadcasted to fit the addition operation
> = output:eval{ x=matrix(20,4):uniformf(),
w=matrix(10,20):uniformf(),
b=matrix(10,1):uniformf() }
[string "luaopen_aprilann_autodiff"]:5: Not broadcasted dimensions must be equal, found broadcasted_dims[2]=4 and dims[2]=1
stack traceback:
[C]: in function 'assert'
[string "luaopen_aprilann_autodiff"]:5: in function 'eval_func'
[string "luaopen_aprilann_autodiff"]:1: in function <[string "luaopen_aprilann_autodiff"]:1>
(...tail calls...)
[C]: in ?