forked from stan-dev/stan
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTO-DO.txt
1331 lines (1041 loc) · 44.3 KB
/
TO-DO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
STAN TO-DO LIST
======================================================================
FOR NEXT RELEASE (1.3.0++)
-------------------------------------------
- (Bob/Marcus/Daniel) look at add(), etc. operations for instantiation
-- already did multiply/divide in stan/math
-- now need to worry about agrad / int, etc. to make sure
there's no auto promotion to agrad
-- look at add/subtract where there's broadcast function
-- need tests for int in all the relevant positions
- (Daniel/Michael) instrument tests for new algorithms to isolate
floating point divergence
- (everyone) remove std::cout from API
- (Daniel) fix bin/print spacing
- sort out low ESS in new diagonal adaptation
- sort out rank, etc. from Marco
- go through include-what-you-use output from Ben to
mailing list (early May)
- sort out why log_prob_poly<>() isn't working for Metropolis with double
- redundant model_header.hpp includes --- why not just include matrix.hpp?
- int vs. size_t and error handling in indexing; with size_t,
get cast of negative int to large positive size_t, which will
confuse users in error msgs
- Figure out how to return error messages safely
- e.g., math/matrix/validate_std_vector_index.hpp
std::stringstream ss;
ss << "require 0 < index <= vector size" << msg;
ss << "; found vector size=" << m.cols()
<< "; index j=" << j;
throw std::domain_error(ss.str());
cf., http://stackoverflow.com/questions/10644910/does-stdruntime-error-copy-the-string-passed-in-the-constructor
- (Daniel) fix order of include issues for model header, etc.
in new refactor (see Jiqiang's e-mail 4/19/13, 9:20 PM to stan-dev,
and rest of that thread)
- Get in more contributed code, such as R regression syntax
compiler and DIC and ...
- fix error message in Stan when an error is encountered
during initialization --- currently says "about to be
Metropolis rejected"; at very least, change the message
to qualify when rejection happens
- fix remaining template functions in stan/math to ensure
they match int better than var matches int
- remove warning message on transform OR declare a set of
function names to ignore
The to-do list is organized into the following sections:
* next release to-do and change list
* C++ API * RStan
* Modeling Language * Manual
* Build * Web Pages
* Testing * Release Mgmt
* Command-Line * Models
* C++ API
PRIORITIES?
======================================================================
+ (name): actively assigned
* (name): likely to be assigned to
Contributed Code
-------------------------------------------
+ R compiler
+ emacs modes, etc.
Efficiency/Scalability of Code and Sampling
-------------------------------------------
+ (Daniel) replace tests with doubles
+ (Daniel) full multivariate probability function tests
+ (Daniel) vectorized derivatives for prob functions
+ (Marcus, Bob, Daniel) vectorization of multivariate prob functions
+ (Michael) replace std::vector with Eigen constructs in models
* special function vectorization
* multi-threading
* (Peter, Bob) ensemble samplers: DREAM, differential evolution,
stretch/walk
* (Peter, Bob, Michael) higher-order auto-diff and RM-HMC
Modeling Language
-----------------
+ (Bob) new types: lower triangular (+/- strict), diagonal matrix,
symmetric matrix, Cholesky factor of pos-def matrix
+ (Bob) discrete sampling
+ (Michael B/Daniel) complementary CDF and log CDF, log CCDF,
plus gamma distribution CDF
* ragged arrays
* sparse vectors, matrices, arrays
* matrix literal-like expressions a la MATLAB
* subroutines in modeling language
* initialization block in model w. randomization
* (Bob) user-specified language extensions (plug in facility to declare,
compile, link)
* implicit functions with derivatives (for diff eqs?)
* user-callable transforms with Jacobian lp__ adjustment
* unconstrained parameterizations of prob functions
* (Bob) syntactic improvements in language (e.g., multiple declares,
compound declare and assign)
Command and Interfaces
----------------------
* Python, MATLAB, Julia, Stata, ??? interfaces
* (Michael, Daniel) convergence diagnostics
* I/O from CSV files
* (contributed OK?) compiler for R's linear model notation
Cleanup
-----------
* improve special function and prob function behavior at limits
* (Daniel) simplify error handling by standardizing default policy
- remove throw from all files except for
src/stan/math/error_handling/raise_domain_error.hpp
- Need unit-tests for error handling:
+ check_nonnegative
+ check_positive
+ matrix/check_corr_matrix
+ matrix/check_ordered
+ matrix/check_pos_definite
- Move src/stan/math/matrix/check_range.hpp to
src/stan/math/error_handling/matrix/
- Change error handling in:
+ prob/distributions/multivariate/continuous/wishart.hpp
Doc
---
* intro applied Bayes book with RStan
* (Bob) C++ manual for API and auto-dif
* C++ code cleanup (API doc, privates, consts, doc, split files,
long long)
* more example models, complete/improve BUGS models,
vectorize BUGS models.
* fancier web site graphics
C++ API
======================================================================
[items go here first, then to RStan/cmd]
* fix log1m bug --- see Sebastian Weber's e-mail 5/10/13 4:55 AM
: isolated issue to ctor(var) throwing exception; so
fix remaining places where vari ctors may throw, like lgamma
/functions/inverse_softmax.hpp:#include <boost/throw_exception.hpp>
/functions/inverse_softmax.hpp: * @throw std::invalid_argument if size of the input and output vectors differ.
/functions/lgamma.hpp: // throws domain_error if x is at pole
/functions/log1p.hpp:#include <boost/throw_exception.hpp>
/functions/log1p.hpp: throw std::runtime_error("foo");
/functions/softmax.hpp:#include <boost/throw_exception.hpp>
/functions/softmax.hpp: * @throw std::invalid_argument if size of the input and output vectors differ.
* log_minus_exp(x,y)
= log(exp(x) * (exp(x)/exp(x) - exp(y)/exp(x)))
= log(exp(x)) + log(1 - exp(y)/exp(x))
= x + log(1 - exp(y - x))
= x + log1m(exp(y-x))
or, following http://snipplr.com/view/12707/,
double diff = x - y;
diff /= 2.0;
return (x + y) / 2.0 + log(exp(diff) - exp(-diff));
definitely check x >> y and return x (do this in log_sum_exp, too?)
see: http://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf
for log(1-exp(x))
* Chebyshev polynomials of the first kind
-- with some kind of regression application example
-- Boost has very parameterized Chebyshev polynomials, so figure
out which ones in which parameterizations we need
-- lm(a ~ poly(x,n)...)
-- see 9 May 2013 post to stan-dev from dlakelan
* matrix normal distribution (Marcus suggested)
http://en.wikipedia.org/wiki/Matrix_normal_distribution
-- useful for GeoBUGS type models
* clean up assign_to_var location & namespaces for auto-diff
-- work through use case from C++ API
-- intro namespaces stan::diff::fwd & stan::diff:rev
-- base instances in stan::math
-- fwd-mode instances in stan::diff::fwd
-- rev in diff::rev
* plain old expression print (as input, not as generated in C++)
-- clearer error messages for users
* to_vector(matrix):vector function in row-major order
* kronecker product function
* Add model methods:
x string model_name()
x void constrained_param_names(vector<string>& v)
? void constrained_values(vector<double>& v, const vector<double>& r, const vector<int>& i)
- void unconstrained_param_names(vector<string>& v)
* add "multi_logit_log(int|vec,vec)" with derivs so that
y ~ multi_logit(beta,x);
behaves as:
y ~ categorical(softmax(dot_product(beta,x)));
c.f. MCMCmln in MCMCpack for speed comparison using their
data set
* expose all the groovy special functions Michael/Daniel added for CDFs
* allow only some inits to be specified in a data file (var_context)
and let the rest be randomly generated
* bound error in inits (and discuss in manual)
init file:
x <- structure(c(1,1,1), .Dim=c(3))
model:
parameters {
vector<lower=0,upper=1e50>[3] x;
}
model {
print("x = ", x);
x ~ normal(0,1);
}
problem is overflow with initial value of x being 1e35
due to huge bounds.
* fix Eigen (3.0) indexes to use size_t to match vector.
http://eigen.tuxfamily.org/dox-devel/classEigen_1_1DenseBase.html#a4d4873e91be950c079f067fa97fd5c40
which refers to
http://eigen.tuxfamily.org/dox-devel/TopicPreprocessorDirectives.html
which says:
EIGEN_DEFAULT_DENSE_INDEX_TYPE - the type for column and row indices in matrices, vectors and array (DenseBase::Index). Set to std::ptrdiff_t by default.
* merge forward-mode and higher-order derivative functionals
* agrad for gamma_p function to support gamma_cdf
* (from Frederic Bois): put simulated annealing or tempering in stan...
* broadcast ./ operation for real / vector and vector / real.
* Refactor math library:
- src/stan/math/functions/ibeta.hpp
all doubles for now. If I template it, conflicts with var definition.
- Needs tests. Currently not tested: src/stan/math/matrix/...
+ stan_print.hpp
+ singular_values.hpp
+ cholesky_decompose.hpp
+ eigenvalues_sym.hpp
+ inverse.hpp
+ diag_matrix.hpp
+ transpose.hpp
+ diagonal.hpp
+ block.hpp
+ diag_post/pre_multiply.hpp
+ dot_product.hpp
+ columns_dot_product.hpp
+ log_determiant.hpp
* new vector views
?? promote up from double to matrix?
?? allow matrix | (vector & row_vector & real[] & int[]) | real | int
template <typename StorageT, typename ReturnT>
struct View {
View({const} StorageT& values,
const vector<int>& return_dims); // clunky?
int size() const;
{const} ReturnT& operator[](int n) {const};
const vector<int>& storage_dims() const;
const vector<int>& return_dims() const;
};
? Can ReturnT be an array? not sure we need it, because
everything is vector/matrix in multi-dim prob functions
types:
normal_log(scalar,scalar,scalar)
dirichlet_log(simplex,vector);
multi_normal(vector,vector,cov_matrix);
categorical(int,simplex);
multinomial(vector,simplex);
multinomial(T_out& y, T_prob& theta) {
View<T_out, vector_d> y_view;
View<T_prob, vector_d> theta_view;
check_consistency(y,theta); // lots of work here
for size of y view
check_validity(y); // non-negative ints
for size of theta view
check_validity(theta); // simplex
? promote vectors to matrices?
-- row vetor copy rows #cols times
-- col vector copy cols #rows times
? any way to decide from the outside?
multi_normal_log(double,double,matrix_d);
-- figure out dims of double from matrix
dirichlet_log(double,double)
-- can't figure out dims of double here
// promote double to matrix
double x = ...;
vector<int> dims(2);
dims[0] = M;
dims[1] = N;
View<double, Matrix<double,Dynamic,Dynamic> >(x,dims);
* deprecate lkj_cov
- work out general framework for deprecated functions
with warning messages
- include option for deleted functions with error messages
* How to deal with generated quantities
- execute & print every time
- warning message for print that it only happens
for saved samples
* JSON I/O. Check out:
http://www.boost.org/doc/libs/1_53_0/doc/html/property_tree/reference.html#header.boost.property_tree.json_parser_hpp
* refactoring:
-- remove mcmc/hmc.hpp
-- merge nuts_nondiag and nuts_givenmass
-- take in something other than string on input for nutsmass
-- merge in nuts_diag if possible
* replace Eigen .sum() with stan::math::sum() for auto-dif
* remove (unsigned) long long instances in dump and error testing
* shift APIs from std::vector<T> to Eigen::Matrix<T,Dynamic,1>, etc.
* remove chainable() intermediate class and get rid of virtual
function for init zero
* check out varying target acceptance rates, trying
1/2, 1/4, 3/4, 1/8, 7/8, ...
and then divide and conquer
* evaluate different maximum number of steps
-- during warmup
-- during sampling
* rewrite functions in matrix.hpp.
-- Functions:
-- elt_multiply() [fix inv_wishart_log, wishart_log]
-- subtract() [fix multi_normal_log]
-- should be able to write one templated function to allow for required
uses.
-- needs tests.
* consider alternative transforms
-- ordered as logit(cumulative_sum(simplex))
[need to work through the implied geometry here]
* add lower/upper bound infinite or negative infinite values
-- need is_pos_inf<T>(x) and is_neg_inf(x) tests and we
only have isinf() which is either/or
* make sure all range errors are being caught
-- math/matrix.hpp line 164
-- STAN_NO_DEBUG property
-- turn off checking if (NDEBUG || STAN_NO_DEBUG)
-- impl from Marcus (Eigen):
#ifndef STAN_NO_DEBUG
#ifdef NDEBUG
#define STAN_NO_DEBUG
#endif
#endif
#ifdef STAN_NO_DEBUG
..
#endif
* make sure symmetry tests aren't too stringent for cov_matrix
* unpack Eigen/Boost and use just what we need a la RStan
-- For Boost, do:
mkdir /tmp/boost_1.52.0
find ${STAN_HOME}/src/ -name \*\.\[ch]pp -exec bcp --scan --boost=${STAN_HOME}/lib/boost_1.52.0 '{}' /tmp/boost_1.52.0/ \; &> /tmp/boost_1.52.0/bcp.log
* change step_size = epsilon +/- epsilon_pm to
step_size = min(C, epsilon + normal(0,sigma))
* check that positive definiteness enforced for transformed params,
transformed data
* switch error checking for agrad::var matrix ops to
convert to double-based matrix for check as in vectorized univariates
* Ben's example:
matrix[3,2] ~ normal(matrix[3,1], matrix[1,2])
this could recycle the first arg, a matrix shaped like a vector
and the second arg, a matrix shaped like a roo vector.
* higher-order auto-dif
-- chain rule:
http://en.wikipedia.org/wiki/Chain_rule#Higher_derivatives
-- store adj_, adj2_ and adj3_ in new class
++ stan::agrad::var3 with matching stan::agrad::vari3,
and also var2, vari2
-- chain() needs to compute them all in order
++ perhaps with chain1(), chain2(), and chain3(), with
chain1() being the existing rule
* forward mode auto-dif
- last revision 548:
- src/stan/agrad/ad.hpp
- use for reasonably efficient Hessian calcs
* use expression template for value of += so that we could
-- sum_expression& operator+=(sum_expression&, const var&);
-- sum_expression(const var&);
-- var(const sum_expression&);
: create vectorized vari from sum_expression
would require running lp__, etc. to be sum_expression to be useful
without picking up signature
-- var& operator+=(var&, const var&);
* templated functions returning var should go away in matrix
* add base class for stanc-generated models
* alternative distribution parameterizations
- e.g., precision for (multi) normal
- inverse_sqrt_gamma(...)
* handle covariance or correlation matrix of an AR1 process.
- If rho is the auto-regressive parameter, the entries of the
correlation matrix are powers of rho. And the precision matrix
is tridiagonal and known analytically. So,
it would be inefficient to plow that through
multivariate_normal_log().
* beta distro with mean plus "precision" params
- beta_mean_log(theta|phi,kappa)
= beta_log(theta|phi*kappa, (1-phi)*kappa)
- Jacobian from:
alpha <- phi * k
beta <- (1 - phi) * k
determinant = k (det [[a,b],[c,d]] = ad - bc)
* von Mises distribution (with location and concentration params?)
-- requires modified/hyperbolic Bessel function of first kind of order 0 and 1
* Bessel functions
- J: Bessel function of first kind
derivatives: http://functions.wolfram.com/Bessel-TypeFunctions/BesselJ/20/01/
signature: T bessel(const T1& v, const T2& x);
boost: http://www.boost.org/doc/libs/1_51_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/bessel/bessel.html
- I: Modified (hyperbolic) Bessel function of first kind
derivatives: http://functions.wolfram.com/Bessel-TypeFunctions/BesselI/20/01/
signature: T modified_bessel(const T1& v, const T& x);
boost: http://www.boost.org/doc/libs/1_51_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/bessel/mbessel.html
- we really (or at least also) want bessel_log() and modified_bessel_log()
- and derivatives are really nasty for the concentration param if we
don't auto-dif
* pow special cases
- inv(x) = pow(x,-1) = 1/x
- inv_square(x) = pow(x,-2)
- inv_sqrt(x) = pow(x,-1/2)
* signum function
int signum(real);
* signed sqrt
signed_sqrt(x) = -sqrt(-x) if x < 0
= sqrt(x) if x > 0
* More matrix functions from Eigen:
- singular vectors in addition to values
+ nice to have (U,S,V) = svd(A) syntax for this
- norm()
- squaredNorm()
- lpNorm<1>
- lpNorm<Eigen::Infinity>
* z-score function
- normalize sample to zero sample mean, unit sample variance
* cumulative distribution functions
- boundary conditions at limits of support for cdfs
- should we have the basic cdfs on the log scale?
- need for truncated distros mostly
foo_cdf_lb_log(L|...) == log(1 - foo_cdf(L|...))
== foo_ccdf_log(L|...)
foo_cdf_ub_log(U|...) == log(foo_cdf(U|...))
== foo_cdf_log(U|...)
foo_cdf_lub_log(L,U|...) == log(foo_cdf(U|...) - foo_cdf(L|...))
== log_sum_exp(foo_cdf_log(U|...),
0, foo_ccdf_log(L|...))
- to add:
+ calculation of cdf
+ two functions: 1) with templated policy, 2) with no policy,
calling templated function with stan::math::default_policy()
+ TODO: test derivatives?
+ dox: add doxygen.
+ add to Stan Modeling Language Reference Manual function list
* matrix special products
-- quadratic_multiply(u,v) = u' * v * u
++ for square matrices,
d/dx det(x' * A * x) = 2 det(x' * A * x) inv(x)'
d/dx det(A) = det(A) inv(A)' = det(A) / A'
* sorting and interpolation
-- rank(v,s): number of components of v less than v[s]
-- ranked(v,s): s-th smallest component of v
-- interp_ln(e,v1,v2) from BUGS (whatever for?)
* add general mixture model
-- modeling language side
simplex(K) lambda;
vector(K) mu;
vector(K) sigma;
y ~ mixture(lambda, // mixing
(mu,sigma), // param vecs
"normal"); // distro
-- C++ impl
std::vector<double> log_ps;
for (k in 1:K)
log_ps.push_back(log lambda[k] + normal_log(y,mu[k],sigma[k]));
return log_sum_exp(log_ps);
* use our arena-based alloc as STL allocator for auto-dif
- already doing this for dot products
- http://www.cplusplus.com/reference/std/memory/allocator/
- what do we need to do for dtor to work?
-- need to take care of dynamic updates
* convert memory for vari to singleton
-- compile time config with flag for thread-safe / not
* make sure std::less<T*> and std::swap<T*>
does right thing for our types (like var and vari), as
these are used in "algorithms".
* figure out way to have vari elements that are NOT on the stack
- useful for numeric_limits<var>
- stack_vari ==> vari ==> chainable
local_vari ==> vari
(stack_vari has operator new; local_vari uses dtor?)
* allow Stan to read data in multiple formats (particulary CSV)
-- and allow multiple files to be specified for single run
* memcpy faster than copy ctor and operator= for var?
* remove dependence of prob functions like normal_log on agrad,
Eigen, etc. [may not be easy or worth the effort]
* unfold logistic and linear regression auto-difs
* allow user-defined extensions
* dump format parser to handle:
- integers with scientific notation
- NaN, Inf, -Inf (written just like that)
- length one arrays read in like scalars
* make sure still possible to use header-only ad
* add scalar subtraction to match scalar addition of vectors
(to distribute over values)
* Stan safe mode to compile BUGS/JAGS-compliant models
- allows more error checking and optimization
* add STAN_NO_DEBUG option to turn off bounds checking
* add checks for matrix sizes in matrix ops
* Speed up compilation
-- More precompilation into object files.
Pros: faster compiles
Cons: it makes the code more difficult to manage
and less flexible by requiring redundancy
and preinstation of templates
-- More targeted imports.
Pros: faster compiles because less code to read
Cons: defeats precompiled headers
* some kind of multi-threaded timer to monitor Stan
* discrete samplers
-- for special block, use main log_prob to evaluate
conditional probs up to proportion
-- Gibbs for small outcome sets
-- Gibbs for non-ordinal, non-count params
++ e.g., LDA topic samples
-- Slice for larger ones
++ adapt?
-- what about mulitnomial with constraints?
++ metropolis?
* more general built-in initializations
- normal or student-t w. params
- or even set uniform boundaries
- e.g., uniform(-1,1), normal(0,1), t(0,2.5)
* cross-chain sharing of adaptation parameters, which should
be similar across chains if all is going well
* set mass matrix for sampling parameters [& percolate to RStan/cmd]
-- need diagonal to be efficient
-- could allow general matrix
* ongoing code cleanup
- const declarations wherever possible
- more care with size_t, etc.
- privatize everything possible
- remove unnecessary dependencies/includes
- convert for loops with size bounds to appropriate iterators
- templatize std::vector/Eigen functions where possible
- reduce code dup in gm/generator
* revise API doc for doxygen
* error handling for special functions
- math and agrad
- matrix sizes
- how to configure std::log(), etc.?
* Watanabe's WAIC: Widely applicable information criterion
* implicit functions with derivatives (iterative solver)
* refactor finite diff derivative calcs to functor
template <class F, typename S>
bool finite_diffs(const vector<S>& x,
const F& f,
vector<S>& df_dx);
* Speed test pass-by-const-ref and pass-by-val for stan::agrad::var
* break out body of error handling to get isnan if/else inlined
* "non-parametric" models
- approximate a la BUGS vol 3
- merge reversible jump and HMC somehow?
* allow references in Stan language to make it easy to
set eigenvalues + eigenvectors (or singular vals + vecs)
in one call
* optimization API
- functor-based concept design with templates
- easily pluggable with log prob in model for MAP
* add adaptive slice sampling
-- will it fit into adaptive_sampler in mcmc?
-- goal to minimize number of function evals
++ step out: linear, with log step-in
++ doubling: exp, log step-in (but reject & more evals)
* adapt low-order approximation to an actual mass matrix
* add DE and/or DREAM extensions as sampler
-- if I (Bob) understand Matt correctly, we can just
concatenate particle samples to calculate ESS
* online R-hat (split R-hat?) monitoring + means, etc.
-- need monitor interface like JAGS?
* elim stan/maths/matrix lib by adding implicit ctor
for Matrix<var,...> based on Matrix<double,...>
* blocking updates on parameters
- only update subset of parameters
- specify blocks in program?
- allow Gibbs on some variables
- split HMC into blocks (aka batches)
- in the limit do conditional with HMC!
* stepwise debugger for Stan
- print adaptation params
- print gradients
- just a mode on command to print more out per sample?
- ideally link runtime errors back to lines of Stan code
Models
======================================================================
* debug BUGS vol 1 kidney model --- sometimes hangs
* figure out what's going on with TS's model that's causing
Eigen to throw an assert failure
* Implementing lasso-like glm and hierarchical glm in stan.
Modeling Language
======================================================================
* allow array modifiers on types as well as variables
-- int[4] x;
-- int x[4];
-- change all the syntax or allow both?
* add ragged array types
-- data { int x[,]; }
for a possibly ragged array x
-- easiest for data
++ buffer values until get postfix dims
-- would also work for transformed data because sets
-- would need a way to declare raggedness for params
in order to read them in from array
-- parameters { int x[ns]; }
might be used if ns is an int array of sizes
-- how to generalize this?
-- int<lower=0> N; // number of subjects
int<lower=1> n_obs[N]; // num observations for subject n in 1:N
cov_matrix[n_obs] c; // cov matrices for subject n in 1:N
* The intent of the notation is that c[n] for n in 1:N
is an (n_obs[n] x n_obs[n]) covariance matrix.
* This notation lets us do a single final layer of
raggedness and could apply anywhere there's a size. So
real y[n_obs];
is a 2D ragged array structure where y[n] is a n_obs[n]-vector
* get line numbers into error messages
* compiler for R's linear model notation
* generalize vector ops to conforming matrices
-- need to work out whether we can do this and
still preserve type inference; e.g., x * x'
may still be a matrix even if x is a row vector.
* integer subtypes
- non_negative_int = int[1,]
- natural = int[0,]
- boolean = int[0,1]
- categorical(K) = int[1,K]
* real subtypes
- probability = real[0,1]
- correlation = real[-1,1]
- non_negative_real = real[0,inf]
- non_positive_real = real[-inf,0]
* matrix slice and assign
- index m
- range-indexes: implicit ellision == :, a:, :b, a:b
- e.g., matrix[M,N] m; vector(M) v; m[,1] = v;
* conditional expressions
- binary operators on int or double
+ bool op(int,int);
+ bool op(double,double); // allows promotion
+ usual comparisons: ==, !=, <=, <, >=, >
- operators on conditions
(!cond), (cond && cond), (cond || cond), (cond)
- ternary op: cond ? x : y
* conditional statements
- if <cond> statement else if <cond> statement ... else statement;
- while <cond> statement;
- repeat <statement> until <cond>;
- for (<statement>; <cond>; <statement>) statement;
- Marcus's hack:
if (a > b) {...}
==
for (i in 1:int_step(a-b)) {...}
* for-each statement
- foreach (<var> : <list-expression>) <statement>;
* multi-returns and array literals
- (a,b,c) = foo(); // function returns array
- (m,n,p) = (1,2,3); // (1,2,3) is array literal
* matrix and array notation:
- The syntax is pretty simple modulo size conformance
checking. Here's the BNF:
matrix_expression ::= '[' matrix_row (';' matrix_row)* ']'
matrix_row ::= expression*
The idea is that the expressions are laid out in a row
and a ';' indicates to start a new row.
The basic cases are as follows:
[1 2 3] : row_vector[3]
[1 2 3; 4 5 6] : matrix[2,3]
Use transpose to create vectors:
[1 2 3]' : vector[3]
And allow more complex things in a matrix_row in the BNF,
such as a sequence of vectors:
[[1 2 3]' [4 5 6]'] : matrix[3,2]
In general, we could allow matrices, too:
[[1 2; 3 4] [5 6; 7 8]] : matrix[4,2]
This would keep nesting in the obvious ways.
- We need some way to call out arrays and allow higher
dimensions, so we can't overload the same syntax.
I like the following:
array ::= 'a' '(' expression* ')'
Examples are:
a(1,2,3) : int[3]
a(1.0,2,3) : real[3]
a(a(1,2,3), a(4,5,6)) : int[2,3]
a([1 2 3], [4 5 6]) : row_vector[3][2]
But it brings up type inference issues, because
a(1,2,3) is int[3], not real[3]. We could
add primitive 1D-array syntax such as:
| 'ints' '(' int_expression* ')'
| 'reals' '(' real_expression* ')'
* multiple declarations
- e.g., double a, b, c[N];
* declare and assign
- e.g., double x <- 5.0;
double b[5];
double a[5] <- b; // odd, eh?
simplex(5)[2] c <- ... // wouldn't be obvious to support
* tuple types for multiple returns (U,S,V) <- svd(A)
-- different than array literals as may have different typed
components
* introduce transforms into the Stan language that have the same
effect as the declarations, e.g.
parameters { real x; }
transformed_parameters { real(0,1) y; y <- lub(y,0,1,lp__); }
* extensions
- declare file paths to #include
- declare namespace usings
- declare function sigs for parser
++ if happens up front, in time for body parsing
* ragged arrays
- ready to use with vec<...<vec<x>...> pattern
- not clear how to declare (follow C?)
-- example on p. 8 of
http://www.jstatsoft.org/v36/c01/paper/
involving ragged param arays for graded response model
* subroutines
- user-defined functions in StanGM
- user-defined densities (how much to require?)
- interpreted conditionally based on block?
* compound op-assigns in StanGM
+<-, *<-, etc. (ugly, ugly, ugly syntax; cf., +=, *=, etc.)
* warnings
- a <- foo(theta) if LHS contains variable: overwrite var
- a ~ foo(theta) if LHS is complex expression: need Jacobian
- check each parameter shows up on LHS of ~ at least once
* power operations
- matrix power: A^n
- matrix elementwise power: A.^x
- regular old real power: x^y
* allow normal_log(y|0,1) as synonym for normal_log(y,0,1)
* Allow normal_log<true> and normal_log<false> to be accessed
directly
* add a parameter initialization block
- fill in values not given randomly or report error?
- needs randomization
* new types
-- lower triangular matrix
++ strict lower triangular matrix
++ covariance matrix cholesky factor matrix
-- diagonal matrix
-- symmetric matrix
[need to treat like others and just test at end]
-- point on hypersphere
++ like simplex, but 2-norm is 1, rather than 1-norm
++ trickier for generating last point as the sign
remains underdetermined
* positive-definiteness-preserving operations
-- rescale strictly positive
* unit matrix function unit_matrix(K)
-- return var or just double?
-- users shouldn't use this to assign to transformed param
* constants for integer max and min values
* random generation
-- generated quantities block
++ use sampling notation y ~ normal(0,1);
to set y to normal_random(0,1) draw
-- transformed data block
++ could also use sampling notation
++ issue of whether to share across chains or not
-- transformed parameter block
++ possible, but can't use sampling notation
-- need to push RNG through to whatever calls necessary
-- if we add to models, etc., should derivatives just
be the same as basic so that
y[n] <- random_normal(mu,sigma);
gets same grad as:
y[n] ~ normal(mu,sigma);
* truncations need to be vectorized, too!
Build
======================================================================
* remove writes into Stan directory itself
* set up make to work from directory other than STAN_HOME
- pass in arg? environment var?
Testing
======================================================================
* agrad distribution tests misuse the autodif stack and may result in
unexpected behavior now. Need to fix.
* build unit tests from the BUGS models that just check
the gradients are right (this'll also cover parsing and
being able to fire up the models)
* auto testing of sampler
- Cook/Gelman/Rubin-style interval tests
- needs effective sample size calc to bound intervals
- needs simulated parameters to test estimators
- push ESS "measurement error" through the interval tests
* more matrix tests (for all ops w. gradients)
* benchmarks
- agrad vs. RAD/Sacado, CppAD, ... ?
- stan vs. MCMCpack, R, BUGS, JAGS, OpenBUGS, ...
* profile running models w. various operations
* complete math robustness and limits review
- want domain boundaries to be right (usually
-inf, 0 and inf, etc.)
* more diagnostics during warmup of where epsilon is going
* speed measurement
- to converge
- time per effective sample once converged
- time to 4 chains at 25 ESS/chain, rHat < 1.1 for all
parameters theta[k] and theta[k]*theta[k]
- discarding first half of samples too many if fast to converge
* Speed vs. JAGS
-- run JAGS compiled with g++ -O3
-- Look at models where Stan performs well
e.g. logistic or linear regression w. Cauchy and Normal prior
- varying model shapes (step up exponentially)
+ number of data items
+ number of parameters
+ correlation of parameters (multi-variate Cauchy or Normal)
+ interaction of params (? need this with correlation ?)
+ conjugate vs. non-conjugate priors
(well and misspecified -- more stats than speed issue)
-- Single threaded
-- Run 4 chains (each of same length) of each system (Stan and JAGS)
- diffuse initializations (hand generate for each)
- until convergence by split R-hat (all params under 1.05)
- total (across all chains) ESS is > 100 (and < 150 or something?)
(e.g., 25/25/25/25, or 15/35/20/30)
- report time in effective samples/second for min ESS parameter
-- Report multiple runs of each (10? 100?)
- histograms/densities or mean/sd of multiple runs
- how much will it vary by seed
-- Throw away first half of each run?
- to start, until split convergence/sample speeds