Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters #1986

CarloLucibello · 2022-06-02T10:49:00Z

So all the things are in place and we can get rid of the current pattern using implicit params:

using Flux
ps = Flux.params(model)
opt = Flux.Optimise.ADAM()
gs = gradient(() -> loss(model(x), y), ps)
Flux.Optimise.update!(opt, ps, grads)

to the one using explicit parameters and Optimisers.jl

using Flux, Optimisers
opt_state = Optimisers.setup(Optimisers.Adam(), model)
∇model = gradient(m -> loss(m(x), y), model)[1]
opt_state, model = Optimisers.update!(opt_state, model, ∇model)
## or the non-mutating
# state, model = Optimisers.update(opt_state, model, ∇model)

Code

Upgrade or remove train!(loss, ::Params, data, ::AbstractOptimiser):
A replacement for iterating over Flux.params(m)... or just fix it, broken in Make params non-differentiable (Closes #2040 & #2048) #2054:

julia> gradient(m -> (sum(norm, Flux.params(m))), (x=[1,2.0], y=[3.0]))
(nothing,)

Documentation

Rewrite https://fluxml.ai/Flux.jl/stable/training/optimisers/ -- done in Re-write training docs #2114
Better documentation for Optimisers.jl
- working with custom models doc improvement: working with custom model types Optimisers.jl#84

Examples

Port model zoo examples -- tag "update"
Help porting downstream libraries and check there are no surprises
- GraphNeuralNetworks.jl

@mcabbott @ToucheSir @darsnack feel free to add to this

The text was updated successfully, but these errors were encountered:

ToucheSir · 2022-06-03T02:32:13Z

Thoughts on copying or moving parts of https://github.com/FluxML/Flux.jl/blob/v0.13.3/docs/src/training/optimisers.md#scheduling-optimisers over to the Optimisers.jl docs?

darsnack · 2022-06-03T03:50:59Z

Xref https://github.com/darsnack/ParameterSchedulers.jl/issues/34

DrChainsaw · 2022-06-11T17:01:23Z

I think that this issue might be a blocker for explicit parameters.

For example:

julia> gradient(bn -> sum(bn(randn(Float32, 3,1))), BatchNorm(3))
((λ = nothing, β = Fill(1.0f0, 3), γ = Float32[0.0, 0.0, 0.0], μ = nothing, σ² = nothing, ϵ = -0.0f0, momentum = nothing, affine = nothing, track_stats = nothing, active = nothing, chs = nothing),)

julia> gradient(bn -> sum(bn(randn(Float32, 3,1))), Chain(BatchNorm(3)))
(nothing,)

Not sure if related, but it seems that for some models the call hangs idefinitely (I killed the process after 8 hours) if gradients are missing from one of the layers due to the above. The only 100% reproducer I have for this so far is one generated by https://github.com/DrChainsaw/NaiveGAflux.jl for a particular random seed, but if I manage to narrow it down I'll post an issue. If anyone has any tips on could cause the stall it would be very helpful.

ToucheSir · 2022-06-12T18:55:41Z

I wonder if this is encountering a similar issue as TuringLang/Turing.jl#1754. Have you tried -O1?

For troubleshooting, a smaller example that doesn't hang but still takes some time to compile (i.e. over a minute) would be good. Bonus if you can provide an inference/total time breakdown from SnoopCompile.@snoopi_deep.

DrChainsaw · 2022-06-13T22:49:23Z

Thanks @ToucheSir. Sorry if this is derailing. I think the root issue with missing gradients is quite important regardless as it impacts vanilla Flux.

Putting the discussion on the hangs in details in an attempt to reduce clutter

From what I can tell there does not seem to be any poor scaling at work. When printing out the forwards and backwards pass for the call gradient(m -> sum(m(x)), model), it seems like the whole backwards pass through the model is completed before it hangs. Using -O1 does not seem to alleviate anything (although I haven't waited for 8 hours this time).

One hypothesis is that it is the evil loopy structure of the CompGraph where vertices has references to both inputs and outputs which somehow causes Zygote to get stuck in some infinite loop looking for the missing gradients. If I strip the graph of the outputs (they are not involved in the forward/backward pass) the gradient call terminates correctly after some 100 or so seconds (although vast majority of time is spent after the backwards pass through the model is completed). Maybe its just a coincidence from making the model a little bit simpler when stipping the outputs though.

ToucheSir · 2022-06-14T00:33:25Z

Thanks. This looks like it deserves its own issue, so I've opened FluxML/Zygote.jl#1243.

mcabbott · 2022-07-27T01:52:05Z

While this isn't a complete solution, one way around #1986 (comment) is to simply make the normalisation layers into immutable structs. The only mutation they need is a marker of whether they are active, and this can be a Ref(true) or something.

Is this worth doing?

DhairyaLGandhi · 2022-07-27T09:45:39Z

Separating out the configuration as in #1509 is also a solution and gives type stability and inferrability as well

DrChainsaw · 2022-07-28T13:18:33Z

Another possibility is to refactor into functions which don't use the mutable struct and then add trivial rrules, something like:

(BN::BatchNorm)(x) = batchnorm(BN..., x) # splatting mostly because I'm too lazy to type out all the members

function ChainRulesCore.rrule(config::RuleConfig{>:HasReverseMode}, BN::BatchNorm, x)
    res, back = rrule_via_ad(config, batchnorm, BN..., x)
    function BatchNorm_back(Δ)
        δs = back(Δ)
        Tangent{BatchNorm}(δs[1:end-1]...), δs[end]
    end
    return res, BatchNorm_back
end

Not sure if this has any advantages over the above proposals, especially considering that the refactor looks like it could be a bit messy. I suppose one far fetched argument is that users might consider it a breaking change to make exported structs immutable.

ToucheSir · 2022-07-30T01:13:15Z

The main sources of type stability on norm layer types are getproperty/setproperty! on mutable structs and control flow in the forward pass. I believe the former includes Refs, so just making the outer type immutable is probably insufficient. What would work is writing rrules that hide said field lookups. #2005 is more or less this for norm layers which don't track stats (note that kwarg default vaules can introduce control flow—Zygote is real finicky). https://github.com/FluxML/Flux.jl/blob/v0.13.4/src/layers/normalise.jl#L245-L255 is the big remaining culprit, but can be worked around in a similar way. https://github.com/FluxML/Flux.jl/blob/v0.13.4/src/layers/normalise.jl#L258 I suspect can be resolved without any rrules.

ToucheSir · 2022-07-30T02:50:00Z

@DrChainsaw saw JuliaLang/julia#44185 today while catching up on JuliaCon talks. Are you willing to try it out on your side and seeing what the profiler spits out?

DrChainsaw · 2022-07-30T10:55:58Z

@ToucheSir wow, that looks like a great QOL improvement in general. I only have access to a windows environment atm, but I will give it a shot when that changes. I'm a bit worried that it seems to require yielding. Ctrl-c does nothing so I fear that it is stuck in an infinite loop which does not yield anywhere.

CarloLucibello · 2023-07-09T16:37:52Z

@mcabbott Can this be closed as done?

mcabbott · 2023-07-10T02:24:33Z

I think the remaining issue is RNNs not working right with the new explicit style. Have not followed closely but e.g. #2258

darsnack · 2023-07-10T02:48:14Z

And regularization e.g. FluxML/Optimisers.jl#57

Qfl3x · 2024-08-21T08:02:27Z

A question: Is using the "old" implicit going to introduce bugs/unexpected behavior during training?

ToucheSir · 2024-08-21T17:04:17Z

Legacy code using that approach will continue to work, but new code should not be using implicit params. Any bugs found with the old approach should be reported as usual, but note that they are effectively deprecated and will be removed in a future major release.

This was referenced Jun 6, 2022

[Discussion]: doctests, docstrings, documentation manual, and unclear internal API (for newcomers) #1990

Closed

Flux.params for primitive types #1991

Open

ToucheSir mentioned this issue Jun 14, 2022

Zygote hangs when taking explicit gradients of NaiveGAFlux model FluxML/Zygote.jl#1243

Closed

mcabbott added this to Explicit Parameter Transition Jul 26, 2022

mcabbott mentioned this issue Oct 13, 2022

Add explicit train!, unify update!, and auto-translate the two Adams #2082

Merged

3 tasks

mcabbott added optimisers-dot-jl deprecation discussion labels Oct 20, 2022

mcabbott mentioned this issue Nov 27, 2022

Re-write training docs #2114

Merged

mcabbott added this to the v0.14 milestone Jun 23, 2023

CarloLucibello modified the milestones: v0.14, v0.15 Jul 10, 2023

mcabbott mentioned this issue Apr 29, 2023

Add total(f, model) to replace implicit sum(f, Flux.params(model)) FluxML/Optimisers.jl#57

Closed

mcabbott mentioned this issue Aug 19, 2023

Implement Lion, up to 5x faster than Adam, and more accurate FluxML/Optimisers.jl#156

Closed

CarloLucibello mentioned this issue Mar 26, 2024

removing old deprecations and Optimise module #2412

Closed

jeremiedb mentioned this issue Apr 11, 2024

set compat to julia >= 1.9 Evovest/NeuroTreeModels.jl#7

Merged

CarloLucibello mentioned this issue Nov 5, 2024

Export Optimisers and remove params and Optimise from tests #2495

Merged

CarloLucibello closed this as completed in #2495 Nov 5, 2024

github-project-automation bot moved this to Done in Explicit Parameter Transition Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters #1986

Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters #1986

CarloLucibello commented Jun 2, 2022 •

edited by mcabbott

Loading

ToucheSir commented Jun 3, 2022

darsnack commented Jun 3, 2022

DrChainsaw commented Jun 11, 2022

ToucheSir commented Jun 12, 2022

DrChainsaw commented Jun 13, 2022

ToucheSir commented Jun 14, 2022

mcabbott commented Jul 27, 2022

DhairyaLGandhi commented Jul 27, 2022

DrChainsaw commented Jul 28, 2022

ToucheSir commented Jul 30, 2022 •

edited

Loading

ToucheSir commented Jul 30, 2022

DrChainsaw commented Jul 30, 2022

CarloLucibello commented Jul 9, 2023

mcabbott commented Jul 10, 2023

darsnack commented Jul 10, 2023

Qfl3x commented Aug 21, 2024

ToucheSir commented Aug 21, 2024

Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters #1986

Deprecate Flux.Optimisers and implicit parameters in favour of Optimisers.jl and explicit parameters #1986

Comments

CarloLucibello commented Jun 2, 2022 • edited by mcabbott Loading

Code

Documentation

Examples

ToucheSir commented Jun 3, 2022

darsnack commented Jun 3, 2022

DrChainsaw commented Jun 11, 2022

ToucheSir commented Jun 12, 2022

DrChainsaw commented Jun 13, 2022

ToucheSir commented Jun 14, 2022

mcabbott commented Jul 27, 2022

DhairyaLGandhi commented Jul 27, 2022

DrChainsaw commented Jul 28, 2022

ToucheSir commented Jul 30, 2022 • edited Loading

ToucheSir commented Jul 30, 2022

DrChainsaw commented Jul 30, 2022

CarloLucibello commented Jul 9, 2023

mcabbott commented Jul 10, 2023

darsnack commented Jul 10, 2023

Qfl3x commented Aug 21, 2024

ToucheSir commented Aug 21, 2024

CarloLucibello commented Jun 2, 2022 •

edited by mcabbott

Loading

ToucheSir commented Jul 30, 2022 •

edited

Loading