
Enable optimisations with Chain #2004

Open
theabhirath opened this issue Jun 22, 2022 · 8 comments

Comments

@theabhirath (Member) commented Jun 22, 2022

In long Chains in Metalhead, there are often layers that reduce to the identity - Dropout(p = 0) is a frequent occurrence, along with other similar regularisation layers (DropBlock, DropPath). According to Lux's documentation, there is an option to enable and disable optimisations that remove these layers and make the model a little cleaner to read through. Is there any chance something similar could be implemented for Flux?
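For concreteness, a small sketch (hypothetical layer sizes, not taken from Metalhead itself) of the kind of Chain in question, where the Dropout reduces to the identity but still sits in the model:

```julia
using Flux

# Hypothetical Metalhead-style block: with p = 0 the Dropout does
# nothing, but it still appears in the Chain's printout and is still
# traversed by Zygote on the backward pass.
block = Chain(
    Conv((3, 3), 3 => 16, relu; pad = 1),
    Dropout(0.0),   # reduces to identity, yet is not removed
    MaxPool((2, 2)),
)
```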

@theabhirath (Member Author)

Speaking of Lux's documentation, it looks absolutely beautiful - any chance Flux would consider using a different theme?

@darsnack (Member)

We could transition to Pollen

@darsnack (Member) commented Jun 22, 2022

As far as optimization goes, I don't think the Lux optimizations do what you're proposing. Instead, they recursively go through the Chain to wrap functions that don't adhere to the Lux interface and to delete NoOpLayers. Neither of those is an issue for Flux models.

EDIT: Okay, I see in Lux that Dropout constructors return a NoOpLayer which gets pruned by the optimization. We could do a similar thing for identity as I mentioned below. Again, what's the benefit beyond visually making the model simpler?

That being said, it wouldn't be too difficult to build the kind of optimization you're talking about using fmap (though we wouldn't want Lux's keyword interface, which is limited to Chain). The main question is what kind of benefit it provides. If most of these no-ops can be optimized away by the compiler itself, then this kind of optimization pass isn't super useful. Though I wouldn't be surprised if it gave a benefit on the backwards pass due to Zygote. Maybe a comparison of a manually optimized vs. un-optimized model from Metalhead would be good to have first.
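A minimal sketch of such a pass, assuming plain recursion over Chain is acceptable in place of fmap (isnoop and prune are hypothetical names, and the set of no-op layers shown is only illustrative):

```julia
using Flux

# Layers that reduce to the identity (illustrative, not exhaustive).
isnoop(l) = l === identity || (l isa Dropout && iszero(l.p))

# Recursively rebuild every Chain, dropping no-op layers along the way.
prune(l) = l
prune(c::Chain) = Chain(filter(!isnoop, map(prune, Tuple(c.layers)))...)
```

An fmap-based version would hang the same rebuild off Functors.jl's `exclude` keyword; the recursive form above is just the shortest way to show the idea.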

@theabhirath (Member Author) commented Jun 22, 2022

> Again, what's the benefit beyond visually making the model simpler?

I'll try and benchmark to see if there's a difference. But among other things, it makes porting weights from other libraries easier, while the model can still offer a little more functionality than the pre-trained weights need if the user wants it.

@avik-pal (Member)

Just to answer a few points raised here:

  1. The optimization pass is necessary for Lux since it requires layers to follow a particular interface. For Flux, it makes little sense since it doesn't require a strict interface.
  2. No-ops would ideally be optimized away, but Zygote being Zygote keeps them around. It becomes worse for Dropout, where it keeps branching around 😞. Though in most real-world use cases it makes very little difference; it only shows up if your model is reasonably small.
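The small-model effect described above could be checked with something like the following sketch (hypothetical model, BenchmarkTools.jl assumed; the no-op here is a Dropout(0.0) in an otherwise identical Chain):

```julia
using Flux, Zygote, BenchmarkTools

x = randn(Float32, 784, 32)
with_noops    = Chain(Dense(784 => 128, relu), Dropout(0.0), Dense(128 => 10))
without_noops = Chain(Dense(784 => 128, relu), Dense(128 => 10))

# Any difference should show up mainly in the backward pass, and
# mainly for small models like this one.
@btime gradient(m -> sum(m($x)), $with_noops);
@btime gradient(m -> sum(m($x)), $without_noops);
```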

@ToucheSir (Member)

> 2. But Zygote being zygote keeps them around. Becomes worse for Dropout where it keeps branching around

On this point, see #2005. I still think making active a type parameter as you did is cleaner, but we don't have that luxury here.

@theabhirath (Member Author) commented Jun 23, 2022

Another optimisation that caught my eye was the flattening of nested Chains. Would that maybe help with TTFG and the backward-pass times? (Not sure if this is already done internally somehow, though.)
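A sketch of what such a flattening pass might look like (flatten_layers and flatten_chain are hypothetical names; note that a Chain built from a NamedTuple would lose its layer names in this form):

```julia
using Flux

# Splice the layers of nested Chains into one flat tuple of layers.
flatten_layers(l) = (l,)
flatten_layers(c::Chain) =
    Tuple(Iterators.flatten(map(flatten_layers, Tuple(c.layers))))

flatten_chain(c::Chain) = Chain(flatten_layers(c)...)
```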

@DhairyaLGandhi (Member) commented Jun 23, 2022 via email
