
Enable optimisations with Chain #2004

Open
theabhirath opened this issue Jun 22, 2022 · 8 comments

Comments

@theabhirath (Member) commented Jun 22, 2022

In long Chains in Metalhead, there are often layers that reduce to the identity - Dropout(p = 0) is a frequent occurrence, along with other similar regularisation layers (DropBlock, DropPath). According to Lux's documentation, there is an option to enable and disable optimisations that remove these layers and make the model a little cleaner to read through. Is there any chance something similar could be implemented for Flux?
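For concreteness, a small sketch (hypothetical layer sizes, not taken from Metalhead itself) of the kind of Chain in question, where the Dropout reduces to the identity but still sits in the model:

```julia
using Flux

# Hypothetical Metalhead-style block: with p = 0 the Dropout does
# nothing, but it still appears in the Chain's printout and is still
# traversed by Zygote on the backward pass.
block = Chain(
    Conv((3, 3), 3 => 16, relu; pad = 1),
    Dropout(0.0),   # reduces to identity, yet is not removed
    MaxPool((2, 2)),
)
```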

@theabhirath (Member Author)

Speaking of Lux's documentation, it looks absolutely beautiful - any chance Flux would consider using a different theme?

@darsnack (Member)

We could transition to Pollen

@darsnack (Member) commented Jun 22, 2022

As far as optimization goes, I don't think the Lux optimizations do what you're proposing. Instead, they recursively go through the Chain to wrap functions that don't adhere to the Lux interface and to delete NoOpLayers. Neither of those is an issue for Flux models.

EDIT: Okay, I see in Lux that Dropout constructors return a NoOpLayer which gets pruned by the optimization. We could do a similar thing for identity as I mentioned below. Again, what's the benefit beyond visually making the model simpler?

That being said, it wouldn't be too difficult to build the kind of optimization you're talking about using fmap (though we wouldn't want Lux's keyword interface, which is limited to Chain). The main question is what kind of benefit it provides. If most of these no-ops can be optimized away by the compiler itself, then this kind of optimization pass isn't super useful. Though I wouldn't be surprised if it gave a benefit on the backwards pass due to Zygote. Maybe a comparison of a manually optimized vs. un-optimized model from Metalhead would be good to have first.
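A minimal sketch of such a pass, assuming plain recursion over Chain is acceptable in place of fmap (isnoop and prune are hypothetical names, and the set of no-op layers shown is only illustrative):

```julia
using Flux

# Layers that reduce to the identity (illustrative, not exhaustive).
isnoop(l) = l === identity || (l isa Dropout && iszero(l.p))

# Recursively rebuild every Chain, dropping no-op layers along the way.
prune(l) = l
prune(c::Chain) = Chain(filter(!isnoop, map(prune, Tuple(c.layers)))...)
```

An fmap-based version would hang the same rebuild off Functors.jl's `exclude` keyword; the recursive form above is just the shortest way to show the idea.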

@theabhirath (Member Author) commented Jun 22, 2022

> Again, what's the benefit beyond visually making the model simpler?

I'll try and benchmark to see if there's a difference. But among other things, it makes porting weights from other libraries easier, while the model can still offer a little more functionality than the pre-trained weights need if the user wants it.

@avik-pal (Member)

Just to answer a few points raised here:

  1. The optimization pass is necessary for Lux since it requires layers to follow a particular interface. For Flux, it makes little sense since it doesn't require a strict interface.
  2. No-ops would ideally be optimized away, but Zygote being Zygote keeps them around. It becomes worse for Dropout, where it keeps branching around 😞. Though in most real-world use cases it makes very little difference; it only shows up if your model is reasonably small.
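The small-model effect described above could be checked with something like the following sketch (hypothetical model, BenchmarkTools.jl assumed; the no-op here is a Dropout(0.0) in an otherwise identical Chain):

```julia
using Flux, Zygote, BenchmarkTools

x = randn(Float32, 784, 32)
with_noops    = Chain(Dense(784 => 128, relu), Dropout(0.0), Dense(128 => 10))
without_noops = Chain(Dense(784 => 128, relu), Dense(128 => 10))

# Any difference should show up mainly in the backward pass, and
# mainly for small models like this one.
@btime gradient(m -> sum(m($x)), $with_noops);
@btime gradient(m -> sum(m($x)), $without_noops);
```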

@ToucheSir (Member)

> 2. But Zygote being zygote keeps them around. Becomes worse for Dropout where it keeps branching around

On this point, see #2005. I still think making active a type parameter as you did is cleaner, but we don't have that luxury here.

@theabhirath (Member Author) commented Jun 23, 2022

Another optimisation that caught my eye was the flattening of nested Chains. Would that maybe help with TTFG and the backward-pass times? (Not sure if this is already done internally somehow, though.)
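A sketch of what such a flattening pass might look like (flatten_layers and flatten_chain are hypothetical names; note that a Chain built from a NamedTuple would lose its layer names in this form):

```julia
using Flux

# Splice the layers of nested Chains into one flat tuple of layers.
flatten_layers(l) = (l,)
flatten_layers(c::Chain) =
    Tuple(Iterators.flatten(map(flatten_layers, Tuple(c.layers))))

flatten_chain(c::Chain) = Chain(flatten_layers(c)...)
```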

@DhairyaLGandhi (Member) commented Jun 23, 2022 via email
