Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify test machinery #2498

Merged
merged 12 commits into from
Oct 13, 2024
Merged

simplify test machinery #2498

merged 12 commits into from
Oct 13, 2024

Conversation

CarloLucibello
Copy link
Member

@CarloLucibello CarloLucibello commented Oct 12, 2024

This PR introduces a single function test_gradients that can be used across all possible differentiation scenarios (on cpu and any gpu backend, on layer or simple functions...).

@CarloLucibello CarloLucibello marked this pull request as draft October 12, 2024 15:47
@CarloLucibello
Copy link
Member Author

@pxl-th I'm seeing something odd with the convolutions on AMDGPU
Could you test locally the following example? Looks like both assertions fail:

using Flux, AMDGPU, Zygote

m = Conv((2, 2), 3 => 4)
x = rand(Float32, 5, 5, 3, 2)

y, g = Zygote.withgradient(x -> sum(m(x)), x)

dev = gpu_device(force=true)    
m_gpu = m |> dev
x_gpu = x |> dev
y_gpu, g_gpu = Zygote.withgradient(x -> sum(m_gpu(x)), x_gpu)

@assert y_gpu  y  atol=1e-4
@assert Array(g_gpu)  g  atol=1e-4

@CarloLucibello CarloLucibello marked this pull request as ready for review October 13, 2024 12:10
@pxl-th
Copy link
Member

pxl-th commented Oct 13, 2024

@CarloLucibello something broke cpu-gpu transfer for convolutions.
MIOpen only supports cross-correlation, so we need to transpose filters when moving to gpu.

This is not executed anymore:

function Adapt.adapt_structure(to::FluxAMDGPUAdaptor, m::CPU_CONV)

@CarloLucibello
Copy link
Member Author

Ah right, that is when using MLDataDevices.gpu_device. Flux.gpu should work fine instead.
I'll try to fix it.

@CarloLucibello
Copy link
Member Author

I investigated a bit and hooking up into MLDataDevices's data transfer mechanism is not straightforward. So I propose we merge this PR as it is and proceed with a fix later. The problem is not introduced by this PR in any case, it was pre-existing and now just exposed.

Can I get an approve?

Copy link
Member

@pxl-th pxl-th left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can take a look at it as well tomorrow, we can fix this is a separate PR.

@CarloLucibello CarloLucibello merged commit 35b893a into master Oct 13, 2024
6 of 9 checks passed
@CarloLucibello
Copy link
Member Author

I think the solution involves modifying this line
https://github.com/LuxDL/MLDataDevices.jl/blob/71ed455bb2a898a128b32aec0a67ba44fe8321d7/src/public.jl#L340
to use some custom and publicly exposed isleaf that we can hook into from Flux. Cc @avik-pal

@CarloLucibello CarloLucibello deleted the cl/checkgrad branch October 13, 2024 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants