[docs] Highlight `update!` API more to attract DL researchers #2104
Comments
My current take on the README example is: "Here is a bunch of complex things we can do with very little code," but this is:
I think the quickstart example should be:
I think for these reasons, it would be really nice if the example was simple and used the |
The example was added in #2067. I'm personally in favour of removing |
I see, thanks! I would also change this example: https://fluxml.ai/Flux.jl/stable/models/quickstart/ to include Edit: actually, maybe it's okay to use a dataloader in the quickstart example, so long as the looping is explicit. |
(I think I confused the quickstart and readme pages from when I first checked out this package… I do remember seeing a |
FWIW, most of the current maintainers do not like |
Worth adding here that we have this ML call every other week and it's open to anyone, so if you're interested in talking about docs work or anything else feel free to drop in :)
Awesome, thanks for sharing this update! I think that is an awesome initiative and would be well-appreciated by the community!
Welcome, and glad you persisted! I made these examples recently. The goals I suppose were:
Both can surely be better. Want to have a go tweaking the quickstart example to avoid I would vote to keep it with implicit I would also vote for it to generate data outside the loop, as this is a bit more realistic. Demonstrating that DataLoader is something which takes & gives matrices also seemed like a good idea. I think it's important that it not just push random numbers through, but solve some problem, however simple. (When I run the loop above, the loss doesn't decline, and there's nothing I can plot afterwards.) |
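To make the point about DataLoader concrete, here is a minimal sketch of what iterating over one looks like. The data is random and the sizes (100 samples, batch size 64) are made up purely for illustration; the only thing it is meant to show is that a `Flux.DataLoader` built from matrices hands back matrices, batched along the last axis.

```julia
using Flux

# Toy data, invented for illustration: features × samples layout.
X = rand(Float32, 2, 100)
Y = rand(Float32, 1, 100)

# Wrap a tuple of arrays; batches are sliced along the last dimension.
loader = Flux.DataLoader((X, Y), batchsize=64)

for (x, y) in loader
    # Each batch is a pair of ordinary matrices; the final batch may be smaller.
    @show size(x) size(y)
end
```

Because each `(x, y)` is just a pair of arrays, the body of the loop can contain any explicit training step one likes.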
Here's a tweaked README example. As a longtime PyTorch and JAX user, I find the following syntax very intuitive; I feel I could understand it even while being new to Julia. It is not intimidating, and it would make it easier for me to start tweaking individual steps and adapting it to my own use case:
using Flux
# We wish to learn this function:
f(x) = cos(x[1] * 5) - 0.2 * x[2]
# Generate dataset:
n = 10000
X = rand(2, n) # In Julia, the batch axis is last!
Y = [f(X[:, i]) for i=1:n]
Y = reshape(Y, 1, n)
# Move to GPU
X = gpu(X)
Y = gpu(Y)
# Create dataloader
loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)
# Create a simple fully-connected network (multi-layer perceptron):
n_in = 2
n_out = 1
model = Chain(
    Dense(n_in, 32), relu,
    Dense(32, 32), relu,
    Dense(32, 32), relu,
    Dense(32, n_out)
)
model = gpu(model)
# Create our optimizer:
optim = Adam(1e-3)
p = Flux.params(model)
# Let's train for 10 epochs:
for i in 1:10
    losses = []
    for (x, y) in loader
        # Compute gradient of the following code
        # with respect to parameters:
        loss, grad = Flux.withgradient(p) do
            # Forward pass:
            y_pred = model(x)
            # Square error loss
            sum((y_pred .- y) .^ 2)
        end
        # Step with this gradient:
        Flux.update!(optim, p, grad)
        # Logging:
        push!(losses, loss)
    end
    println(sum(losses)/length(losses))
end
using Plots
# Generate test dataset:
Xtest = rand(2, 100)
Ytest = mapslices(f, Xtest; dims=1) # Alternative syntax to apply the function `f`
# View the predictions (the model may be on the GPU, so move the inputs there and the outputs back):
Ypredicted = cpu(model(gpu(Xtest)))
scatter(Ytest[1, :], Ypredicted[1, :], xlabel="true", ylabel="predicted")
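For comparison, here is a hedged sketch of the same kind of loop written in the explicit-gradient style, where the gradient is taken with respect to the model itself rather than a `Flux.params` collection. It is not part of the example above. It assumes a Flux version that provides `Flux.setup` (0.13.9 or later, to the best of my knowledge), runs on the CPU, and uses a smaller network and dataset chosen only to keep it short.

```julia
using Flux

# Same toy target as above, written with a Float32 constant so the data stays Float32.
f(x) = cos(x[1] * 5) - 0.2f0 * x[2]
X = rand(Float32, 2, 1000)
Y = reshape([f(X[:, i]) for i in 1:size(X, 2)], 1, :)
loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)

# A smaller network than above, purely for brevity.
model = Chain(Dense(2 => 32, relu), Dense(32 => 1))

# Optimiser state is built from the model, not from a params collection.
opt_state = Flux.setup(Adam(1e-3), model)

epoch_losses = Float32[]
for epoch in 1:10
    batch_losses = Float32[]
    for (x, y) in loader
        # Differentiate the loss with respect to the model `m`.
        loss, grads = Flux.withgradient(model) do m
            sum((m(x) .- y) .^ 2)
        end
        # `grads` has one entry per differentiated argument; the model is the first.
        Flux.update!(opt_state, model, grads[1])
        push!(batch_losses, loss)
    end
    push!(epoch_losses, sum(batch_losses) / length(batch_losses))
end
println(epoch_losses)
```

The structure mirrors a PyTorch-style loop: forward pass and loss inside the `do` block, then a single `update!` call per batch.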
PR in #2108
I think this loop is exactly what we want in the quickstart:
But I do not think the readme example should be as long as the quickstart one. We already have a problem with there being too many entry points, and I would like anyone reading 30 lines to already be on a page of the docs (not the website tutorials, and not the readme). More later.
I think generally it is good to keep the quickstart like a mini-tutorial while still being general enough that users can think about how to modify it for their use cases. So, in retrospect, I changed my mind and now agree with you that the dataloader is good to include! I think many ML practitioners have very short attention spans - people will literally copy the quickstart example, try to hack it for their use case using only trial and error, never once read the docs, and quit if they can't figure it out. But once you "hook" them, and they can get something working for their use case, they will be much more likely to search around the docs pages to do something specific.
I think the first code example a user sees is the one they will assume to be the quickstart. So perhaps if the goal is to move them to the docs pages quickly, then I would just remove the code example from the README altogether. (When I was trying Flux.jl yesterday, the README example acted as my quickstart tutorial - I didn't even look at the quickstart page at first).
I think the `update!` API should be presented up-front in addition to, or instead of, the `Flux.train!` API. This will help significantly with attracting deep learning researchers, who I see as the bridge to wider adoption.
Motivation. I first encountered FluxML.jl maybe ~1.5 years ago. At the time, I skimmed the docs, saw this `Flux.train!` API on the current README.md quickstart page, and wrote off the entire package as being another one of those super high-level deep learning libraries - one where it's easy to write things in the high-level API but nearly impossible to tweak the internals. (Many others out there might make the same quick first-impression evaluation, even though a package maintainer's dream is that every user reads all the docs.)
Today, I decided to take another look through the docs in more detail: I wanted to find something equivalent to what the PyTorch and JAX deep learning frameworks offer, in that you can work directly on gradient updates and parameters. (This is important for many areas of deep learning research, as I am sure you know!)
I found the `update!` API (and `withgradient`) after a lot of digging through the docs. I am really happy with this API, as it gives me the low-level control over my deep learning models that I need for my research! So now I am actually planning to use FluxML for research.
Conclusion. It took me two passes at the docs, the second one very deep, before I actually found this API. Even after I found it, I only found the API reference for `update!`, rather than an easy-to-find example I could copy and start working with. This user experience is something that might lose potential users.
Proposal. Therefore, I propose that the `update!` API be demonstrated in the quick start example: both on the README, and up front in the documentation. I think this is really key to attracting deep learning researchers as users, as the most popular deep learning packages expose this slightly lower-level API by default. It needs to be extremely obvious that one can do a similar thing with Flux.jl!
Here's an example I propose, which is similar to the style of PyTorch training loops (and so is a great way to convert some PyTorch users!):