Blog post proposal: "Modern Julia workflows" #1908

Open
gdalle opened this issue Jun 14, 2023 · 43 comments

@gdalle
Contributor

gdalle commented Jun 14, 2023

EDIT: the blog post is being created!


Is your feature request related to a problem? Please describe.

AFAIK, the typical tools and packages a Julia user needs daily are not documented in a single place.

Describe the solution you'd like

A lengthy blog post detailing the typical workflow for using, developing and testing packages. See structure proposal below.

Describe alternatives you've considered

  • Adding things to the documentation (but where?)
  • Redesign the front page (but how long would it take?)

Additional context

Related issues:

Possible contributors:

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

Possible structure of the post:

Step 1: writing code

Installation

  • official downloads
  • juliaup

Code loading

  • startup file
  • Revise.jl

Development environments

  • VSCode / VSCodium
  • emacs, vim
  • Jupyter
  • Pluto.jl

Package management

  • Pkg.jl
  • stacking environments
  • VSCode extension

Debugging

  • Debugger.jl
  • VSCode extension

Esthetics

  • Term.jl
  • OhMyREPL.jl

Calling other languages

  • PythonCall.jl
  • RCall.jl
  • JuliaInterop

Experiments

  • ProgressMeter.jl
  • DrWatson.jl

Step 2: sharing code

Setup

  • Git(Hub)
  • PkgTemplates.jl

Formatting

  • style guides (BlueStyle, SciMLStyle)
  • JuliaFormatter.jl

Testing

  • unit tests
  • Aqua.jl
  • TestEnv.jl
  • VSCode extension (TestItemRunner.jl)

Documentation

  • docstrings
  • Documenter.jl
  • LiveServer.jl
  • Literate.jl
  • Quarto

Compatibility

  • package extensions
  • Requires.jl
  • PkgCompatUI.jl

Publishing

  • Discourse post
  • general registry
  • LocalRegistry.jl

Step 3: speeding up code

Speed measurements

  • BenchmarkTools.jl

Profiling

  • built-in (time + memory)
  • VSCode extension
  • ProfileView.jl / ProfileSVG.jl

Type stability

  • JET.jl
  • Cthulhu.jl

Precompilation

  • PrecompileTools.jl
  • PackageCompiler.jl

Parallelism?

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

What would you add?

@fredrikekre
Member

LiveServer.jl, in particular the live-building of documentation, is a game changer for writing documentation. The instant reward of seeing the rendered page is making it so much more fun.

@albheim

albheim commented Jun 14, 2023

Something I've found really useful since I first found it was project environments and global environments, and how they stack. So the global environment can contain tools that are commonly useful but you might not always want to add to a project env, and can be seen a bit like an extended stdlib with what you personally think should always be available :)

I think this is a great feature and use it often both as a user and package developer.
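
A minimal sketch of how this stacking shows up in a session, using only Base. The comments describe the default configuration; a custom `JULIA_LOAD_PATH` or `JULIA_PROJECT` would change what the second loop prints.

```julia
# Default stack of environments searched by `using` / `import`, in order:
#   "@"       the currently active project environment
#   "@v#.#"   the global (default) environment for this Julia version
#   "@stdlib" the standard libraries
default_stack = Base.DEFAULT_LOAD_PATH
println(default_stack)

# The stack of the current session, expanded to actual file paths.
for path in Base.load_path()
    println(path)
end
```

Because the global environment sits behind the active project, a package installed only there can still be loaded while a project is active, without appearing in that project's Project.toml.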

@nilshg

nilshg commented Jun 14, 2023

What about PackageCompiler? It should probably be mentioned for users (eliminate ttfx) and for developers (deploying apps)

@jacobusmmsmit

Thanks for opening this discussion, Guillaume, I think it's an important collection of information that we can point newer users to! I really believe that centralising information like this is one of the best things that we can do as a community. One of the biggest problems we want to avoid is the feeling that there are some Julia wizards who know these magical incantations to make the language 100x better but the knowledge is stored in their heads or their cryptic documentation.

I do believe that the post shouldn't be split up into three levels. We want to present a curated list of workflow solutions to common problems or questions. As in my video, I think it's better to present all of the solutions together and qualify them with how powerful or useful they are, to whom, and when to use them. Also, Julia really blurs the line between scripter, package developer, and optimiser such that I don't think we should hint that they are separate things.

I do like how each package is part of a subheading talking about a specific topic, but I think some of the most important (commonly used according to my survey) packages are under headings that most people wouldn't think to read unless they were more experienced with Julia. For example, not knowing about JET and Cthulhu would be a real shame! Perhaps we should present certain tools as helping you write idiomatic Julia code as everyone wants to do that.

I think there's also a danger of presenting Julia as some sort of monolithic beast of packages that are required to be used to have a nice developer experience. This may be the impression that some newer folks take from such a post.

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

LiveServer.jl, in particular the live-building of documentation, is a game changer for writing documentation.

Thanks, I just added it!

Something I've found really useful since I first found it was project environments and global environments, and how they stack.

Good point, also in the list now!

What about PackageCompiler? It should probably be mentioned for users (eliminate ttfx) and for developers (deploying apps)

Indeed, but I tried to structure the list by difficulty, and in my mind it is a rather advanced tool. Probably cause I don't use it myself 🤷

@nilshg

nilshg commented Jun 14, 2023

I don't use it either, but it always felt like something that might actually be useful to me - 90% of my time on Julia is spent data wrangling and doing statistical modelling, simulation or optimization using the same 10 packages, so theoretically I think having a custom sysimage could be great but I could never be bothered to try it.

As far as I know it's one line of code to create a sysimage, maybe I should try it and see how practical it is before suggesting it 😂

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

One of the biggest problems we want to avoid is the feeling that there are some Julia wizards who know these magical incantations to make the language 100x better but the knowledge is stored in their heads or their cryptic documentation.

You read my mind.

I do believe that the post shouldn't be split up into three levels. [...] Julia really blurs the line between scripter, package developer, and optimiser such that I don't think we should hint that they are separate things.

That is a valid remark, and this user progression is something I definitely want to encourage. However, even with access to a blog post like this, it has to happen gradually. Beginners will need to master Revise and Pkg long before they even look at Cthulhu or PrecompileTools. And it seems a bit daunting to give them everything at once, especially with a very flat hierarchical structure (maybe we could find other natural headings?).

I think some of the most important (commonly used according to my survey) packages are under headings that most people wouldn't think to read unless they were more experienced with Julia. For example, not knowing about JET and Cthulhu would be a real shame!

I would actually love to know the results of your survey in terms of package use statistics, maybe you could share them here?
My putting JET and Cthulhu at the end is a subjective choice, mainly due to the order in which I discovered things myself. Both have greatly improved in usability recently (Cthulhu mapping directly to source code is a game changer), so there is a case to be made for mentioning them earlier. However, my guess is that Julia beginners take things in the following order:

  1. Write code
  2. Share code
  3. Improve code

I'm open to being proven wrong, but if I'm not, then JET and Cthulhu belong in part 3

I think there's also a danger of presenting Julia as some sort of monolithic beast of packages that are required to be used to have a nice developer experience. This may be the impression that some newer folks take from such a post.

Agreed, when I wrote it down this morning I thought "boy that list is scary long".
That's part of why I think we need a sense of progression and increasing difficulty. I am also considering a series of 3 blog posts for that very reason.

@jacobusmmsmit

As far as I know it's one line of code to create a sysimage, maybe I should try it and see how practical it is before suggesting it 😂

It's not hard per se; in my video I show how to use it. But it is very fickle: some things won't compile no matter how hard you try, and knowing about incremental sysimages is a game changer. There's also VSCode's built-in functionality for generating sysimages, but I don't like it so much: it didn't work for me when I tried to use it, and it doesn't allow for much control over, or understanding of, how the process works (which is useful for debugging it).

I would actually love to know the results of your survey in terms of package use statistics, maybe you could share them here?

Here you go: https://discourse.julialang.org/t/survey-on-how-you-use-julia/99807/6

I did a small writeup. The part that surprised me was just how widely used LocalRegistry was, more so than JET, Debugger, and Cthulhu.

my guess is that Julia beginners take things in the following order

I think that's a reasonable way of putting it, I also like the framing of "Write, Share, Improve". It might be useful if we made it clear that the first section/post is to get you up and writing/running whatever code you want in a structured way, and the third is about making the code itself better.

The main reason I want JET (and to a lesser extent Cthulhu as I see it as more advanced) to be made more prominent is because many "senior" (for lack of a better word) members of the community have expressed a desire to see it adopted more or that it may one day be a more integrated part of the language.

By making this blog post we have a lot of influence over how people learn Julia and think about writing Julia code. I want the future of Julia to be statically checked by JET! I want compile times to go down because instabilities and piracy are caught!

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

By making this blog post we have a lot of influence over how people learn Julia and think about writing Julia code. I want the future of Julia to be statically checked by JET! I want compile times to go down because instabilities and piracy are caught!

Maybe we could frame them as debugging tools instead of performance optimization. That way we mention them earlier and trick beginners into believing JET and Cthulhu are already the standard for tracking problems in your code. Then type-stable Julia becomes a self-fulfilling prophecy.
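
To make "type-stable" concrete without installing anything, here is a dependency-free sketch. It does not use JET.jl or Cthulhu.jl; it only asks the compiler for the inferred return type, which is the kind of information those tools surface in a much friendlier way. The functions `unstable` and `stable` are hypothetical names chosen for illustration.

```julia
# For an Int argument this returns either an Int or a Float64,
# so no single concrete return type can be inferred.
unstable(x) = x > 0 ? x : 0.0

# Converting both branches to the same type restores type stability.
stable(x) = x > 0 ? float(x) : 0.0

# Ask the compiler what it infers for an `Int` argument.
unstable_type = only(Base.return_types(unstable, (Int,)))
stable_type = only(Base.return_types(stable, (Int,)))

println("unstable: ", unstable_type, " (concrete? ", isconcretetype(unstable_type), ")")
println("stable:   ", stable_type, " (concrete? ", isconcretetype(stable_type), ")")
```

A non-concrete inferred type such as a `Union` is the usual symptom that callers of the function will have to handle several possible types at run time.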

@jacobusmmsmit

Well, they kinda are debugging tools. Python has PEP8 checkers as a standard part of the workflow as well as things like isort. To me they come even before debugging tools, they should always be there in the background reminding you of small changes to your code to improve it.

I wish JET could be used as a passive static analyser/linter as opposed to having to be called actively to find errors. That's the part that holds it back imo.

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

That makes sense, I use JET and Cthulhu much more than the standard debugger anyway (yes I'm a println("here") kinda guy)

@jacobusmmsmit

Oh wait it can and is used in that way as a linter.

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

True but you still need to run your functions with a macro

@DanielVandH

I am also considering a series of 3 blog posts for that very reason.

Definitely agree that this should be broken up. If I were still a new user, I would be afraid of ever getting into package development if everything was presented all at once like this.

For "Calling other languages", would be good to also just link to the interop org https://github.com/JuliaInterop. I used to use RCall.jl a lot when I first started, so making all the other packages beyond PythonCall.jl discoverable would be ideal (I imagine that was your intent eventually anyway, but just wanted to highlight the org link).

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

For "Calling other languages", would be good to also just link to the interop org https://github.com/JuliaInterop. I used to use RCall.jl a lot when I first started, so making all the other packages beyond PythonCall.jl discoverable would be ideal (I imagine that was your intent eventually anyway, but just wanted to highlight the org link).

Of course! I mentioned PythonCall.jl specifically because many beginners want to use PyCall.jl instead, which is basically made obsolete by PythonCall.jl

@mrufsvold

I think to answer the question of hierarchy, we need to decide what kind of documentation this is. It seems to be straddling all four quadrants at the moment.

I think a horizontal "menu" of awesome packages would be excellent informational documentation. It could be supplemented with Tutorials for Revise, Debugging, and Notebooks, as well as How-To Guides for intermediate users for stuff like PrecompileTools and PackageCompiler.

I think mixing purposes will muddy the usefulness for any individual reader.

Breaking things out like this is more work than the single blog post in the OP, but it would let us approach it incrementally.

@gdalle
Contributor Author

gdalle commented Jun 14, 2023

I think to answer the question of hierarchy, we need to decide what kind of documentation this is. It seems to be straddling all four quadrants at the moment.

Good point. Let's go through the divio categories:

  • In my view we can exclude the explanation straight away. The purpose is not, for instance, to dive into Julia's internals and explain how to improve performance, otherwise we're biting off more than we can chew.
  • If we published the list in its current format with a few lines of description per package, I would categorize it as a reference. But it would be a bit dry.
  • A tutorial would be very valuable but require a lot more work from us, and probably exceed any acceptable length. Plus it wouldn't be very practical for people who want to refer to a specific point.
  • In the end, I think a how-to guide is most appropriate. We could have very concrete questions as headings, like "How do I make sure my code fits basic quality standards", and then a presentation and demo of Aqua.jl.

@jacobusmmsmit

I believe that the video(s) I'm planning on making fit the explanation/tutorial side of that chart quite well. If nothing else, this does mean that content of this sort will be available for people to find.

I think our post should be full of links to learn more, so it can act as a jumping off point instead of a reference or explanation, but I'm not sure that a how-to guide is so appropriate. In my head this post is definitely learning-oriented at least somewhat, and so I think perhaps a hybrid how-to/tutorial is appropriate. I don't think we need to be 100% detailed and provide a concrete list of steps for everything we talk about, but some details to help people get started such as an example config for startup.jl.
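
As an idea of what such an example config could look like, here is a minimal sketch of a startup.jl that loads Revise.jl when it is available. It assumes Revise is installed in the global environment; if it is not, the `catch` branch only prints a warning and the session starts normally. The variable `startup_path` is an illustrative name, used only to show where the file usually lives.

```julia
# Usual location of the startup file: <first depot>/config/startup.jl,
# which is ~/.julia/config/startup.jl with the default depot.
startup_path = joinpath(first(DEPOT_PATH), "config", "startup.jl")
println("startup file: ", startup_path)

# Possible contents of that file (a sketch, not a recommendation):
try
    using Revise
catch e
    @warn "Could not load Revise" exception = (e, catch_backtrace())
end
```

Wrapping the `using` in `try`/`catch` keeps a missing or broken package from preventing Julia from starting.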

@jkrumbiegel
Contributor

For debugging, @show is my most used "tool" I think (one step up from println ;) ). A couple of these convenience macros deserve highlighting I think, especially if people come from languages without macros. I have once or twice used Infiltrator.jl as well, because the debuggers didn't work well enough and I just wanted to inspect local variables here and there.
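
A small sketch of this style of debugging, using only Base. The function `normalize_scores` is a hypothetical example, not taken from any package.

```julia
function normalize_scores(scores)
    total = sum(scores)
    @show total        # prints the expression and its value: `total = 6`
    result = scores ./ total
    @show result       # same for `result`, without writing a format string
    # Plain string interpolation works when a sentence reads better.
    println("largest share: $(maximum(result)) out of $(length(result)) entries")
    return result
end

shares = normalize_scores([1, 2, 3])
```

`@show` also returns the value of the expression, so it can be wrapped around a sub-expression without otherwise changing the code.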

@jacobusmmsmit

@show and some pretty println string interpolation goes a long way.

In one of Chris R's videos he covered "catching the value that caused an error" using some neat global/ref trick that would also be good to cover

@timholy
Member

timholy commented Jun 14, 2023

In one of Chris R's videos he covered "catching the value that caused an error" using some neat global/ref trick that would also be good to cover

const _args = Ref{Any}()
function foo(arg1, args...)
    _args[] = deepcopy((arg1, args...))
    # implementation
end

The deepcopy is needed only for foo! (one which modifies input arguments).

(That's actually what Rebugger does automatically to each item in a stacktrace, but it's not maintained currently.)
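
For completeness, a runnable sketch of the same pattern with a concrete failure. The names `LAST_ARGS` and `safe_sqrt_sum` are hypothetical; the function does not mutate its inputs, so no `deepcopy` is used here.

```julia
const LAST_ARGS = Ref{Any}()

function safe_sqrt_sum(xs, offset)
    LAST_ARGS[] = (xs, offset)   # remember the inputs of the latest call
    return sum(sqrt(x + offset) for x in xs)
end

try
    # sqrt(-9.0 + 1.0) throws a DomainError partway through the sum.
    safe_sqrt_sum([4.0, 1.0, -9.0], 1.0)
catch err
    println("caught: ", typeof(err))
end

# The offending inputs are still available afterwards, e.g. for a reproducer.
xs, offset = LAST_ARGS[]
println("failing inputs: ", xs, ", ", offset)
```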

@jlapeyre

jlapeyre commented Jun 16, 2023

  1. @gdalle I think you are saying that you want to separate the beginner-friendly things from the more advanced stuff. I think this is a great idea. Maybe a beginner just wants a minimal recipe or two for being comfortable when trying to code a project in Julia. So Pkg and Revise (or the VSCode equivalent) are essential. Cthulu (I refuse to look up how to spell that correctly!) is not essential for beginners. It doesn't need to be reserved for advanced users, but at least one level beyond beginner. It's easy to be overwhelmed.

  2. Also, I use JET in testing and CI. In other words, I would group it with Aqua. The interface is a bit rough, but it can catch regressions in design. You want to catch this before you have erected too much on top of the problematic code. On the other hand, I do this by copying other scripts and handrolling scripts to filter out what I want to consider a false positive. JET is not ready for use by beginners in this capacity. (My intention is to use it always and include a JET badge (borrowed from S Krastanov) in my repos in order to promote normalizing its use.)

  3. Your analysis of the divio categories is good. What brings the most benefit for the least effort.

@alfaromartino

One feature that I'd find worthy is a list with officially supported packages (if there's a thing like that). Given that Base only includes a set of minimal functions, the user needs to rely on packages. It'd be nice to know what packages are developed by the creators of Julia or are core packages actively developed by the community.

Examples of the packages I'm referring to are Statistics.jl, DataFrames.jl, StaticArrays.jl, etc.

@gdalle
Contributor Author

gdalle commented Jun 19, 2023

One feature that I'd find worthy is a list with officially supported packages

In a way, the blog post we're currently discussing would provide such a list, but strictly restricted to developer workflows.

There has been a lot of discussion on Discourse recently about an alternative to the general registry with a more curated package list, for instance enforcing certain quality guarantees. I personally welcome these initiatives, but I'm not sure adding a package list to an already extensive blog series centered around methodology makes much sense. Thoughts from others on this?

@jacobusmmsmit

jacobusmmsmit commented Jun 19, 2023

As Guillaume said, this list doesn't exist which is part of the reason we are looking to make this blog.

I want to make package developers and users alike more aware that Aqua is an important tool for this. While it doesn't guarantee that a package has the perfect feature set, it does provide an effortful signal of quality.

One thing that it doesn't address is ongoing support, but I don't know how this can be guaranteed. Most Julia projects are small, both in the number of people working on them (normally one) and in the timeframe over which that person stays invested.

@gdalle
Contributor Author

gdalle commented Jun 19, 2023

Perhaps we could add some pointers as to where high-quality packages can be found and how to assess them:

  • GitHub orgs
  • JuliaHub search
  • criteria such as commit date and nb of stars
  • etc.

@alfaromartino

Rather than high-quality packages, I was referring to officially supported packages. By this, I mean that the developers don't want to add these functionalities to Base to keep their development separate, but they should be considered "almost" as part of Julia. The best example is perhaps Statistics, but it includes others like Distributions. They're "optional" packages, but at the same time I can imagine that if Distributions is not maintained anymore by the original developers, there'd be official support to keep it alive and maintained. It's the type of package that so many other packages depend on that losing it would break the ecosystem.

Sometimes I'm not sure if I should use some packages, because I don't know if they're this type of core package. Examples are InlineStrings and PooledArrays, which I assume are officially supported because packages like DataFrames use them.

In summary, I'd emphasize "core packages" and "useful high-quality packages" when the list is described.

@gdalle
Contributor Author

gdalle commented Jun 19, 2023

Rather than high-quality packages, I was referring to officially supported packages. By this, I mean that the developers don't want to add these functionalities to Base to keep their development separate, but they should be considered "almost" as part of Julia.

I'm honestly not sure that these packages exist, for the same reason that there is no "official" organization behind Julia. As stated in this blog post:

The Julia project [...] consists of some code and a community of people who work on that code. The most clear cut line that can be drawn is that there is a set of people who have commit access to the JuliaLang GitHub organization [...] This set of people doesn’t really define the project, however, since there are many people who are prolific contributors to the Julia ecosystem but who do not have “commit bit.” The communal nature of open source makes it difficult to precisely define where the Julia project ends and the greater community begins, which is exactly how we like it.

I can imagine that if Distributions is not maintained anymore by the original developers, there'd be official support to keep it alive and maintained.

For the same reason as above, this seems inaccurate to me. If Distributions.jl or DataFrames.jl were no longer maintained, there would be community initiatives to take over, or maybe these packages would be deprecated and replacements would emerge.
"Core packages" like those you mention are great, and used by many, but I don't think anyone would claim they are "official", or "nearly part of Julia". On the other hand, such claims are much more justified for tools like Revise.jl or Pkg.jl, which are precisely the focus of my blog post proposal.

@gdalle
Contributor Author

gdalle commented Jun 21, 2023

Update: the blog posts are being drafted on a separate repo, and we'll make a PR to the official website once they're ready.

Preview: https://gdalle.github.io/ModernJuliaWorkflows/
Repo: https://github.com/gdalle/ModernJuliaWorkflows
Progress: https://github.com/gdalle/ModernJuliaWorkflows/issues

@lassepe

lassepe commented Jun 25, 2023

I really appreciate the initiative here. I think that such a blog post would be a great resource that I have long searched for when showing Julia to new students.

To chip in my two cents, to me, the single most useful debugging tool is Infiltrator. However, the workflow is a bit subtle:

  1. Installation: ]add Infiltrator in your global environment
  2. Usage in a specific (non-shared) project (i.e. after ]activate path/to/your/env)
    a. Load Infiltrator via using Infiltrator
    b. At the place in your code where you need to debug, write Main.Infiltrator.@infiltrate to set a "breakpoint". You can also make this breakpoint conditional, e.g. Main.Infiltrator.@infiltrate any(isnan, my_variable).
    c. When you hit the breakpoint, your REPL will "stop" in the corresponding local scope and you can interact with all variables visible in that scope for debugging. For longer debugging sessions (or when you try to capture offending inputs for a reproducer) you can use @exfiltrate my_variable to make it accessible as Infiltrator.store.my_variable from the REPL. @exfiltrate is essentially a streamlined version of Tim's trick above.
    d. Hit CTRL+D to exit infiltrator mode (i.e. continue after the breakpoint)
    e. In order to reset the breakpoint states (e.g. if you used @skip to skip a break point and you want to "unskip" it) call Infiltrator.end_session!()

Note that this workflow allows you to use Infiltrator without installing it to your local project; you only ever install it to your global "devtool" environment

  1. using Infiltrator in the local project works since we exploit the fact that environments are stacked (so we can load it in the REPL without it being added to the local Project.toml)
  2. Main.Infiltrator.@infiltrate accesses the breakpoint macro through the Main module, i.e. the outer module that is implicitly spawned when you start the REPL.

For these instructions to make sense to a new user, the blog post would have to lead with some details on global vs local environments (and the fact that they are stacked).
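
The stacking that this workflow relies on can be checked without Infiltrator itself. The sketch below activates a throwaway project (standing in for path/to/your/env) and prints the resulting load path; it assumes the default `LOAD_PATH`, and the variable names are illustrative.

```julia
import Pkg

# Activate a temporary, empty project in place of a real local project.
Pkg.activate(; temp = true)

project_file = Base.active_project()
stack = Base.load_path()

println("active project: ", project_file)
println("load path:")
foreach(p -> println("  ", p), stack)

# With the default LOAD_PATH, the active project is on the stack and the
# global environment (environments/vX.Y inside the depot) follows it, so
# packages installed only in the global environment remain loadable.
```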

@jkrumbiegel
Contributor

One thing that is very confusing for new users (and in general annoying) is that Infiltrator and other tools that work with standard input don't work right when started in VSCode via block execution. Then the input/output gets messed up and all jumbled. It only works if you execute the command directly in the REPL. It's not really recoverable either.

@gdalle
Contributor Author

gdalle commented Jun 25, 2023

To chip in my two cents, to me, the single most useful debugging tool is Infiltrator. However, the workflow is a bit subtle:

Thanks @lassepe! I only recently discovered Infiltrator.jl, and I have added it to the list on the draft website.

For these instructions to make sense to a new user, the blog post would have to lead with some details on global vs local environments (and the fact that they are stacked).

That is definitely on the roadmap too, in the first few sections

@gdalle
Contributor Author

gdalle commented Jun 25, 2023

Infiltrator and other tools that work with standard input don't work right when started in VSCode via block execution. Then the input/output gets messed up and all jumbled.

@jkrumbiegel what do you mean by jumbled? Do you have examples of other tools that fail in this way? Cthulhu is the only one I know

@jacobusmmsmit

I can attest to what @jkrumbiegel is saying about VSCode mangling the commands if you send it to the terminal via run block. I have to copy/paste Cthulhu.@descend as well.

@jkrumbiegel
Contributor

Interesting, I just tried again and now it throws an error that you shouldn't run it on async code. But if you disable that functionality according to the error message you can still see how it fails (I'm trying to print x but it doesn't work most of the time and just swallows that input)

using Infiltrator

function f()
    x = 1
    @infiltrate
    return x
end

f()

[Attached screen recording: Screen.Recording.2023-06-26.at.09.58.26.mov]

@adrhill

adrhill commented Jun 28, 2023

I've written down some related (admittedly rough) notes for my Julia for ML course and I'd be happy to contribute. :)

Lecture on Workflows (Markdown source)

Lecture on Profiling & Debugging (Markdown source)

@gdalle
Contributor Author

gdalle commented Jun 28, 2023

That would be amazing @adrhill! I suggest we coordinate over on the blog repo

modernjuliaworkflows/modernjuliaworkflows.github.io#4

@zot

zot commented Jul 4, 2023

There's calling other languages and experiments -- what about presentation packages like these?

@gdalle
Contributor Author

gdalle commented Jul 4, 2023

I think this might be a little too task-specific for our purposes

@DanielVandH

Could be also worth briefly mentioning other places to find more help, e.g. Discourse, Zulip, GitHub issues, Slack, and also knowing what type of help is best suited for those websites. For example, knowing how to write a good issue or provide a good MWE can go a long way even in developing packages in my experience, especially for debugging.

@gdalle
Contributor Author

gdalle commented Jul 5, 2023

Could be also worth briefly mentioning other places to find more help, e.g. Discourse, Zulip, GitHub issues, Slack, and also knowing what type of help is best suited for those websites. For example, knowing how to write a good issue or provide a good MWE can go a long way even in developing packages in my experience, especially for debugging.

modernjuliaworkflows/modernjuliaworkflows.github.io#7
