Skip to content

thejerf/suture

Repository files navigation

Suture

Go Reference

import "github.com/thejerf/suture/v4"

Suture provides Erlang-ish supervisor trees for Go. "Supervisor trees" -> "sutree" -> "suture" -> holds your code together when it's trying to die.

If you are reading this on pkg.go.dev, you should visit the v4 docs.

It is intended to deal gracefully with the real failure cases that can occur with supervision trees (such as burning all your CPU time endlessly restarting dead services), while also making no unnecessary demands on the "service" code, and providing hooks to perform adequate logging with in a production environment.

A blog post describing the design decisions is available.

This module is fairly fully covered with godoc including an example, usage, and everything else you might expect from a README.md on GitHub. (DRY.)

v3 and before (which existed before go module support) documentation is also available.

A default slog-based logger is provided in github.com/thejerf/sutureslog. This is a separate Go module in order to avoid "infecting" the main suture/v4 with a new requirement to be on at least Go 1.21. Using this will require an additional go get github.com/thejerf/sutureslog.

Special Thanks

Special thanks to the Syncthing team, who have been fantastic about working with me to push fixes upstream of them.

Major Versions

v4 is a rewrite to make Suture function with contexts. If you are using suture for the first time, I recommend it. It also changes how logging works, to get a single function from the user that is presented with a defined set of structs, rather than requiring a number of closures from the consumer.

suture v3 is the latest version that does not feature contexts. It is still supported and getting backported fixes as of now.

Code Signing

Starting with the commit after ac7cf8591b, I will be signing this repository with the "jerf" keybase account. If you are viewing this repository through GitHub, you should see the commits as showing as "verified" in the commit view.

November 2024: My GPG key, as expected, expired. I have added a new subkey with a later expiration date, but GitHub now views all the previous commits as unsigned. Again, the nature of commit signing is that each signature is technically a signature on the entire repo, so the commit that adds this update is also a handover signature signing the repo.

Aspiration

One of the big wins the Erlang community has with their pervasive OTP support is that it makes it easy for them to distribute libraries that easily fit into the OTP paradigm. It ought to someday be considered a good idea to distribute libraries that provide some sort of supervisor tree functionality out of the box. It is possible to provide this functionality without explicitly depending on the Suture library.

Changelog

suture uses semantic versioning and go modules.

  • 4.0.6:
    • Close Issue 77.

      Issue 77 is that if a service returns one of the distinguished context errors, it was interpreted as being the result of the service's context returning that error, and so the service was terminated. However, it is easy for such errors from other contexts to end up being the return value for the service while the service's context is still alive and well.

      Rather than taking that error, the context should be examined directly for whether or not it is currently canceled.

      This is a slight behavior change, but my best guess is that this is a bugfix only, so I'm not rolling a minor release.

  • 4.0.4 and 4.0.5:
    • Apparently there is no way to have this be its own module and live in this directory. Moved sutureslog into its own repo.
  • 4.0.3:
  • 4.0.2:
    • Add the ability to specify a handler for non-string panics to format them.
    • Fixed an issue where trying to close a currently-panicked service was having problems. (This may have leaked goroutines in other ways too.)
    • Merged a PR that addresses race conditions in the test suite. (These seem to have been isolated to the test suite and not have affected the core code.)
  • 4.0.1:
    • Add a channel returned from ServeBackground that can be used to examine any error coming out of the supervisor once it is stopped.
    • Tweak up the docs to try to make it more clear suture's special error returns are checked via errors.Is when possible, addressing issue #51.
  • 4.0:
    • Switched the entire API to be context based.
    • Switched how logging works to take a single closure that will be presented with a defined set of structs, rather than a set of closures for each event.
    • Consequently, "Stop" removed from the Service interface. A wrapper for old-style code is provided.
    • Services can now return errors. Errors will be included in the log message. Two special errors control restarting behavior:
      • ErrDoNotRestart indicates the service should not be restarted, but other services should be unaffected.
      • ErrTerminateTree indicates the parent service tree should be terminated. Supervisor trees can be configured to either continue terminating upwards, or terminate themselves but not continue propagating the termination upwards.
    • UnstoppedServiceReport calling semantics modified to allow correctly retrieving reports from entire trees. (Prior to 4.0, a report was only on the supervisor it was called on.)
  • 3.0.4:
    • Fix a problem with adding services to a stopped supervisor.
  • 3.0.3:
    • Implemented request in Issue #37, creating a new method StopWithReport on supervisors that reports what services failed to stop. While a bit tricky to use, see warning about TOCTOU issues in the godoc, it can be useful at program tear-down time.
  • 3.0.2:
    • Fixed issue #35 caused by the 3.0.1 change to panic when calling .Stop on an unServe()d supervisor. It needs to correctly notice that .Stop has been called, and not start up instead, which is the contract of the Service interface.
  • 3.0.1:
    • Fixed issue #34: Calling supervisor.Stop() while something is trying to shut down a service could incorrectly report the service failed to shut down.
    • Calling ".Stop()" on an unstarted supervisor now panics. This is superior to its previous behavior, which is hanging forever. This is justified by the fact that the Supervisor can't provide its guarantees about how services are started and stopped if it is not itself started and stopped correctly. Further pushing me in this direction is that it's fairly easy to use the Supervisor correctly.
  • 3.0:
    • Added a default jitter of up to 50% on the restart intervals. While this is a backwards-compatible change from a source perspective, this does represent a non-trivial behavior change. It should generally be a good thing, but this is released as a major version as a warning.
  • 2.0.4
    • Added option PassThroughPanics, to allow panics to propagate up through the supervisor.
  • 2.0.3
    • Accepted PR #23, making the logging functions in the supervisor public.
    • Added a new Supervisor method RemoveAndWait, allowing you to make a best effort way to wait for a service to terminate.
    • Accepted PR #24, adding an optional IsCompletable interface that Services can implement that indicates they do not need to be restarted upon a normal return.
  • 2.0.2
    • Fixed issue #21. gccgo doesn't like case (<-c), with the parentheses. Of course the parens aren't doing anything useful anyhow. No behavior changes.
  • 2.0.1
    • Test code change only. Addresses the possibility that one of the tests can spuriously fail if they run in a certain order.
  • 2.0.0
    • Major version due to change to the signature of the logging methods:

      A race condition could occur when the Supervisor rendered the service name via fmt.Sprintf("%#v"), because fmt examines the entire object regardless of locks through reflection. 2.0.0 changes the supervisors to snapshot the Service's name once, when it is added, and to pass it to the logging methods.

    • Removal of use of sync/atomic due to possible brokenness in the Debian architecture.

  • 1.1.2
    • TravisCI showed that the fix for 1.1.1 induced a deadlock in Go 1.4 and before.
    • If the supervisor is terminated before a service, the service goroutine could be orphaned trying the shutdown notification to the supervisor. This should no longer occur.
  • 1.1.1
    • Per #14, the fix in 1.1.0 did not actually wait for the Supervisor to stop.
  • 1.1.0
    • Per #12, Supervisor.stop now tries to wait for its children before returning. A careful reading of the original .Stop() contract says this is the correct behavior.
  • 1.0.1
    • Fixed data race on the .state variable.
  • 1.0.0
    • Initial release.